diff mbox series

[kernel] KVM: Don't null dereference ops->destroy

Message ID 20220524055208.1269279-1-aik@ozlabs.ru
State New
Headers show
Series [kernel] KVM: Don't null dereference ops->destroy | expand

Commit Message

Alexey Kardashevskiy May 24, 2022, 5:52 a.m. UTC
There are 2 places calling kvm_device_ops::destroy():
1) when creating a KVM device failed;
2) when a VM is destroyed: kvm_destroy_devices() destroys all devices
from &kvm->devices.

All 3 Book3s's interrupt controller KVM devices (XICS, XIVE, XIVE-native)
do not define kvm_device_ops::destroy() and only define release() which
is normally fine as device fds are closed before KVM gets to 2) but
by then the &kvm->devices list is empty.

However Syzkaller manages to trigger 1).

This adds checks in 1) and 2).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

I could define empty handlers for XICS/XIVE guys but
kvm_ioctl_create_device() already checks for ops->init() so I guess
kvm_device_ops are expected to not have certain handlers.

---
 virt/kvm/kvm_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Sean Christopherson May 24, 2022, 8:01 p.m. UTC | #1
On Tue, May 24, 2022, Alexey Kardashevskiy wrote:
> There are 2 places calling kvm_device_ops::destroy():
> 1) when creating a KVM device failed;
> 2) when a VM is destroyed: kvm_destroy_devices() destroys all devices
> from &kvm->devices.
> 
> All 3 Book3s's interrupt controller KVM devices (XICS, XIVE, XIVE-native)
> do not define kvm_device_ops::destroy() and only define release() which
> is normally fine as device fds are closed before KVM gets to 2) but
> by then the &kvm->devices list is empty.
> 
> However Syzkaller manages to trigger 1).
> 
> This adds checks in 1) and 2).
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> I could define empty handlers for XICS/XIVE guys but
> kvm_ioctl_create_device() already checks for ops->init() so I guess
> kvm_device_ops are expected to not have certain handlers.

Oof.  IMO, ->destroy() should be mandatory in order to pair with ->create().
kvmppc_xive_create(), kvmppc_xics_create(), and kvmppc_core_destroy_vm() are doing
some truly funky stuff to avoid leaking the device that's allocate in ->create().
A nop/dummy ->destroy() would be a good place to further document those shenanigans.
There's a comment at the end of the ->release() hooks, but that's still not very
obvious.

The comment above kvmppc_xive_get_device() strongly suggests that keeping the
allocation is a hack to avoid having to audit all relevant code paths, i.e. isn't
done for performance reasons.
Alexey Kardashevskiy May 25, 2022, 1:52 a.m. UTC | #2
On 5/25/22 06:01, Sean Christopherson wrote:
> On Tue, May 24, 2022, Alexey Kardashevskiy wrote:
>> There are 2 places calling kvm_device_ops::destroy():
>> 1) when creating a KVM device failed;
>> 2) when a VM is destroyed: kvm_destroy_devices() destroys all devices
>> from &kvm->devices.
>>
>> All 3 Book3s's interrupt controller KVM devices (XICS, XIVE, XIVE-native)
>> do not define kvm_device_ops::destroy() and only define release() which
>> is normally fine as device fds are closed before KVM gets to 2) but
>> by then the &kvm->devices list is empty.
>>
>> However Syzkaller manages to trigger 1).
>>
>> This adds checks in 1) and 2).
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> I could define empty handlers for XICS/XIVE guys but
>> kvm_ioctl_create_device() already checks for ops->init() so I guess
>> kvm_device_ops are expected to not have certain handlers.
> 
> Oof.  IMO, ->destroy() should be mandatory in order to pair with ->create().
> kvmppc_xive_create(), kvmppc_xics_create(), and kvmppc_core_destroy_vm() are doing
> some truly funky stuff to avoid leaking the device that's allocate in ->create().

Huh it used to be release() actually, nice:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5422e95103cf9663bc


> A nop/dummy ->destroy() would be a good place to further document those shenanigans.
> There's a comment at the end of the ->release() hooks, but that's still not very
> obvious.

I could probably borrow some docs from here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f868405faf067e8cfb


Thanks for the review! At very least I'll add the dummies.


> The comment above kvmppc_xive_get_device() strongly suggests that keeping the
> allocation is a hack to avoid having to audit all relevant code paths, i.e. isn't
> done for performance reasons.
Alexey Kardashevskiy May 25, 2022, 2:58 a.m. UTC | #3
On 5/25/22 11:52, Alexey Kardashevskiy wrote:
> 
> 
> On 5/25/22 06:01, Sean Christopherson wrote:
>> On Tue, May 24, 2022, Alexey Kardashevskiy wrote:
>>> There are 2 places calling kvm_device_ops::destroy():
>>> 1) when creating a KVM device failed;
>>> 2) when a VM is destroyed: kvm_destroy_devices() destroys all devices
>>> from &kvm->devices.
>>>
>>> All 3 Book3s's interrupt controller KVM devices (XICS, XIVE, 
>>> XIVE-native)
>>> do not define kvm_device_ops::destroy() and only define release() which
>>> is normally fine as device fds are closed before KVM gets to 2) but
>>> by then the &kvm->devices list is empty.
>>>
>>> However Syzkaller manages to trigger 1).
>>>
>>> This adds checks in 1) and 2).
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>
>>> I could define empty handlers for XICS/XIVE guys but
>>> kvm_ioctl_create_device() already checks for ops->init() so I guess
>>> kvm_device_ops are expected to not have certain handlers.
>>
>> Oof.  IMO, ->destroy() should be mandatory in order to pair with 
>> ->create().


After reading
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2bde9b3ec8bdf60788e9e 
and neighboring commits, it sounds that each create() should be paired 
with either destroy() or release() but not necessarily both.

So I'm really not sure dummy handlers is a good idea. Thanks,


>> kvmppc_xive_create(), kvmppc_xics_create(), and 
>> kvmppc_core_destroy_vm() are doing
>> some truly funky stuff to avoid leaking the device that's allocate in 
>> ->create().
> 
> Huh it used to be release() actually, nice:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5422e95103cf9663bc
> 
> 
>> A nop/dummy ->destroy() would be a good place to further document 
>> those shenanigans.
>> There's a comment at the end of the ->release() hooks, but that's 
>> still not very
>> obvious.
> 
> I could probably borrow some docs from here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f868405faf067e8cfb
> 
> 
> Thanks for the review! At very least I'll add the dummies.
> 
> 
>> The comment above kvmppc_xive_get_device() strongly suggests that 
>> keeping the
>> allocation is a hack to avoid having to audit all relevant code paths, 
>> i.e. isn't
>> done for performance reasons.
Paolo Bonzini May 25, 2022, 7:47 a.m. UTC | #4
On 5/25/22 04:58, Alexey Kardashevskiy wrote:
>>>
> 
> 
> After reading
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2bde9b3ec8bdf60788e9e 
> and neighboring commits, it sounds that each create() should be paired 
> with either destroy() or release() but not necessarily both.

I agree, if release() is implemented then destroy() will never be called 
(except in error situations).

kvm_destroy_devices() should not be touched, except to add a WARN_ON 
perhaps.

> So I'm really not sure dummy handlers is a good idea. Thanks,

But in that case shouldn't kvm_ioctl_create_device also try 
ops->release, i.e.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6d971fb1b08d..f265e2738d46 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4299,8 +4299,11 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
  		kvm_put_kvm_no_destroy(kvm);
  		mutex_lock(&kvm->lock);
  		list_del(&dev->vm_node);
+		if (ops->release)
+			ops->release(dev);
  		mutex_unlock(&kvm->lock);
-		ops->destroy(dev);
+		if (ops->destroy)
+			ops->destroy(dev);
  		return ret;
  	}

?

Paolo
Alexey Kardashevskiy May 31, 2022, 4:32 a.m. UTC | #5
On 5/25/22 17:47, Paolo Bonzini wrote:
> On 5/25/22 04:58, Alexey Kardashevskiy wrote:
>>>>
>>
>>
>> After reading
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2bde9b3ec8bdf60788e9e and neighboring commits, it sounds that each create() should be paired with either destroy() or release() but not necessarily both.
> 
> I agree, if release() is implemented then destroy() will never be called 
> (except in error situations).
> 
> kvm_destroy_devices() should not be touched, except to add a WARN_ON 
> perhaps.

I'll leave it as is.

> 
>> So I'm really not sure dummy handlers is a good idea. Thanks,
> 
> But in that case shouldn't kvm_ioctl_create_device also try 
> ops->release, i.e.
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6d971fb1b08d..f265e2738d46 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4299,8 +4299,11 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
>           kvm_put_kvm_no_destroy(kvm);
>           mutex_lock(&kvm->lock);
>           list_del(&dev->vm_node);
> +        if (ops->release)
> +            ops->release(dev);
>           mutex_unlock(&kvm->lock);
> -        ops->destroy(dev);
> +        if (ops->destroy)
> +            ops->destroy(dev);


btw why is destroy() not under the kvm->lock here? The comment in 
kvm_destroy_devices() suggests that it is an exception there but not 
necessarily here. Thanks,



>           return ret;
>       }
> 
> ?
> 
> Paolo
>
Paolo Bonzini May 31, 2022, 5:13 p.m. UTC | #6
On 5/31/22 06:32, Alexey Kardashevskiy wrote:
>> -        ops->destroy(dev); 
>> +        if (ops->destroy)
>> +            ops->destroy(dev);
> 
> 
> btw why is destroy() not under the kvm->lock here? The comment in 
> kvm_destroy_devices() suggests that it is an exception there but not 
> necessarily here. Thanks,

The comment refers to walking the list.  The ops->destroy contract is 
that it's called outside kvm->lock, and that's followed in both places 
(both before and after the suggested patch).

Paolo
diff mbox series

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f30bb8c16f26..17f698ccddd1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1205,7 +1205,8 @@  static void kvm_destroy_devices(struct kvm *kvm)
 	 */
 	list_for_each_entry_safe(dev, tmp, &kvm->devices, vm_node) {
 		list_del(&dev->vm_node);
-		dev->ops->destroy(dev);
+		if (dev->ops->destroy)
+			dev->ops->destroy(dev);
 	}
 }
 
@@ -4300,7 +4301,8 @@  static int kvm_ioctl_create_device(struct kvm *kvm,
 		mutex_lock(&kvm->lock);
 		list_del(&dev->vm_node);
 		mutex_unlock(&kvm->lock);
-		ops->destroy(dev);
+		if (ops->destroy)
+			ops->destroy(dev);
 		return ret;
 	}