Message ID | 20220524055208.1269279-1-aik@ozlabs.ru |
---|---|
State | New |
Headers | show |
Series | [kernel] KVM: Don't null dereference ops->destroy | expand |
On Tue, May 24, 2022, Alexey Kardashevskiy wrote: > There are 2 places calling kvm_device_ops::destroy(): > 1) when creating a KVM device failed; > 2) when a VM is destroyed: kvm_destroy_devices() destroys all devices > from &kvm->devices. > > All 3 Book3s's interrupt controller KVM devices (XICS, XIVE, XIVE-native) > do not define kvm_device_ops::destroy() and only define release() which > is normally fine as device fds are closed before KVM gets to 2) but > by then the &kvm->devices list is empty. > > However Syzkaller manages to trigger 1). > > This adds checks in 1) and 2). > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- > > I could define empty handlers for XICS/XIVE guys but > kvm_ioctl_create_device() already checks for ops->init() so I guess > kvm_device_ops are expected to not have certain handlers. Oof. IMO, ->destroy() should be mandatory in order to pair with ->create(). kvmppc_xive_create(), kvmppc_xics_create(), and kvmppc_core_destroy_vm() are doing some truly funky stuff to avoid leaking the device that's allocate in ->create(). A nop/dummy ->destroy() would be a good place to further document those shenanigans. There's a comment at the end of the ->release() hooks, but that's still not very obvious. The comment above kvmppc_xive_get_device() strongly suggests that keeping the allocation is a hack to avoid having to audit all relevant code paths, i.e. isn't done for performance reasons.
On 5/25/22 06:01, Sean Christopherson wrote: > On Tue, May 24, 2022, Alexey Kardashevskiy wrote: >> There are 2 places calling kvm_device_ops::destroy(): >> 1) when creating a KVM device failed; >> 2) when a VM is destroyed: kvm_destroy_devices() destroys all devices >> from &kvm->devices. >> >> All 3 Book3s's interrupt controller KVM devices (XICS, XIVE, XIVE-native) >> do not define kvm_device_ops::destroy() and only define release() which >> is normally fine as device fds are closed before KVM gets to 2) but >> by then the &kvm->devices list is empty. >> >> However Syzkaller manages to trigger 1). >> >> This adds checks in 1) and 2). >> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> --- >> >> I could define empty handlers for XICS/XIVE guys but >> kvm_ioctl_create_device() already checks for ops->init() so I guess >> kvm_device_ops are expected to not have certain handlers. > > Oof. IMO, ->destroy() should be mandatory in order to pair with ->create(). > kvmppc_xive_create(), kvmppc_xics_create(), and kvmppc_core_destroy_vm() are doing > some truly funky stuff to avoid leaking the device that's allocate in ->create(). Huh it used to be release() actually, nice: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5422e95103cf9663bc > A nop/dummy ->destroy() would be a good place to further document those shenanigans. > There's a comment at the end of the ->release() hooks, but that's still not very > obvious. I could probably borrow some docs from here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f868405faf067e8cfb Thanks for the review! At very least I'll add the dummies. > The comment above kvmppc_xive_get_device() strongly suggests that keeping the > allocation is a hack to avoid having to audit all relevant code paths, i.e. isn't > done for performance reasons.
On 5/25/22 11:52, Alexey Kardashevskiy wrote: > > > On 5/25/22 06:01, Sean Christopherson wrote: >> On Tue, May 24, 2022, Alexey Kardashevskiy wrote: >>> There are 2 places calling kvm_device_ops::destroy(): >>> 1) when creating a KVM device failed; >>> 2) when a VM is destroyed: kvm_destroy_devices() destroys all devices >>> from &kvm->devices. >>> >>> All 3 Book3s's interrupt controller KVM devices (XICS, XIVE, >>> XIVE-native) >>> do not define kvm_device_ops::destroy() and only define release() which >>> is normally fine as device fds are closed before KVM gets to 2) but >>> by then the &kvm->devices list is empty. >>> >>> However Syzkaller manages to trigger 1). >>> >>> This adds checks in 1) and 2). >>> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>> --- >>> >>> I could define empty handlers for XICS/XIVE guys but >>> kvm_ioctl_create_device() already checks for ops->init() so I guess >>> kvm_device_ops are expected to not have certain handlers. >> >> Oof. IMO, ->destroy() should be mandatory in order to pair with >> ->create(). After reading https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2bde9b3ec8bdf60788e9e and neighboring commits, it sounds that each create() should be paired with either destroy() or release() but not necessarily both. So I'm really not sure dummy handlers is a good idea. Thanks, >> kvmppc_xive_create(), kvmppc_xics_create(), and >> kvmppc_core_destroy_vm() are doing >> some truly funky stuff to avoid leaking the device that's allocate in >> ->create(). > > Huh it used to be release() actually, nice: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5422e95103cf9663bc > > >> A nop/dummy ->destroy() would be a good place to further document >> those shenanigans. >> There's a comment at the end of the ->release() hooks, but that's >> still not very >> obvious. > > I could probably borrow some docs from here: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f868405faf067e8cfb > > > Thanks for the review! At very least I'll add the dummies. > > >> The comment above kvmppc_xive_get_device() strongly suggests that >> keeping the >> allocation is a hack to avoid having to audit all relevant code paths, >> i.e. isn't >> done for performance reasons.
On 5/25/22 04:58, Alexey Kardashevskiy wrote: >>> > > > After reading > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2bde9b3ec8bdf60788e9e > and neighboring commits, it sounds that each create() should be paired > with either destroy() or release() but not necessarily both. I agree, if release() is implemented then destroy() will never be called (except in error situations). kvm_destroy_devices() should not be touched, except to add a WARN_ON perhaps. > So I'm really not sure dummy handlers is a good idea. Thanks, But in that case shouldn't kvm_ioctl_create_device also try ops->release, i.e. diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6d971fb1b08d..f265e2738d46 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -4299,8 +4299,11 @@ static int kvm_ioctl_create_device(struct kvm *kvm, kvm_put_kvm_no_destroy(kvm); mutex_lock(&kvm->lock); list_del(&dev->vm_node); + if (ops->release) + ops->release(dev); mutex_unlock(&kvm->lock); - ops->destroy(dev); + if (ops->destroy) + ops->destroy(dev); return ret; } ? Paolo
On 5/25/22 17:47, Paolo Bonzini wrote: > On 5/25/22 04:58, Alexey Kardashevskiy wrote: >>>> >> >> >> After reading >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2bde9b3ec8bdf60788e9e and neighboring commits, it sounds that each create() should be paired with either destroy() or release() but not necessarily both. > > I agree, if release() is implemented then destroy() will never be called > (except in error situations). > > kvm_destroy_devices() should not be touched, except to add a WARN_ON > perhaps. I'll leave it as is. > >> So I'm really not sure dummy handlers is a good idea. Thanks, > > But in that case shouldn't kvm_ioctl_create_device also try > ops->release, i.e. > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 6d971fb1b08d..f265e2738d46 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -4299,8 +4299,11 @@ static int kvm_ioctl_create_device(struct kvm *kvm, > kvm_put_kvm_no_destroy(kvm); > mutex_lock(&kvm->lock); > list_del(&dev->vm_node); > + if (ops->release) > + ops->release(dev); > mutex_unlock(&kvm->lock); > - ops->destroy(dev); > + if (ops->destroy) > + ops->destroy(dev); btw why is destroy() not under the kvm->lock here? The comment in kvm_destroy_devices() suggests that it is an exception there but not necessarily here. Thanks, > return ret; > } > > ? > > Paolo >
On 5/31/22 06:32, Alexey Kardashevskiy wrote: >> - ops->destroy(dev); >> + if (ops->destroy) >> + ops->destroy(dev); > > > btw why is destroy() not under the kvm->lock here? The comment in > kvm_destroy_devices() suggests that it is an exception there but not > necessarily here. Thanks, The comment refers to walking the list. The ops->destroy contract is that it's called outside kvm->lock, and that's followed in both places (both before and after the suggested patch). Paolo
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f30bb8c16f26..17f698ccddd1 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1205,7 +1205,8 @@ static void kvm_destroy_devices(struct kvm *kvm) */ list_for_each_entry_safe(dev, tmp, &kvm->devices, vm_node) { list_del(&dev->vm_node); - dev->ops->destroy(dev); + if (dev->ops->destroy) + dev->ops->destroy(dev); } } @@ -4300,7 +4301,8 @@ static int kvm_ioctl_create_device(struct kvm *kvm, mutex_lock(&kvm->lock); list_del(&dev->vm_node); mutex_unlock(&kvm->lock); - ops->destroy(dev); + if (ops->destroy) + ops->destroy(dev); return ret; }
There are 2 places calling kvm_device_ops::destroy(): 1) when creating a KVM device failed; 2) when a VM is destroyed: kvm_destroy_devices() destroys all devices from &kvm->devices. All 3 Book3s's interrupt controller KVM devices (XICS, XIVE, XIVE-native) do not define kvm_device_ops::destroy() and only define release() which is normally fine as device fds are closed before KVM gets to 2) but by then the &kvm->devices list is empty. However Syzkaller manages to trigger 1). This adds checks in 1) and 2). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- I could define empty handlers for XICS/XIVE guys but kvm_ioctl_create_device() already checks for ops->init() so I guess kvm_device_ops are expected to not have certain handlers. --- virt/kvm/kvm_main.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)