
[kernel,v2,0/2] KVM: PPC: Check if IOMMU page is contained in the pinned physical page

Message ID 20180626055926.27703-1-aik@ozlabs.ru
Series KVM: PPC: Check if IOMMU page is contained in the pinned physical page

Message

Alexey Kardashevskiy June 26, 2018, 5:59 a.m. UTC
This is to improve page boundary checking and should probably be
cc:stable. I came across this while debugging nvlink2 passthrough,
but the lack of checking might be exploited by the existing userspace.

Changes:
v2:
* 2/2: explicitly check for compound pages before calling compound_order()


Please comment. Thanks.



Alexey Kardashevskiy (2):
  vfio/spapr: Use IOMMU pageshift rather than pagesize
  KVM: PPC: Check if IOMMU page is contained in the pinned physical page

 arch/powerpc/include/asm/mmu_context.h |  4 ++--
 arch/powerpc/kvm/book3s_64_vio.c       |  2 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c    |  6 ++++--
 arch/powerpc/mm/mmu_context_iommu.c    | 20 +++++++++++++++++---
 drivers/vfio/vfio_iommu_spapr_tce.c    | 10 +++++-----
 5 files changed, 29 insertions(+), 13 deletions(-)

Comments

Michael Ellerman June 29, 2018, 1:55 a.m. UTC | #1
Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> This is to improve page boundaries checking and should probably
> be cc:stable. I came across this while debugging nvlink2 passthrough
> but the lack of checking might be exploited by the existing userspace.

Do you really mean "exploited"? As in there's a security issue?

Your change log for patch 2 sort of suggests that but then says that
without the fix you just hit an error in vfio code.

So I'm not clear on what the exposure is.

cheers
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Kardashevskiy June 29, 2018, 3 a.m. UTC | #2
On Fri, 29 Jun 2018 11:55:40 +1000
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
> > This is to improve page boundaries checking and should probably
> > be cc:stable. I came across this while debugging nvlink2 passthrough
> > but the lack of checking might be exploited by the existing userspace.  
> 
> Do you really mean "exploited"? As in there's a security issue?
> 
> Your change log for patch 2 sort of suggests that but then says that
> without the fix you just hit an error in vfio code.


The bug is that I can easily make an unmodified guest use 16MB IOMMU
pages while the guest RAM is backed by 64K system pages. Unless the
guest RAM happens to be allocated contiguously (which is unlikely), a
16MB TCE gives the hardware access to host physical memory it is not
supposed to have access to: the whole 16MB minus the first 64K.

There is a fast path for H_PUT_TCE, via KVM, and it has no containment
test.

There is a slow path for H_PUT_TCE, via a VFIO ioctl(), and it does
have a containment test.

Because of a different feature of VFIO on sPAPR (it stores an array of
userspace addresses which we received from QEMU, translated to host
physical addresses and programmed into the TCE table), we do not take
the fast path on the very first H_PUT_TCE: I allocate that array only
when the slow path is taken for the first time. So the first H_PUT_TCE
goes through the slow path and fails there, the failure is passed to
the guest, and the guest decides that it is over.

But a modified guest could ignore that initial H_PUT_TCE failure and
simply repeat the H_PUT_TCE; this time it will take the fast path and
allow the bad mapping.


> So I'm not clear on what the exposure is.
> 
> cheers



--
Alexey
David Gibson June 29, 2018, 4:14 a.m. UTC | #3
On Fri, Jun 29, 2018 at 01:00:07PM +1000, Alexey Kardashevskiy wrote:
> On Fri, 29 Jun 2018 11:55:40 +1000
> Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
> > Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> > 
> > > This is to improve page boundaries checking and should probably
> > > be cc:stable. I came across this while debugging nvlink2 passthrough
> > > but the lack of checking might be exploited by the existing userspace.  
> > 
> > Do you really mean "exploited"? As in there's a security issue?
> > 
> > Your change log for patch 2 sort of suggests that but then says that
> > without the fix you just hit an error in vfio code.
> 
> 
> The bug is that I can easily make an unmodified guest use 16MB IOMMU
> pages while the guest RAM is backed by 64K system pages. Unless the
> guest RAM happens to be allocated contiguously (which is unlikely), a
> 16MB TCE gives the hardware access to host physical memory it is not
> supposed to have access to: the whole 16MB minus the first 64K.
> 
> There is a fast path for H_PUT_TCE, via KVM, and it has no containment
> test.
> 
> There is a slow path for H_PUT_TCE, via a VFIO ioctl(), and it does
> have a containment test.
> 
> Because of a different feature of VFIO on sPAPR (it stores an array of
> userspace addresses which we received from QEMU, translated to host
> physical addresses and programmed into the TCE table), we do not take
> the fast path on the very first H_PUT_TCE: I allocate that array only
> when the slow path is taken for the first time. So the first H_PUT_TCE
> goes through the slow path and fails there, the failure is passed to
> the guest, and the guest decides that it is over.
> 
> But a modified guest could ignore that initial H_PUT_TCE failure and
> simply repeat the H_PUT_TCE; this time it will take the fast path and
> allow the bad mapping.

In short, yes, it's an exploitable security hole in the host.  An
unmodified Linux guest kernel just doesn't happen to exploit it, even
if the guest userspace tries to get it to.