diff mbox

[v3,04/12] memory: fix address_space_get_iotlb_entry()

Message ID 1494403315-12760-5-git-send-email-peterx@redhat.com
State New
Headers show

Commit Message

Peter Xu May 10, 2017, 8:01 a.m. UTC
This function has an assumption that we will definitely call translate()
once (or say, the addr will be located inside one IOMMU memory region),
otherwise an empty IOTLB will be returned. Nevertheless, this is not
what we want. When there is no IOMMU memory region, we should build up a
static mapping for the caller, instead of an invalid IOTLB.

We won't trigger this path before VT-d passthrough mode. When
passthrough mode for a vhost device is setup, VT-d is possible to
disable the IOMMU region for that device. Without current patch, we'll
get a vhost boot failure, and it'll be failed over to virtio userspace
mode.

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 exec.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

Comments

David Gibson May 11, 2017, 1:56 a.m. UTC | #1
On Wed, May 10, 2017 at 04:01:47PM +0800, Peter Xu wrote:
> This function has an assumption that we will definitely call translate()
> once (or say, the addr will be located inside one IOMMU memory region),
> otherwise an empty IOTLB will be returned. Nevertheless, this is not
> what we want. When there is no IOMMU memory region, we should build up a
> static mapping for the caller, instead of an invalid IOTLB.
> 
> We won't trigger this path before VT-d passthrough mode. When
> passthrough mode for a vhost device is setup, VT-d is possible to
> disable the IOMMU region for that device. Without current patch, we'll
> get a vhost boot failure, and it'll be failed over to virtio userspace
> mode.

This doesn't look right to me.  You're assuming the target is
address_space_memory, which might not be the case - and you should be
able to check from the MR you do hit.  Furthermore it doesn't look
like you're accounting for the trivial translation if the section's
offset in the address space is different from its offset in the MR.

I think for the fallback path you're going to want something based on
address_space_translate() instead.

> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Jason Wang <jasowang@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  exec.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/exec.c b/exec.c
> index 072de5d..5cfdacd 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -463,12 +463,13 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
>  }
>  
>  /* Called from RCU critical section */
> -IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
> +IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr iova,
>                                              bool is_write)
>  {
>      IOMMUTLBEntry iotlb = {0};
>      MemoryRegionSection *section;
>      MemoryRegion *mr;
> +    hwaddr addr = iova, psize;
>  
>      for (;;) {
>          AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
> @@ -478,6 +479,23 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
>          mr = section->mr;
>  
>          if (!mr->iommu_ops) {
> +            /*
> +             * We didn't translate() but reached here. It possibly
> +             * means it's a static mapping. If so (it should be RAM),
> +             * we set the IOTLB up.
> +             */
> +            if (!iotlb.target_as && memory_region_is_ram(mr) &&
> +                !memory_region_is_rom(mr)) {
> +                psize = mr->ram_block->page_size;
> +                iova &= ~(psize - 1);
> +                iotlb = (IOMMUTLBEntry) {
> +                    .target_as = &address_space_memory,
> +                    .iova = iova,
> +                    .translated_addr = iova,
> +                    .addr_mask = psize - 1,
> +                    .perm = IOMMU_RW,
> +                };
> +            }
>              break;
>          }
>
Peter Xu May 11, 2017, 9:36 a.m. UTC | #2
On Thu, May 11, 2017 at 11:56:38AM +1000, David Gibson wrote:
> On Wed, May 10, 2017 at 04:01:47PM +0800, Peter Xu wrote:
> > This function has an assumption that we will definitely call translate()
> > once (or say, the addr will be located inside one IOMMU memory region),
> > otherwise an empty IOTLB will be returned. Nevertheless, this is not
> > what we want. When there is no IOMMU memory region, we should build up a
> > static mapping for the caller, instead of an invalid IOTLB.
> > 
> > We won't trigger this path before VT-d passthrough mode. When
> > passthrough mode for a vhost device is setup, VT-d is possible to
> > disable the IOMMU region for that device. Without current patch, we'll
> > get a vhost boot failure, and it'll be failed over to virtio userspace
> > mode.
> 
> This doesn't look right to me.  You're assuming the target is
> address_space_memory, which might not be the case - and you should be
> able to check from the MR you do hit.  Furthermore it doesn't look
> like you're accounting for the trivial translation if the section's
> offset in the address space is different from its offset in the MR.

Do you mean this line?

        addr = addr - section->offset_within_address_space
               + section->offset_within_region;

I thought it was calculating the relative address against that memory
region. That should only be useful if we want to do further
translate(), right? For the path that this patch tries to handle (when
there is no translate() call), then this "addr" is useless here?

Regarding to the address space assignment - do you mean, e.g., I
should use section->address_space here instead of
&system_address_space? If so, I can do the switch. But after all, for
now address_space_get_iotlb_entry() is only used by vhost codes, and
it only check against iotlb.target_as == NULL, so the address space
didn't count too much here...

Another reason I used &address_space_memory is that in
vfio_iommu_map_notify() we have a check against it:

    if (iotlb->target_as != &address_space_memory) {
        error_report("Wrong target AS \"%s\", only system memory is allowed",
                     iotlb->target_as->name ? iotlb->target_as->name : "none");
        return;
    }

Or say, we have some assumptions (not only in this patch) that assumes
this iotlb.target_as should be system_address_space.

Thanks,

> 
> I think for the fallback path you're going to want something based on
> address_space_translate() instead.
> 
> > CC: Paolo Bonzini <pbonzini@redhat.com>
> > CC: Jason Wang <jasowang@redhat.com>
> > CC: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  exec.c | 20 +++++++++++++++++++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index 072de5d..5cfdacd 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -463,12 +463,13 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
> >  }
> >  
> >  /* Called from RCU critical section */
> > -IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
> > +IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr iova,
> >                                              bool is_write)
> >  {
> >      IOMMUTLBEntry iotlb = {0};
> >      MemoryRegionSection *section;
> >      MemoryRegion *mr;
> > +    hwaddr addr = iova, psize;
> >  
> >      for (;;) {
> >          AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
> > @@ -478,6 +479,23 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
> >          mr = section->mr;
> >  
> >          if (!mr->iommu_ops) {
> > +            /*
> > +             * We didn't translate() but reached here. It possibly
> > +             * means it's a static mapping. If so (it should be RAM),
> > +             * we set the IOTLB up.
> > +             */
> > +            if (!iotlb.target_as && memory_region_is_ram(mr) &&
> > +                !memory_region_is_rom(mr)) {
> > +                psize = mr->ram_block->page_size;
> > +                iova &= ~(psize - 1);
> > +                iotlb = (IOMMUTLBEntry) {
> > +                    .target_as = &address_space_memory,
> > +                    .iova = iova,
> > +                    .translated_addr = iova,
> > +                    .addr_mask = psize - 1,
> > +                    .perm = IOMMU_RW,
> > +                };
> > +            }
> >              break;
> >          }
> >  
> 
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson
David Gibson May 12, 2017, 5:25 a.m. UTC | #3
On Thu, May 11, 2017 at 05:36:03PM +0800, Peter Xu wrote:
> On Thu, May 11, 2017 at 11:56:38AM +1000, David Gibson wrote:
> > On Wed, May 10, 2017 at 04:01:47PM +0800, Peter Xu wrote:
> > > This function has an assumption that we will definitely call translate()
> > > once (or say, the addr will be located inside one IOMMU memory region),
> > > otherwise an empty IOTLB will be returned. Nevertheless, this is not
> > > what we want. When there is no IOMMU memory region, we should build up a
> > > static mapping for the caller, instead of an invalid IOTLB.
> > > 
> > > We won't trigger this path before VT-d passthrough mode. When
> > > passthrough mode for a vhost device is setup, VT-d is possible to
> > > disable the IOMMU region for that device. Without current patch, we'll
> > > get a vhost boot failure, and it'll be failed over to virtio userspace
> > > mode.
> > 
> > This doesn't look right to me.  You're assuming the target is
> > address_space_memory, which might not be the case - and you should be
> > able to check from the MR you do hit.  Furthermore it doesn't look
> > like you're accounting for the trivial translation if the section's
> > offset in the address space is different from its offset in the MR.
> 
> Do you mean this line?
> 
>         addr = addr - section->offset_within_address_space
>                + section->offset_within_region;

Uh.. where is that line?  But.. wait, yes, I think I was mistaken.  I saw:
	.translated_addr = iova,
	
and thought that meant you were assuming an identify mapping from iova
to translated addr.  But thinking more carefull, IIRC iova and
translated_addr are both relative to the MR, not the AS, so I think
that is correct after all.

> I thought it was calculating the relative address against that memory
> region. That should only be useful if we want to do further
> translate(), right? For the path that this patch tries to handle (when
> there is no translate() call), then this "addr" is useless here?
> 
> Regarding to the address space assignment - do you mean, e.g., I
> should use section->address_space here instead of
> &system_address_space? If so, I can do the switch.

Yes, I think you should.

> But after all, for
> now address_space_get_iotlb_entry() is only used by vhost codes, and
> it only check against iotlb.target_as == NULL, so the address space
> didn't count too much here...

> Another reason I used &address_space_memory is that in
> vfio_iommu_map_notify() we have a check against it:
> 
>     if (iotlb->target_as != &address_space_memory) {
>         error_report("Wrong target AS \"%s\", only system memory is allowed",
>                      iotlb->target_as->name ? iotlb->target_as->name : "none");
>         return;
>     }
> 
> Or say, we have some assumptions (not only in this patch) that assumes
> this iotlb.target_as should be system_address_space.

Right, the vhost code can only handle some IOMMU setups - something
like nested IOMMUs would break it.  But this way if someone sets up a
machine with an IOMMU configuration that vhost can't handle, you'll
get an error message, rather than accesses to unexpected locations,
which could cause really hard to debug corruption.

In other words we make assumptions, but we should _test_ those
assumptions.

I also think it would make sense to use address_space_translate() if we
can, since it's an existing interface for a very similar operation.


> 
> Thanks,
> 
> > 
> > I think for the fallback path you're going to want something based on
> > address_space_translate() instead.
> > 
> > > CC: Paolo Bonzini <pbonzini@redhat.com>
> > > CC: Jason Wang <jasowang@redhat.com>
> > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  exec.c | 20 +++++++++++++++++++-
> > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/exec.c b/exec.c
> > > index 072de5d..5cfdacd 100644
> > > --- a/exec.c
> > > +++ b/exec.c
> > > @@ -463,12 +463,13 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
> > >  }
> > >  
> > >  /* Called from RCU critical section */
> > > -IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
> > > +IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr iova,
> > >                                              bool is_write)
> > >  {
> > >      IOMMUTLBEntry iotlb = {0};
> > >      MemoryRegionSection *section;
> > >      MemoryRegion *mr;
> > > +    hwaddr addr = iova, psize;
> > >  
> > >      for (;;) {
> > >          AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
> > > @@ -478,6 +479,23 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
> > >          mr = section->mr;
> > >  
> > >          if (!mr->iommu_ops) {
> > > +            /*
> > > +             * We didn't translate() but reached here. It possibly
> > > +             * means it's a static mapping. If so (it should be RAM),
> > > +             * we set the IOTLB up.
> > > +             */
> > > +            if (!iotlb.target_as && memory_region_is_ram(mr) &&
> > > +                !memory_region_is_rom(mr)) {
> > > +                psize = mr->ram_block->page_size;
> > > +                iova &= ~(psize - 1);
> > > +                iotlb = (IOMMUTLBEntry) {
> > > +                    .target_as = &address_space_memory,
> > > +                    .iova = iova,
> > > +                    .translated_addr = iova,
> > > +                    .addr_mask = psize - 1,
> > > +                    .perm = IOMMU_RW,
> > > +                };
> > > +            }
> > >              break;
> > >          }
> > >  
> > 
> 
> 
>
Peter Xu May 15, 2017, 9 a.m. UTC | #4
On Fri, May 12, 2017 at 03:25:08PM +1000, David Gibson wrote:
> On Thu, May 11, 2017 at 05:36:03PM +0800, Peter Xu wrote:
> > On Thu, May 11, 2017 at 11:56:38AM +1000, David Gibson wrote:
> > > On Wed, May 10, 2017 at 04:01:47PM +0800, Peter Xu wrote:
> > > > This function has an assumption that we will definitely call translate()
> > > > once (or say, the addr will be located inside one IOMMU memory region),
> > > > otherwise an empty IOTLB will be returned. Nevertheless, this is not
> > > > what we want. When there is no IOMMU memory region, we should build up a
> > > > static mapping for the caller, instead of an invalid IOTLB.
> > > > 
> > > > We won't trigger this path before VT-d passthrough mode. When
> > > > passthrough mode for a vhost device is setup, VT-d is possible to
> > > > disable the IOMMU region for that device. Without current patch, we'll
> > > > get a vhost boot failure, and it'll be failed over to virtio userspace
> > > > mode.
> > > 
> > > This doesn't look right to me.  You're assuming the target is
> > > address_space_memory, which might not be the case - and you should be
> > > able to check from the MR you do hit.  Furthermore it doesn't look
> > > like you're accounting for the trivial translation if the section's
> > > offset in the address space is different from its offset in the MR.
> > 
> > Do you mean this line?
> > 
> >         addr = addr - section->offset_within_address_space
> >                + section->offset_within_region;
> 
> Uh.. where is that line?  But.. wait, yes, I think I was mistaken.  I saw:
> 	.translated_addr = iova,
> 	
> and thought that meant you were assuming an identify mapping from iova
> to translated addr.  But thinking more carefull, IIRC iova and
> translated_addr are both relative to the MR, not the AS, so I think
> that is correct after all.
> 
> > I thought it was calculating the relative address against that memory
> > region. That should only be useful if we want to do further
> > translate(), right? For the path that this patch tries to handle (when
> > there is no translate() call), then this "addr" is useless here?
> > 
> > Regarding to the address space assignment - do you mean, e.g., I
> > should use section->address_space here instead of
> > &system_address_space? If so, I can do the switch.
> 
> Yes, I think you should.
> 
> > But after all, for
> > now address_space_get_iotlb_entry() is only used by vhost codes, and
> > it only check against iotlb.target_as == NULL, so the address space
> > didn't count too much here...
> 
> > Another reason I used &address_space_memory is that in
> > vfio_iommu_map_notify() we have a check against it:
> > 
> >     if (iotlb->target_as != &address_space_memory) {
> >         error_report("Wrong target AS \"%s\", only system memory is allowed",
> >                      iotlb->target_as->name ? iotlb->target_as->name : "none");
> >         return;
> >     }
> > 
> > Or say, we have some assumptions (not only in this patch) that assumes
> > this iotlb.target_as should be system_address_space.
> 
> Right, the vhost code can only handle some IOMMU setups - something
> like nested IOMMUs would break it.  But this way if someone sets up a
> machine with an IOMMU configuration that vhost can't handle, you'll
> get an error message, rather than accesses to unexpected locations,
> which could cause really hard to debug corruption.
> 
> In other words we make assumptions, but we should _test_ those
> assumptions.
> 
> I also think it would make sense to use address_space_translate() if we
> can, since it's an existing interface for a very similar operation.

I did give it a shot to generalize these two functions in this series
(I just posted it):

  [PATCH 0/4] exec: address space translation cleanups

Please see whether that'll be a good replacement of this single patch.

Thanks,
diff mbox

Patch

diff --git a/exec.c b/exec.c
index 072de5d..5cfdacd 100644
--- a/exec.c
+++ b/exec.c
@@ -463,12 +463,13 @@  address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
 }
 
 /* Called from RCU critical section */
-IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
+IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr iova,
                                             bool is_write)
 {
     IOMMUTLBEntry iotlb = {0};
     MemoryRegionSection *section;
     MemoryRegion *mr;
+    hwaddr addr = iova, psize;
 
     for (;;) {
         AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
@@ -478,6 +479,23 @@  IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
         mr = section->mr;
 
         if (!mr->iommu_ops) {
+            /*
+             * We didn't translate() but reached here. It possibly
+             * means it's a static mapping. If so (it should be RAM),
+             * we set the IOTLB up.
+             */
+            if (!iotlb.target_as && memory_region_is_ram(mr) &&
+                !memory_region_is_rom(mr)) {
+                psize = mr->ram_block->page_size;
+                iova &= ~(psize - 1);
+                iotlb = (IOMMUTLBEntry) {
+                    .target_as = &address_space_memory,
+                    .iova = iova,
+                    .translated_addr = iova,
+                    .addr_mask = psize - 1,
+                    .perm = IOMMU_RW,
+                };
+            }
             break;
         }