Message ID | 20150130235443.14915.89419.stgit@gimli.home |
---|---|
State | New |
Headers | show |
On 31/01/2015 00:55, Alex Williamson wrote: > Commit d8d95814609e replaced a number of memory_region_destroy() > calls with object_unparent() calls. The logic appears to be that > subregions need to be unparented, but the base region is destroyed > with the device object. Doing hotplug testing with vfio-pci I > occasionally get a segfault from object_finalize_child_property() > due to completely bogus class pointers on the child Object. Adding > the explicit object_unparent() for these subregions resolves the > problem, however I question the sanity of the Memory API now where > we sometimes need to destroy MemoryRegions, but the rules aren't > clear There is no memory_region_destroy API because you cannot destroy MemoryRegions. All you do is releasing the link between the VFIO device (the parent, specified in memory_region_init*) and the MemoryRegion. The link caused the VFIO device to keep the MemoryRegion alive. There can be pending references to the VFIO device at unrealize time, and this is why the memory_region_destroy() API was not enough. For example if someone was doing I/O to a BAR and thus address_space_map is keeping the VFIO device alive. The explicit memory_region_destroy() function made it much harder to handle this case. You had to define an instance_finalize function for every class, and do memory_region_destroy() there. Not surprisingly, no one did that. Sure, it's not a common case and a well-behaving guest does not do that, but if it does it means use-after-frees and thus a possible guest->host escalation. Instead, the implicit destruction via reference counting makes this case easy to handle, because reclamation is done automatically when the VFIO device dies. Explicit object_unparent() is only needed if you recreate the memory region during the lifetime of the object. This is rarely needed, and it is simple to spot if it's needed. If you do memory_region_init* outside the realize function, most likely you need a matching object_unparent somewhere else in the device logic. This was the idea behind commit d8d95814609e. It only touched a handful of files because almost no one does memory_region_init* outside the realize function, and in particular VFIO doesn't. VFIO follows the common convention of only creating regions in realize, and thus does not need object_unparent. > and there's no longer a memory_region_destroy() function, so > we need to reach over to some other random QEMU API It's not random. Object is the parent class of MemoryRegion. object_unparent is a method for MemoryRegion. > and unparent an object that we barely know about I'm not sure about this? You certainly know the memory regions you create. > and certainly didn't explicitly parent previously. You did when you passed the VFIO device to memory_region_init*. I'm afraid this patch is incorrect. You have to find out where the region is being overwritten. Thanks, Paolo > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > Cc: Paolo Bonzini <pbonzini@redhat.com> > Cc: qemu-stable@nongnu.org > --- > > hw/vfio/pci.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > index 014a92c..c71499e 100644 > --- a/hw/vfio/pci.c > +++ b/hw/vfio/pci.c > @@ -2294,10 +2294,12 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr) > > memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem); > munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem)); > + object_unparent(OBJECT(&bar->region.mmap_mem)); > > if (vdev->msix && vdev->msix->table_bar == nr) { > memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem); > munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem)); > + object_unparent(OBJECT(&vdev->msix->mmap_mem)); > } > } > > > >
On Sat, 2015-01-31 at 09:43 +0100, Paolo Bonzini wrote: > > On 31/01/2015 00:55, Alex Williamson wrote: > > Commit d8d95814609e replaced a number of memory_region_destroy() > > calls with object_unparent() calls. The logic appears to be that > > subregions need to be unparented, but the base region is destroyed > > with the device object. Doing hotplug testing with vfio-pci I > > occasionally get a segfault from object_finalize_child_property() > > due to completely bogus class pointers on the child Object. Adding > > the explicit object_unparent() for these subregions resolves the > > problem, however I question the sanity of the Memory API now where > > we sometimes need to destroy MemoryRegions, but the rules aren't > > clear > > There is no memory_region_destroy API because you cannot destroy > MemoryRegions. All you do is releasing the link between the VFIO device > (the parent, specified in memory_region_init*) and the MemoryRegion. > The link caused the VFIO device to keep the MemoryRegion alive. > > There can be pending references to the VFIO device at unrealize time, > and this is why the memory_region_destroy() API was not enough. For > example if someone was doing I/O to a BAR and thus address_space_map is > keeping the VFIO device alive. > > The explicit memory_region_destroy() function made it much harder to > handle this case. You had to define an instance_finalize function for > every class, and do memory_region_destroy() there. Not surprisingly, no > one did that. Sure, it's not a common case and a well-behaving guest > does not do that, but if it does it means use-after-frees and thus a > possible guest->host escalation. > > Instead, the implicit destruction via reference counting makes this case > easy to handle, because reclamation is done automatically when the VFIO > device dies. > > Explicit object_unparent() is only needed if you recreate the memory > region during the lifetime of the object. This is rarely needed, and it > is simple to spot if it's needed. If you do memory_region_init* outside > the realize function, most likely you need a matching object_unparent > somewhere else in the device logic. > > This was the idea behind commit d8d95814609e. It only touched a handful > of files because almost no one does memory_region_init* outside the > realize function, and in particular VFIO doesn't. VFIO follows the > common convention of only creating regions in realize, and thus does not > need object_unparent. > > > and there's no longer a memory_region_destroy() function, so > > we need to reach over to some other random QEMU API > > It's not random. Object is the parent class of MemoryRegion. > object_unparent is a method for MemoryRegion. > > > and unparent an object that we barely know about > > I'm not sure about this? You certainly know the memory regions you create. > > > and certainly didn't explicitly parent previously. > > You did when you passed the VFIO device to memory_region_init*. > > I'm afraid this patch is incorrect. You have to find out where the > region is being overwritten. Thanks Paolo, so if I look more closely at where you added object_unparent() calls in d8d95814609e, I can see that they're associated with dynamically allocated objects that are freed as part of the vfio device exitfn. vdev->msix is also such a structure and is the property causing us the segfaults. Being associated with a free also explains the randomness of the segfault. So, I think the second object_unparent() call is correct and that the guiding principle is that any MemoryRegion associated with a dynamically allocated structure and freed as part of the class exit callback needs to be explicitly unparented. Does that sound right? Thanks, Alex > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > Cc: Paolo Bonzini <pbonzini@redhat.com> > > Cc: qemu-stable@nongnu.org > > --- > > > > hw/vfio/pci.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > > index 014a92c..c71499e 100644 > > --- a/hw/vfio/pci.c > > +++ b/hw/vfio/pci.c > > @@ -2294,10 +2294,12 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr) > > > > memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem); > > munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem)); > > + object_unparent(OBJECT(&bar->region.mmap_mem)); > > > > if (vdev->msix && vdev->msix->table_bar == nr) { > > memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem); > > munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem)); > > + object_unparent(OBJECT(&vdev->msix->mmap_mem)); > > } > > } > > > > > > > >
On 31/01/2015 16:10, Alex Williamson wrote: >> Explicit object_unparent() is only needed if you recreate the memory >> region during the lifetime of the object. This is rarely needed, and it >> is simple to spot if it's needed. If you do memory_region_init* outside >> the realize function, most likely you need a matching object_unparent >> somewhere else in the device logic. >> >> This was the idea behind commit d8d95814609e. It only touched a handful >> of files because almost no one does memory_region_init* outside the >> realize function, and in particular VFIO doesn't. VFIO follows the >> common convention of only creating regions in realize, and thus does not >> need object_unparent. > > Thanks Paolo, so if I look more closely at where you added > object_unparent() calls in d8d95814609e, I can see that they're > associated with dynamically allocated objects that are freed as part of > the vfio device exitfn. vdev->msix is also such a structure and is the > property causing us the segfaults. Yeah, this is a different case than the one I mentioned above, and you're right that this is also not exactly the common convention---so it is a second case of needing an explicit object_unparent. > So, I think the second > object_unparent() call is correct and that the guiding principle is that > any MemoryRegion associated with a dynamically allocated structure and > freed as part of the class exit callback needs to be explicitly > unparented. Does that sound right? The pedant in me asks to do the object_unparent in vfio_put_device, just before freeing vdev->msix. This way you already abide by RCU's principle of separating removal and reclamation (and I won't have to move it in part 3 of the RCU patches, which is the next to be posted; that part will move the vfio_put_device call to instance_finalize). Another possible change would be to make vdev->msix statically allocated (checking vdev->msix.entries != 0 instead of vdev->msix != NULL, possibly hidden beneath an inline function vfio_has_msix). Then you'd be in the exact same case as the other memory regions and wouldn't need an unparent at all. But I am not certain myself of the relative beauty of this, and it is obviously less suitable for qemu-stable, so do go ahead with the one-liner. I'll improve the documentation as soon as the code actually follows the principles that have to be documented. But now that the ball is rolling on RCU and multithreading, documentation is indeed getting more and more important. Thanks, Paolo > Alex > >>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> >>> Cc: Paolo Bonzini <pbonzini@redhat.com> >>> Cc: qemu-stable@nongnu.org >>> --- >>> >>> hw/vfio/pci.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c >>> index 014a92c..c71499e 100644 >>> --- a/hw/vfio/pci.c >>> +++ b/hw/vfio/pci.c >>> @@ -2294,10 +2294,12 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr) >>> >>> memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem); >>> munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem)); >>> + object_unparent(OBJECT(&bar->region.mmap_mem)); >>> >>> if (vdev->msix && vdev->msix->table_bar == nr) { >>> memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem); >>> munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem)); >>> + object_unparent(OBJECT(&vdev->msix->mmap_mem)); >>> } >>> } >>> >>> >>> >>> > > > > >
On Sat, 2015-01-31 at 21:47 +0100, Paolo Bonzini wrote: > > On 31/01/2015 16:10, Alex Williamson wrote: > >> Explicit object_unparent() is only needed if you recreate the memory > >> region during the lifetime of the object. This is rarely needed, and it > >> is simple to spot if it's needed. If you do memory_region_init* outside > >> the realize function, most likely you need a matching object_unparent > >> somewhere else in the device logic. > >> > >> This was the idea behind commit d8d95814609e. It only touched a handful > >> of files because almost no one does memory_region_init* outside the > >> realize function, and in particular VFIO doesn't. VFIO follows the > >> common convention of only creating regions in realize, and thus does not > >> need object_unparent. > > > > Thanks Paolo, so if I look more closely at where you added > > object_unparent() calls in d8d95814609e, I can see that they're > > associated with dynamically allocated objects that are freed as part of > > the vfio device exitfn. vdev->msix is also such a structure and is the > > property causing us the segfaults. > > Yeah, this is a different case than the one I mentioned above, and > you're right that this is also not exactly the common convention---so it > is a second case of needing an explicit object_unparent. > > > So, I think the second > > object_unparent() call is correct and that the guiding principle is that > > any MemoryRegion associated with a dynamically allocated structure and > > freed as part of the class exit callback needs to be explicitly > > unparented. Does that sound right? > > The pedant in me asks to do the object_unparent in vfio_put_device, just > before freeing vdev->msix. This way you already abide by RCU's > principle of separating removal and reclamation (and I won't have to > move it in part 3 of the RCU patches, which is the next to be posted; > that part will move the vfio_put_device call to instance_finalize). Ok. > Another possible change would be to make vdev->msix statically allocated > (checking vdev->msix.entries != 0 instead of vdev->msix != NULL, > possibly hidden beneath an inline function vfio_has_msix). Then you'd > be in the exact same case as the other memory regions and wouldn't need > an unparent at all. But I am not certain myself of the relative beauty > of this, and it is obviously less suitable for qemu-stable, so do go > ahead with the one-liner. The pendant in me prefers that the MSI-X structure is only allocated for devices that actually have MSI-X ;) v2 posted. > I'll improve the documentation as soon as the code actually follows the > principles that have to be documented. But now that the ball is rolling > on RCU and multithreading, documentation is indeed getting more and more > important. Thanks, it's definitely not obvious how this works in the current docs. Alex > >>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > >>> Cc: Paolo Bonzini <pbonzini@redhat.com> > >>> Cc: qemu-stable@nongnu.org > >>> --- > >>> > >>> hw/vfio/pci.c | 2 ++ > >>> 1 file changed, 2 insertions(+) > >>> > >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > >>> index 014a92c..c71499e 100644 > >>> --- a/hw/vfio/pci.c > >>> +++ b/hw/vfio/pci.c > >>> @@ -2294,10 +2294,12 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr) > >>> > >>> memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem); > >>> munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem)); > >>> + object_unparent(OBJECT(&bar->region.mmap_mem)); > >>> > >>> if (vdev->msix && vdev->msix->table_bar == nr) { > >>> memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem); > >>> munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem)); > >>> + object_unparent(OBJECT(&vdev->msix->mmap_mem)); > >>> } > >>> } > >>> > >>> > >>> > >>> > > > > > > > > > >
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 014a92c..c71499e 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2294,10 +2294,12 @@ static void vfio_unmap_bar(VFIOPCIDevice *vdev, int nr) memory_region_del_subregion(&bar->region.mem, &bar->region.mmap_mem); munmap(bar->region.mmap, memory_region_size(&bar->region.mmap_mem)); + object_unparent(OBJECT(&bar->region.mmap_mem)); if (vdev->msix && vdev->msix->table_bar == nr) { memory_region_del_subregion(&bar->region.mem, &vdev->msix->mmap_mem); munmap(vdev->msix->mmap, memory_region_size(&vdev->msix->mmap_mem)); + object_unparent(OBJECT(&vdev->msix->mmap_mem)); } }
Commit d8d95814609e replaced a number of memory_region_destroy() calls with object_unparent() calls. The logic appears to be that subregions need to be unparented, but the base region is destroyed with the device object. Doing hotplug testing with vfio-pci I occasionally get a segfault from object_finalize_child_property() due to completely bogus class pointers on the child Object. Adding the explicit object_unparent() for these subregions resolves the problem, however I question the sanity of the Memory API now where we sometimes need to destroy MemoryRegions, but the rules aren't clear and there's no longer a memory_region_destroy() function, so we need to reach over to some other random QEMU API and unparent an object that we barely know about and certainly didn't explicitly parent previously. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-stable@nongnu.org --- hw/vfio/pci.c | 2 ++ 1 file changed, 2 insertions(+)