mbox series

[v2,0/4] Balloon inhibit enhancements, vfio restriction

Message ID 153299204130.14411.11438396195753743913.stgit@gimli.home
Headers show
Series Balloon inhibit enhancements, vfio restriction | expand

Message

Alex Williamson July 30, 2018, 11:13 p.m. UTC
v2:
 - Use atomic ops for balloon inhibit counter (Peter)
 - Allow endpoint driver opt-in for ballooning, vfio-ccw opt-in by
   default, vfio-pci opt-in by device option, only allowed for mdev
   devices, no support added for platform as there are no platform
   mdev devices.

See patch 3/4 for detailed explanation why ballooning and device
assignment typically don't mix.  If this eventually changes, flags
on the iommu info struct or perhaps device info struct can inform
us for automatic opt-in.  Thanks,

Alex

---

Alex Williamson (4):
      balloon: Allow nested inhibits
      kvm: Use inhibit to prevent ballooning without synchronous mmu
      vfio: Inhibit ballooning based on group attachment to a container
      vfio/ccw/pci: Allow devices to opt-in for ballooning


 accel/kvm/kvm-all.c           |    4 ++++
 balloon.c                     |   13 ++++++++++---
 hw/vfio/ccw.c                 |    9 +++++++++
 hw/vfio/common.c              |   26 ++++++++++++++++++++++++++
 hw/vfio/pci.c                 |   26 +++++++++++++++++++++++++-
 hw/vfio/trace-events          |    1 +
 hw/virtio/virtio-balloon.c    |    4 +---
 include/hw/vfio/vfio-common.h |    2 ++
 8 files changed, 78 insertions(+), 7 deletions(-)

Comments

Michael S. Tsirkin July 31, 2018, 12:29 p.m. UTC | #1
On Mon, Jul 30, 2018 at 05:13:26PM -0600, Alex Williamson wrote:
> v2:
>  - Use atomic ops for balloon inhibit counter (Peter)
>  - Allow endpoint driver opt-in for ballooning, vfio-ccw opt-in by
>    default, vfio-pci opt-in by device option, only allowed for mdev
>    devices, no support added for platform as there are no platform
>    mdev devices.
> 
> See patch 3/4 for detailed explanation why ballooning and device
> assignment typically don't mix.  If this eventually changes, flags
> on the iommu info struct or perhaps device info struct can inform
> us for automatic opt-in.  Thanks,
> 
> Alex

So this patch seems to block ballooning when vfio is added.
But what if balloon is added and inflated first?

I'd suggest making qemu_balloon_inhibit fail in that case,
and then vfio realize will fail as well.


> ---
> 
> Alex Williamson (4):
>       balloon: Allow nested inhibits
>       kvm: Use inhibit to prevent ballooning without synchronous mmu
>       vfio: Inhibit ballooning based on group attachment to a container
>       vfio/ccw/pci: Allow devices to opt-in for ballooning
> 
> 
>  accel/kvm/kvm-all.c           |    4 ++++
>  balloon.c                     |   13 ++++++++++---
>  hw/vfio/ccw.c                 |    9 +++++++++
>  hw/vfio/common.c              |   26 ++++++++++++++++++++++++++
>  hw/vfio/pci.c                 |   26 +++++++++++++++++++++++++-
>  hw/vfio/trace-events          |    1 +
>  hw/virtio/virtio-balloon.c    |    4 +---
>  include/hw/vfio/vfio-common.h |    2 ++
>  8 files changed, 78 insertions(+), 7 deletions(-)
Alex Williamson July 31, 2018, 2:44 p.m. UTC | #2
On Tue, 31 Jul 2018 15:29:17 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Jul 30, 2018 at 05:13:26PM -0600, Alex Williamson wrote:
> > v2:
> >  - Use atomic ops for balloon inhibit counter (Peter)
> >  - Allow endpoint driver opt-in for ballooning, vfio-ccw opt-in by
> >    default, vfio-pci opt-in by device option, only allowed for mdev
> >    devices, no support added for platform as there are no platform
> >    mdev devices.
> > 
> > See patch 3/4 for detailed explanation why ballooning and device
> > assignment typically don't mix.  If this eventually changes, flags
> > on the iommu info struct or perhaps device info struct can inform
> > us for automatic opt-in.  Thanks,
> > 
> > Alex  
> 
> So this patch seems to block ballooning when vfio is added.
> But what if balloon is added and inflated first?

Good point.
 
> I'd suggest making qemu_balloon_inhibit fail in that case,
> and then vfio realize will fail as well.

That might be the correct behavior for vfio, but I wonder about the
existing postcopy use case.  Dave Gilbert, what do you think?  We might
need a separate interface for callers that cannot tolerate existing
ballooned pages.  Of course we'll also need another atomic counter to
keep a tally of ballooned pages.  Thanks,

Alex
Dr. David Alan Gilbert July 31, 2018, 3:07 p.m. UTC | #3
* Alex Williamson (alex.williamson@redhat.com) wrote:
> On Tue, 31 Jul 2018 15:29:17 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Jul 30, 2018 at 05:13:26PM -0600, Alex Williamson wrote:
> > > v2:
> > >  - Use atomic ops for balloon inhibit counter (Peter)
> > >  - Allow endpoint driver opt-in for ballooning, vfio-ccw opt-in by
> > >    default, vfio-pci opt-in by device option, only allowed for mdev
> > >    devices, no support added for platform as there are no platform
> > >    mdev devices.
> > > 
> > > See patch 3/4 for detailed explanation why ballooning and device
> > > assignment typically don't mix.  If this eventually changes, flags
> > > on the iommu info struct or perhaps device info struct can inform
> > > us for automatic opt-in.  Thanks,
> > > 
> > > Alex  
> > 
> > So this patch seems to block ballooning when vfio is added.
> > But what if balloon is added and inflated first?
> 
> Good point.
>  
> > I'd suggest making qemu_balloon_inhibit fail in that case,
> > and then vfio realize will fail as well.
> 
> That might be the correct behavior for vfio, but I wonder about the
> existing postcopy use case.  Dave Gilbert, what do you think?  We might
> need a separate interface for callers that cannot tolerate existing
> ballooned pages.  Of course we'll also need another atomic counter to
> keep a tally of ballooned pages.  Thanks,

For postcopy, preinflation isn't a problem; our only issue is ballooning
during the postcopy phase itself.

Dave

> Alex
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Alex Williamson July 31, 2018, 9:50 p.m. UTC | #4
On Tue, 31 Jul 2018 16:07:46 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Alex Williamson (alex.williamson@redhat.com) wrote:
> > On Tue, 31 Jul 2018 15:29:17 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Mon, Jul 30, 2018 at 05:13:26PM -0600, Alex Williamson wrote:  
> > > > v2:
> > > >  - Use atomic ops for balloon inhibit counter (Peter)
> > > >  - Allow endpoint driver opt-in for ballooning, vfio-ccw opt-in by
> > > >    default, vfio-pci opt-in by device option, only allowed for mdev
> > > >    devices, no support added for platform as there are no platform
> > > >    mdev devices.
> > > > 
> > > > See patch 3/4 for detailed explanation why ballooning and device
> > > > assignment typically don't mix.  If this eventually changes, flags
> > > > on the iommu info struct or perhaps device info struct can inform
> > > > us for automatic opt-in.  Thanks,
> > > > 
> > > > Alex    
> > > 
> > > So this patch seems to block ballooning when vfio is added.
> > > But what if balloon is added and inflated first?  
> > 
> > Good point.
> >    
> > > I'd suggest making qemu_balloon_inhibit fail in that case,
> > > and then vfio realize will fail as well.  
> > 
> > That might be the correct behavior for vfio, but I wonder about the
> > existing postcopy use case.  Dave Gilbert, what do you think?  We might
> > need a separate interface for callers that cannot tolerate existing
> > ballooned pages.  Of course we'll also need another atomic counter to
> > keep a tally of ballooned pages.  Thanks,  
> 
> For postcopy, preinflation isn't a problem; our only issue is ballooning
> during the postcopy phase itself.

On further consideration, I think device assignment is in the same
category.  The balloon inhibitor does not actually stop the guest
balloon driver from grabbing and freeing pages, it only changes whether
QEMU releases the pages with madvise DONTNEED.  The problem we have
with ballooning and device assignment is when we have an existing HPA
mapping in the IOMMU that isn't invalidated on DONTNEED and becomes
inconsistent when the page is re-populated.  Zapped pages at the time
an assigned device is added do not trigger this, those pages will be
repopulated when pages are pinned for the assigned device.  This is the
identical scenario to a freshly started VM that doesn't use memory
preallocation and therefore faults in pages on demand.  When an
assigned device is attached to such a VM, page pinning will fault in
and lock all of those pages.

This is observable behavior, for example if I start a VM with 16GB of
RAM, booted to a command prompt the VM shows less that 1GB of RAM
resident in the host.  If I set the balloon to 2048, there's no
observable change in the QEMU process size on the host.  If I hot-add
an assigned device while we're ballooned down, the resident memory size
from the host jumps up to 16GB.  All of the zapped pages have been
reclaimed.  Adjusting ballooning at this point only changes the balloon
size in the guest, inflating the balloon no longer zaps pages from the
process.

The only oddity I see is the one Dave noted in the commit introducing
balloon inhibiting (371ff5a3f04c):

    Queueing the requests until after migration would be nice, but is
    non-trivial, since the set of inflate/deflate requests have to
    be compared with the state of the page to know what the final
    outcome is allowed to be.

So for this example of a 16GB VM ballooned down to 2GB then an assigned
device added and subsequently removed, the resident memory remains 16GB
and I need to deflate the balloon and reinflate it in order to zap them
from the QEMU process.  Therefore, I think that with respect to this
inquiry, the series stands as is.  Thanks,

Alex
Michael S. Tsirkin Aug. 3, 2018, 6:42 p.m. UTC | #5
On Tue, Jul 31, 2018 at 03:50:30PM -0600, Alex Williamson wrote:
> On Tue, 31 Jul 2018 16:07:46 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Alex Williamson (alex.williamson@redhat.com) wrote:
> > > On Tue, 31 Jul 2018 15:29:17 +0300
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > > > On Mon, Jul 30, 2018 at 05:13:26PM -0600, Alex Williamson wrote:  
> > > > > v2:
> > > > >  - Use atomic ops for balloon inhibit counter (Peter)
> > > > >  - Allow endpoint driver opt-in for ballooning, vfio-ccw opt-in by
> > > > >    default, vfio-pci opt-in by device option, only allowed for mdev
> > > > >    devices, no support added for platform as there are no platform
> > > > >    mdev devices.
> > > > > 
> > > > > See patch 3/4 for detailed explanation why ballooning and device
> > > > > assignment typically don't mix.  If this eventually changes, flags
> > > > > on the iommu info struct or perhaps device info struct can inform
> > > > > us for automatic opt-in.  Thanks,
> > > > > 
> > > > > Alex    
> > > > 
> > > > So this patch seems to block ballooning when vfio is added.
> > > > But what if balloon is added and inflated first?  
> > > 
> > > Good point.
> > >    
> > > > I'd suggest making qemu_balloon_inhibit fail in that case,
> > > > and then vfio realize will fail as well.  
> > > 
> > > That might be the correct behavior for vfio, but I wonder about the
> > > existing postcopy use case.  Dave Gilbert, what do you think?  We might
> > > need a separate interface for callers that cannot tolerate existing
> > > ballooned pages.  Of course we'll also need another atomic counter to
> > > keep a tally of ballooned pages.  Thanks,  
> > 
> > For postcopy, preinflation isn't a problem; our only issue is ballooning
> > during the postcopy phase itself.
> 
> On further consideration, I think device assignment is in the same
> category.  The balloon inhibitor does not actually stop the guest
> balloon driver from grabbing and freeing pages, it only changes whether
> QEMU releases the pages with madvise DONTNEED.  The problem we have
> with ballooning and device assignment is when we have an existing HPA
> mapping in the IOMMU that isn't invalidated on DONTNEED and becomes
> inconsistent when the page is re-populated.  Zapped pages at the time
> an assigned device is added do not trigger this, those pages will be
> repopulated when pages are pinned for the assigned device.  This is the
> identical scenario to a freshly started VM that doesn't use memory
> preallocation and therefore faults in pages on demand.  When an
> assigned device is attached to such a VM, page pinning will fault in
> and lock all of those pages.

Granted this means memory won't be corrupted, but it is
also highly unlikely to be what the user wanted.

> This is observable behavior, for example if I start a VM with 16GB of
> RAM, booted to a command prompt the VM shows less that 1GB of RAM
> resident in the host.  If I set the balloon to 2048, there's no
> observable change in the QEMU process size on the host.  If I hot-add
> an assigned device while we're ballooned down, the resident memory size
> from the host jumps up to 16GB.  All of the zapped pages have been
> reclaimed.  Adjusting ballooning at this point only changes the balloon
> size in the guest, inflating the balloon no longer zaps pages from the
> process.
> 
> The only oddity I see is the one Dave noted in the commit introducing
> balloon inhibiting (371ff5a3f04c):
> 
>     Queueing the requests until after migration would be nice, but is
>     non-trivial, since the set of inflate/deflate requests have to
>     be compared with the state of the page to know what the final
>     outcome is allowed to be.
> 
> So for this example of a 16GB VM ballooned down to 2GB then an assigned
> device added and subsequently removed, the resident memory remains 16GB
> and I need to deflate the balloon and reinflate it in order to zap them
> from the QEMU process.  Therefore, I think that with respect to this
> inquiry, the series stands as is.  Thanks,
> 
> Alex
Alex Williamson Aug. 3, 2018, 8:12 p.m. UTC | #6
On Fri, 3 Aug 2018 21:42:18 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Jul 31, 2018 at 03:50:30PM -0600, Alex Williamson wrote:
> > On Tue, 31 Jul 2018 16:07:46 +0100
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Alex Williamson (alex.williamson@redhat.com) wrote:  
> > > > On Tue, 31 Jul 2018 15:29:17 +0300
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >     
> > > > > On Mon, Jul 30, 2018 at 05:13:26PM -0600, Alex Williamson wrote:    
> > > > > > v2:
> > > > > >  - Use atomic ops for balloon inhibit counter (Peter)
> > > > > >  - Allow endpoint driver opt-in for ballooning, vfio-ccw opt-in by
> > > > > >    default, vfio-pci opt-in by device option, only allowed for mdev
> > > > > >    devices, no support added for platform as there are no platform
> > > > > >    mdev devices.
> > > > > > 
> > > > > > See patch 3/4 for detailed explanation why ballooning and device
> > > > > > assignment typically don't mix.  If this eventually changes, flags
> > > > > > on the iommu info struct or perhaps device info struct can inform
> > > > > > us for automatic opt-in.  Thanks,
> > > > > > 
> > > > > > Alex      
> > > > > 
> > > > > So this patch seems to block ballooning when vfio is added.
> > > > > But what if balloon is added and inflated first?    
> > > > 
> > > > Good point.
> > > >      
> > > > > I'd suggest making qemu_balloon_inhibit fail in that case,
> > > > > and then vfio realize will fail as well.    
> > > > 
> > > > That might be the correct behavior for vfio, but I wonder about the
> > > > existing postcopy use case.  Dave Gilbert, what do you think?  We might
> > > > need a separate interface for callers that cannot tolerate existing
> > > > ballooned pages.  Of course we'll also need another atomic counter to
> > > > keep a tally of ballooned pages.  Thanks,    
> > > 
> > > For postcopy, preinflation isn't a problem; our only issue is ballooning
> > > during the postcopy phase itself.  
> > 
> > On further consideration, I think device assignment is in the same
> > category.  The balloon inhibitor does not actually stop the guest
> > balloon driver from grabbing and freeing pages, it only changes whether
> > QEMU releases the pages with madvise DONTNEED.  The problem we have
> > with ballooning and device assignment is when we have an existing HPA
> > mapping in the IOMMU that isn't invalidated on DONTNEED and becomes
> > inconsistent when the page is re-populated.  Zapped pages at the time
> > an assigned device is added do not trigger this, those pages will be
> > repopulated when pages are pinned for the assigned device.  This is the
> > identical scenario to a freshly started VM that doesn't use memory
> > preallocation and therefore faults in pages on demand.  When an
> > assigned device is attached to such a VM, page pinning will fault in
> > and lock all of those pages.  
> 
> Granted this means memory won't be corrupted, but it is
> also highly unlikely to be what the user wanted.

This is the current state of balloon inhibiting, it does not revoke
ballooned pages, ie. giving them back to the back to the guest, or
prevent the guest from inflating the balloon, it only controls whether
the pages are released back to the host.  I suspect a much more
involved overhaul of ballooning in general would be necessary to
support such features.  Thanks,

Alex