Message ID | 1349962023-560-8-git-send-email-avi@redhat.com |
---|---|
State | New |
On Thu, Oct 11, 2012 at 03:44:10PM +0200, Avi Kivity wrote:
> On 10/11/2012 03:44 PM, Michael S. Tsirkin wrote:
> > On Thu, Oct 11, 2012 at 03:34:54PM +0200, Avi Kivity wrote:
> >> On 10/11/2012 03:31 PM, Michael S. Tsirkin wrote:
> >> > On Thu, Oct 11, 2012 at 03:27:03PM +0200, Avi Kivity wrote:
> >> >> vhost doesn't support guest iommus yet, indicate it to the user
> >> >> by gently depositing a core on their disk.
> >> >>
> >> >> Signed-off-by: Avi Kivity <avi@redhat.com>
> >> >
> >> > Actually there is no problem. virtio bypasses an IOMMU,
> >> > so vhost works fine by writing into guest memory directly.
> >> >
> >> > So I don't think we need this patch.
> >>
> >> The pci subsystem should set up the iommu so that it ignores virtio
> >> devices.  If it does, an emulated iommu will not reach vhost.  If it
> >> doesn't, then it will, and the assert() will alert us that we have a bug.
> >
> > You mean pci subsystem in the guest? I'm pretty sure that's not
> > the case at the moment: iommu is on by default and applies
> > to all devices unless you do something special.
> > I see where you are coming from but it does
> > not look right to break all existing guests.
>
> No, qemu should configure virtio devices to bypass the iommu, even if it
> is on.

Okay so there will be some API that virtio devices should call
to achieve this?

> > Also - I see no reason to single out vhost - I think same applies with
> > any virtio device, since it doesn't use the DMA API.
>
> True.
>
> --
> error compiling committee.c: too many arguments to function
On 10/11/2012 04:35 PM, Michael S. Tsirkin wrote:
>> No, qemu should configure virtio devices to bypass the iommu, even if it
>> is on.
>
> Okay so there will be some API that virtio devices should call
> to achieve this?

The iommu should probably call pci_device_bypasses_iommu() to check for
such devices.
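To make the proposal concrete, here is a minimal sketch of how an emulated IOMMU's translate path could consult such a check. This is only an illustration of the idea: pci_device_bypasses_iommu() is merely proposed in this thread, and the ToyPCIDevice/IOMMUTranslation types are invented stand-ins, not the QEMU memory API.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t hwaddr;

/* Toy stand-ins for illustration only; not QEMU types. */
typedef struct {
    bool bypass_iommu;        /* set for virtio devices at device setup time */
} ToyPCIDevice;

typedef struct {
    hwaddr translated_addr;   /* bus address after (possible) translation */
    bool   valid;
} IOMMUTranslation;

/* Stands in for the pci_device_bypasses_iommu() helper proposed above. */
bool pci_device_bypasses_iommu(ToyPCIDevice *dev)
{
    return dev->bypass_iommu;
}

/* Placeholder page walk (think of the XOR test IOMMU mentioned later). */
IOMMUTranslation toy_iommu_page_walk(hwaddr addr)
{
    return (IOMMUTranslation){ .translated_addr = addr ^ 0x1000, .valid = true };
}

/* Translate a DMA address issued by @dev. */
IOMMUTranslation toy_iommu_translate(ToyPCIDevice *dev, hwaddr addr)
{
    if (pci_device_bypasses_iommu(dev)) {
        /* Virtio-style device: identity (1:1) mapping, no page walk. */
        return (IOMMUTranslation){ .translated_addr = addr, .valid = true };
    }
    return toy_iommu_page_walk(addr);
}
```

With a check like this in the translate path, the guest can leave its IOMMU enabled globally while virtio devices still see guest-physical addresses, which is what the vhost backend assumes.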
On Thu, Oct 11, 2012 at 04:35:23PM +0200, Avi Kivity wrote:
> On 10/11/2012 04:35 PM, Michael S. Tsirkin wrote:
>
> >> No, qemu should configure virtio devices to bypass the iommu, even if it
> >> is on.
> >
> > Okay so there will be some API that virtio devices should call
> > to achieve this?
>
> The iommu should probably call pci_device_bypasses_iommu() to check for
> such devices.

So maybe this patch should depend on the introduction of such
an API.

> --
> error compiling committee.c: too many arguments to function
On 10/11/2012 05:34 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 11, 2012 at 04:35:23PM +0200, Avi Kivity wrote:
>> On 10/11/2012 04:35 PM, Michael S. Tsirkin wrote:
>>
>> >> No, qemu should configure virtio devices to bypass the iommu, even if it
>> >> is on.
>> >
>> > Okay so there will be some API that virtio devices should call
>> > to achieve this?
>>
>> The iommu should probably call pci_device_bypasses_iommu() to check for
>> such devices.
>
> So maybe this patch should depend on the introduction of such
> an API.

I've dropped it for now.

In fact, virtio/vhost are safe since they use cpu_physical_memory_rw()
and the memory listener watches address_space_memory, no iommu there.
vfio needs to change to listen to pci_dev->bus_master_as, and need
special handling for iommu regions (abort for now, type 2 iommu later).
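A rough sketch of the "abort for now" handling described here for a vfio-style listener. The types are invented for illustration (the real code would use MemoryListener and MemoryRegionSection); the shape mirrors the vhost patch at the bottom of this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; the real code would use MemoryRegionSection et al. */
typedef struct {
    bool     is_iommu;        /* region is an IOMMU translation region */
    uint64_t offset_within_address_space;
    uint64_t size;
} ToySection;

/* Called for every region in the device's bus-master address space. */
void toy_vfio_region_add(ToySection *section)
{
    /*
     * "Abort for now": a guest IOMMU in front of an assigned device
     * cannot be honoured yet, so fail loudly instead of silently
     * DMA-ing to the wrong addresses.  Later, a type 2 iommu backend
     * could program the mapping into the host IOMMU instead.
     */
    assert(!section->is_iommu);

    /* ... pin the pages and map the offset for device DMA ... */
}
```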
On Thu, 2012-10-11 at 17:48 +0200, Avi Kivity wrote:
> On 10/11/2012 05:34 PM, Michael S. Tsirkin wrote:
> > On Thu, Oct 11, 2012 at 04:35:23PM +0200, Avi Kivity wrote:
> >> On 10/11/2012 04:35 PM, Michael S. Tsirkin wrote:
> >>
> >> >> No, qemu should configure virtio devices to bypass the iommu, even if it
> >> >> is on.
> >> >
> >> > Okay so there will be some API that virtio devices should call
> >> > to achieve this?
> >>
> >> The iommu should probably call pci_device_bypasses_iommu() to check for
> >> such devices.
> >
> > So maybe this patch should depend on the introduction of such
> > an API.
>
> I've dropped it for now.
>
> In fact, virtio/vhost are safe since they use cpu_physical_memory_rw()
> and the memory listener watches address_space_memory, no iommu there.
> vfio needs to change to listen to pci_dev->bus_master_as, and need
> special handling for iommu regions (abort for now, type 2 iommu later).

I don't see how we can ever support an assigned device with the
translate function.  Don't we want a flat address space at run time
anyway?  IOMMU drivers go to pains to make IOTLB updates efficient and
drivers optimize for long running translations, but here we impose a
penalty on every access.  I think we'd be more efficient and better able
to support assigned devices if the per device/bus address space was
updated and flattened when it changes.  Being able to implement an XOR
IOMMU is impressive, but is it practical?  We could be doing much more
practical things like nested device assignment with a flattened
translation ;)  Thanks,

Alex
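For reference, the "flatten when it changes" approach amounts to rebuilding a sorted table of (iova, target, length) ranges whenever the guest updates its IO page tables, so each DMA access costs a lookup rather than a page-table walk. A toy sketch with invented types, not the actual QEMU or VFIO API:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t iova;     /* device-visible address */
    uint64_t target;   /* translated (guest-physical) address */
    uint64_t len;
} FlatRange;

/* Rebuilt by the IOMMU emulation whenever the guest changes its mappings. */
typedef struct {
    FlatRange *ranges;  /* sorted by iova, non-overlapping */
    size_t     nr;
} FlatView;

/* Per-access cost: binary search instead of a guest page-table walk. */
int flatview_translate(const FlatView *fv, uint64_t iova, uint64_t *target)
{
    size_t lo = 0, hi = fv->nr;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const FlatRange *r = &fv->ranges[mid];

        if (iova < r->iova) {
            hi = mid;
        } else if (iova >= r->iova + r->len) {
            lo = mid + 1;
        } else {
            *target = r->target + (iova - r->iova);
            return 0;
        }
    }
    return -1;  /* no mapping: would raise an IOMMU fault */
}

int main(void)
{
    FlatRange ranges[] = {
        { .iova = 0x1000, .target = 0x40000, .len = 0x2000 },
        { .iova = 0x8000, .target = 0x90000, .len = 0x1000 },
    };
    FlatView fv = { .ranges = ranges, .nr = 2 };
    uint64_t target;

    (void)flatview_translate(&fv, 0x1800, &target);  /* target == 0x40800 */
    return 0;
}
```

Avi's objection below is that keeping such a view current from qemu means walking all the guest IO page tables on every change, which can be both frequent and huge.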
On Thu, Oct 11, 2012 at 11:48 PM, Avi Kivity <avi@redhat.com> wrote:
> On 10/11/2012 05:34 PM, Michael S. Tsirkin wrote:
>> On Thu, Oct 11, 2012 at 04:35:23PM +0200, Avi Kivity wrote:
>>> On 10/11/2012 04:35 PM, Michael S. Tsirkin wrote:
>>>
>>> >> No, qemu should configure virtio devices to bypass the iommu, even if it
>>> >> is on.
>>> >
>>> > Okay so there will be some API that virtio devices should call
>>> > to achieve this?
>>>
>>> The iommu should probably call pci_device_bypasses_iommu() to check for
>>> such devices.
>>
>> So maybe this patch should depend on the introduction of such
>> an API.
>
> I've dropped it for now.
>
> In fact, virtio/vhost are safe since they use cpu_physical_memory_rw()
> and the memory listener watches address_space_memory, no iommu there.

Not quite sure what you mean. My understanding is that, as a PCI
device, vhost can lie behind an iommu in the topology, which means the
transactions it launches can be intercepted by the emulated iommu. But
we make an exception for the vhost device and enforce
address_space_rw(address_space_memory, ..), NOT
address_space_rw(pci_dev->bus_master_as, ..), so we bypass the iommu.
Right?

Regards,
pingfan

> vfio needs to change to listen to pci_dev->bus_master_as, and need
> special handling for iommu regions (abort for now, type 2 iommu later).
>
> --
> error compiling committee.c: too many arguments to function
On 10/11/2012 09:38 PM, Alex Williamson wrote:
> On Thu, 2012-10-11 at 17:48 +0200, Avi Kivity wrote:
>> On 10/11/2012 05:34 PM, Michael S. Tsirkin wrote:
>> > On Thu, Oct 11, 2012 at 04:35:23PM +0200, Avi Kivity wrote:
>> >> On 10/11/2012 04:35 PM, Michael S. Tsirkin wrote:
>> >>
>> >> >> No, qemu should configure virtio devices to bypass the iommu, even if it
>> >> >> is on.
>> >> >
>> >> > Okay so there will be some API that virtio devices should call
>> >> > to achieve this?
>> >>
>> >> The iommu should probably call pci_device_bypasses_iommu() to check for
>> >> such devices.
>> >
>> > So maybe this patch should depend on the introduction of such
>> > an API.
>>
>> I've dropped it for now.
>>
>> In fact, virtio/vhost are safe since they use cpu_physical_memory_rw()
>> and the memory listener watches address_space_memory, no iommu there.
>> vfio needs to change to listen to pci_dev->bus_master_as, and need
>> special handling for iommu regions (abort for now, type 2 iommu later).
>
> I don't see how we can ever support an assigned device with the
> translate function.

We cannot.

> Don't we want a flat address space at run time
> anyway?

Not if we want vfio-in-the-guest (for nested virt or OS bypass).

> IOMMU drivers go to pains to make IOTLB updates efficient and
> drivers optimize for long running translations, but here we impose a
> penalty on every access.  I think we'd be more efficient and better able
> to support assigned devices if the per device/bus address space was
> updated and flattened when it changes.

A flattened address space cannot be efficiently implemented with a
->translate() callback.  Describing the transformed address space
requires walking all the iommu page tables; these can change very
frequently for some use cases, and the io page tables can be built after
the iommu is configured but before dma is initiated, so you have no hook
from which to call ->translate(); and the representation of the address
space can be huge.

> Being able to implement an XOR
> IOMMU is impressive, but is it practical?

The XOR IOMMU is just a way for me to test and demonstrate the API.

> We could be doing much more
> practical things like nested device assignment with a flattened
> translation ;)  Thanks,

No, a flattened translation is impractical, at least when driven from
qemu.

My plans wrt vfio/kvm here are to have memory_region_init_iommu()
provide, in addition to ->translate(), a declarative description of the
translation function.  In practical terms, this means that the API will
receive the name of the spec that the iommu implements:

  MemoryRegionIOMMUOps amd_iommu_v2_ops = {
      .translate = amd_iommu_v2_translate,
      .translation_type = IOMMU_AMD_V2,
  };

qemu-side vfio would then match ->translation_type with what the kernel
provides, and configure the kernel for this type of translation.  As
some v2 hardware supports two levels of translations, all vfio has to do
is to set up the lower translation level to match the guest->host
translation (which it does already), and to set up the upper translation
level to follow the guest configuration.  From then on the hardware does
the rest.

If the hardware supports only one translation level, we may still be
able to implement a nested iommu using the same techniques we use for
the processor page tables - shadowing.  kvm would write-protect the
iommu page tables and pass any updates to vfio, which would update the
shadow io page tables that implement the ngpa->gpa->hpa translation.

However, given the complexity and performance problems on one side, and
the size of the niche that nested device assignment serves, we'll
probably limit ourselves to hardware that supports two levels of
translations.  If nested virtualization really takes off we can use
shadowing to provide the guest with emulated hardware that supports two
translation levels (the solution above uses host hardware with two
levels to expose guest hardware with one level).
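A hedged sketch of the "declarative description" idea above: the emulated IOMMU declares which spec it implements, and qemu-side vfio accepts it only if the host kernel reports matching nesting-capable hardware. Apart from the amd_iommu_v2_ops example quoted from the mail, every name here (the enum values, ToyIOMMUOps, host_nested_iommu_type()) is invented for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Invented identifiers for the spec an emulated IOMMU implements. */
typedef enum {
    IOMMU_NONE,
    IOMMU_AMD_V2,
    IOMMU_INTEL_VTD,
} IOMMUTranslationType;

typedef struct {
    /* .translate callback omitted in this sketch */
    IOMMUTranslationType translation_type;
} ToyIOMMUOps;

/* Pretend query of what the host kernel/hardware can nest (assumption). */
IOMMUTranslationType host_nested_iommu_type(void)
{
    return IOMMU_AMD_V2;
}

/* vfio-side check: can the guest-visible IOMMU be backed by host hardware? */
bool vfio_iommu_compatible(const ToyIOMMUOps *guest_ops)
{
    return guest_ops->translation_type != IOMMU_NONE &&
           guest_ops->translation_type == host_nested_iommu_type();
}

int main(void)
{
    ToyIOMMUOps amd_iommu_v2_ops = { .translation_type = IOMMU_AMD_V2 };

    if (vfio_iommu_compatible(&amd_iommu_v2_ops)) {
        /* Lower level already holds gpa->hpa; the hardware walks the
           guest's upper-level tables for the ngpa->gpa step. */
        printf("nested translation: hardware-backed\n");
    } else {
        printf("nested translation: needs shadowing or is unsupported\n");
    }
    return 0;
}
```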
On 10/15/2012 10:44 AM, liu ping fan wrote:
> On Thu, Oct 11, 2012 at 11:48 PM, Avi Kivity <avi@redhat.com> wrote:
>> On 10/11/2012 05:34 PM, Michael S. Tsirkin wrote:
>>> On Thu, Oct 11, 2012 at 04:35:23PM +0200, Avi Kivity wrote:
>>>> On 10/11/2012 04:35 PM, Michael S. Tsirkin wrote:
>>>>
>>>> >> No, qemu should configure virtio devices to bypass the iommu, even if it
>>>> >> is on.
>>>> >
>>>> > Okay so there will be some API that virtio devices should call
>>>> > to achieve this?
>>>>
>>>> The iommu should probably call pci_device_bypasses_iommu() to check for
>>>> such devices.
>>>
>>> So maybe this patch should depend on the introduction of such
>>> an API.
>>
>> I've dropped it for now.
>>
>> In fact, virtio/vhost are safe since they use cpu_physical_memory_rw()
>> and the memory listener watches address_space_memory, no iommu there.
>
> Not quite sure what you mean. My understanding is that, as a PCI
> device, vhost can lie behind an iommu in the topology, which means the
> transactions it launches can be intercepted by the emulated iommu. But
> we make an exception for the vhost device and enforce
> address_space_rw(address_space_memory, ..), NOT
> address_space_rw(pci_dev->bus_master_as, ..), so we bypass the iommu.
> Right?

The exception is not just for vhost, but for every virtio device.  So
the iommu needs to be aware of that, and if it manages a virtio device,
it needs to provide a 1:1 translation.
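Restating the exchange above as code: the bypass is a property of the device's DMA path rather than a vhost-only hack, so either virtio devices do their DMA through the plain memory address space, or an emulated iommu that finds them behind it must return an identity mapping. A toy sketch with invented names (the address spaces are reduced to an enum; this is not the real address_space_rw() API):

```c
#include <stdbool.h>
#include <stdio.h>

typedef enum {
    AS_MEMORY,       /* global guest memory, never behind an iommu */
    AS_BUS_MASTER,   /* per-device view, may be translated by an iommu */
} ToyAddressSpace;

typedef struct {
    bool is_virtio;
} ToyDevice;

/* Pick which address space a device's DMA should go through. */
ToyAddressSpace dma_address_space(const ToyDevice *dev)
{
    /*
     * Virtio devices do not honour the guest iommu (the guest driver
     * does not use the DMA API for them), so their backends read and
     * write guest memory directly.  Everything else must go through
     * its bus-master view.
     */
    return dev->is_virtio ? AS_MEMORY : AS_BUS_MASTER;
}

int main(void)
{
    ToyDevice virtio_net = { .is_virtio = true };
    ToyDevice assigned_nic = { .is_virtio = false };

    printf("virtio-net DMA via %s\n",
           dma_address_space(&virtio_net) == AS_MEMORY ? "memory" : "bus master");
    printf("assigned NIC DMA via %s\n",
           dma_address_space(&assigned_nic) == AS_MEMORY ? "memory" : "bus master");
    return 0;
}
```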
```diff
diff --git a/hw/vhost.c b/hw/vhost.c
index 0b4ac3f..cd5d9f5 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -451,6 +451,8 @@ static void vhost_region_add(MemoryListener *listener,
     struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                          memory_listener);
 
+    assert(!memory_region_is_iommu(section->mr));
+
     if (!vhost_section(section)) {
         return;
     }
```
vhost doesn't support guest iommus yet, indicate it to the user
by gently depositing a core on their disk.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 hw/vhost.c | 2 ++
 1 file changed, 2 insertions(+)