Message ID | 1340087992-2399-8-git-send-email-benh@kernel.crashing.org |
---|---|
State | New |
Headers | show |
Hi, > Note that usb_packet_map() invokes dma_memory_map() with a NULL invalidate > callback function. When IOMMU support is added, this will mean that > usb_packet_map() and the corresponding usb_packet_unmap() must be called in > close proximity without dropping the qemu device lock Well, that isn't guaranteed ... > - otherwise the guest > might invalidate IOMMU mappings while they are still in use by the device > code. Guest tearing down mapping while usb packets using them are still in flight would be a guest bug. Still not impossible to happen though. How is this case supposed to be handled? cheers, Gerd
On Tue, 2012-06-19 at 15:42 +0200, Gerd Hoffmann wrote: > Well, that isn't guaranteed ... > > > - otherwise the guest > > might invalidate IOMMU mappings while they are still in use by the device > > code. > > Guest tearing down mapping while usb packets using them are still in > flight would be a guest bug. Still not impossible to happen though. How > is this case supposed to be handled? Like with any other device, it's hard ... what would happen on real hardware is that the USB controller will get a target abort, which will result in the controller reporting an error (typically in the PCI status register) and stopping. In qemu we tend not to deal with DMA failures at all. If the scenario above happens, we will potentially continue accessing the guest memory after it has been unmapped. While this is bad, in practice, it's not a huge deal because the USB controller is only accessed by the guest kernel so it's a matter of the guest kernel shooting itself in the foot. So we don't have to fix it as a pre-req to merging the patches, though it would be nice if we did in the long run. The way to fix it is to register a cancel callback (dma_memory_map_with_cancel), which will be called by the iommu code when the translation is invalidated, and which can be used to cancel pending transactions etc... and generally prevent further access to the memory. However the current implementation never calls cancel. Cheers. Ben.
On Wed, Jun 20, 2012 at 06:23:58AM +1000, Benjamin Herrenschmidt wrote: > On Tue, 2012-06-19 at 15:42 +0200, Gerd Hoffmann wrote: > > Well, that isn't guaranteed ... > > > > > - otherwise the guest > > > might invalidate IOMMU mappings while they are still in use by the device > > > code. > > > > Guest tearing down mapping while usb packets using them are still in > > flight would be a guest bug. Still not impossible to happen though. How > > is this case supposed to be handled? > > Like with any other device, it's hard ... what would happen on real > hardware is that the USB controller will get a target abort, which will > result in the controller reporting an error (typically in the PCI status > register) and stopping. > > In qemu we tend not to deal with DMA failures at all. > > If the scenario above happens, we will potentially continue accessing > the guest memory after it has been unmapped. While this is bad, in > practice, it's not a huge deal because the USB controller is only > accessed by the guest kernel so it's a matter of the guest kernel > shooting itself in the foot. > > So we don't have to fix it as a pre-req to merging the patches, though > it would be nice if we did in the long run. > > The way to fix it is to register a cancel callback > (dma_memory_map_with_cancel), which will be called by the iommu code > when the translation is invalidated, and which can be used to cancel > pending transactions etc... and generally prevent further access to the > memory. So, in fact the original comment is a bit out of date. With the current version of this series, then a guest attempt to invalidate will be delayed until the unmap occurs. If we discover that leads to delays which are too long then we can add the cancel callback to handle this. However, the USB case should be ok - it may not be theoretically guaranteed that the calls are close, but it's certainly the case at the moment.
On Wed, 2012-06-20 at 13:14 +1000, David Gibson wrote: > So, in fact the original comment is a bit out of date. With the > current version of this series, then a guest attempt to invalidate > will be delayed until the unmap occurs. No, this code was dropped, including the tracking of the maps, following comments from Anthony and others. The API for providing a cancel callback is still there but nothing will call it unless the backend does its own tracking and decides to do so. As it is, the race exist but: - It will only hurt the guest - And only for a very buggy guest So the worst case is that it hurts something like kdump. I plan to re-introduce some of the mechanisms for cancellation eventually, but we agreed that it wasn't going to be a show stopper and that we could work on getting that sorted in a second phase. I'm looking at a more efficient way to deal with the tracking of the maps as well since some devices uses them often. Cheers, Ben.
Hi, >> Like with any other device, it's hard ... what would happen on real >> hardware is that the USB controller will get a target abort, which will >> result in the controller reporting an error (typically in the PCI status >> register) and stopping. Not that hard, code to cancel in-flight transactions is in place already as this can happen for other reasons too. > handle this. However, the USB case should be ok - it may not be > theoretically guaranteed that the calls are close, but it's certainly > the case at the moment. Depends on the device. For the usb hid devices (which is the most important use case for power I think) packets will be processed synchronously, so there is no problem here. usb-storage can keep packets in flight without holding the qemu lock (waiting for async block I/O finish). Shouldn't be too long though. usb-host keeps pretty much every packet in flight without holding the qemu lock as it passes on the requests to the hosts usbfs, then waits asynchronously for the request finish before returning the result to the guest. Depending on the kind of device you are passing though this can be *very* long (minutes). cheers, Gerd
On Wed, 2012-06-20 at 08:25 +0200, Gerd Hoffmann wrote: > Hi, > > >> Like with any other device, it's hard ... what would happen on real > >> hardware is that the USB controller will get a target abort, which will > >> result in the controller reporting an error (typically in the PCI status > >> register) and stopping. > > Not that hard, code to cancel in-flight transactions is in place already > as this can happen for other reasons too. Ok, good,. > > handle this. However, the USB case should be ok - it may not be > > theoretically guaranteed that the calls are close, but it's certainly > > the case at the moment. > > Depends on the device. For the usb hid devices (which is the most > important use case for power I think) packets will be processed > synchronously, so there is no problem here. > > usb-storage can keep packets in flight without holding the qemu lock > (waiting for async block I/O finish). Shouldn't be too long though. > > usb-host keeps pretty much every packet in flight without holding the > qemu lock as it passes on the requests to the hosts usbfs, then waits > asynchronously for the request finish before returning the result to the > guest. Depending on the kind of device you are passing though this can > be *very* long (minutes). Right so with the initial patch series I sent, nothing will happen in that we don't actually keep track of mappings, don't call the cancel callback and anyways, OHCI/EHCI don't register a cancel callback. As I wrote earlier, this is not very harmful so it's good to get merged, and we can look into improving it and add the cancellation mechanism on top. There was some original invalidation code from David that was trying to wait on all pending maps but that had issues, Anthony wasn't too happy about it, so I decided to attempt to submit/merge the patch series without solving that issue. To properly implement cancel without too much overhead, we need some tracking of qemu maps and we need a quick way to know when the guest invalidates a translation, if that translation has maps associated with it. The best way to do that, from my little experience messing around with it, is going to essentially be implementation specific (ie depends on the actual iommu backend). For example, on TCEs, I could keep a parallel bitmap indicating when a map is present for a given entry. That could be very efficient if I know that there won't be more than one qemu map at a time for a given entry though, so we should discuss whether that's an acceptable limitation. There's also some "interesting" issues due to the fact that we populate the TCE tables directly from KVM in "real mode" for speed, so that bitmap would need to be some kind of shared memory with the kernel (without locks !) and the kernel would have to be updated to fallback to sending the invalidation hypercalls to qemu when it collides with a populated map entry. It's all doable, it's also a bit tricky, potentially quite a bit of code, new KVM/qemu interfaces, etc... for a problem that's going to be a non-issue pretty much 99.9% of the time :-) We still need to address it, but I haven't convinced myself yet that I have come up with the best solution :-) Cheers, Ben.
Hi, > [ lots of interesting background info snipped ] > It's all doable, it's also a bit tricky, potentially quite a bit of > code, new KVM/qemu interfaces, etc... for a problem that's going to be a > non-issue pretty much 99.9% of the time :-) We still need to address it, > but I haven't convinced myself yet that I have come up with the best > solution :-) Yea, sure, it's in the "nice-to-have" not "must-have" category. And adding complex code which almost never actually runs needs some care indeed. Not having that for the initial merge is perfectly fine with me, just wanted to know what dragons might be lurking there ;) Oh, and: Acked-by: Gerd Hoffmann <kraxel@redhat.com> cheers, Gerd
On Wed, Jun 20, 2012 at 01:52:12PM +1000, Benjamin Herrenschmidt wrote: > On Wed, 2012-06-20 at 13:14 +1000, David Gibson wrote: > > So, in fact the original comment is a bit out of date. With the > > current version of this series, then a guest attempt to invalidate > > will be delayed until the unmap occurs. > > No, this code was dropped, including the tracking of the maps, following > comments from Anthony and others. The API for providing a cancel > callback is still there but nothing will call it unless the backend does > its own tracking and decides to do so. Ah, right.
diff --git a/hw/usb.h b/hw/usb.h index 2a56fe5..a5623d3 100644 --- a/hw/usb.h +++ b/hw/usb.h @@ -345,7 +345,7 @@ void usb_packet_check_state(USBPacket *p, USBPacketState expected); void usb_packet_setup(USBPacket *p, int pid, USBEndpoint *ep); void usb_packet_addbuf(USBPacket *p, void *ptr, size_t len); int usb_packet_map(USBPacket *p, QEMUSGList *sgl); -void usb_packet_unmap(USBPacket *p); +void usb_packet_unmap(USBPacket *p, QEMUSGList *sgl); void usb_packet_copy(USBPacket *p, void *ptr, size_t bytes); void usb_packet_skip(USBPacket *p, size_t bytes); void usb_packet_cleanup(USBPacket *p); diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c index 5298204..81bbc54 100644 --- a/hw/usb/hcd-ehci.c +++ b/hw/usb/hcd-ehci.c @@ -1422,8 +1422,8 @@ static void ehci_execute_complete(EHCIQueue *q) set_field(&q->qh.token, p->tbytes, QTD_TOKEN_TBYTES); } ehci_finish_transfer(q, p->usb_status); + usb_packet_unmap(&p->packet, &p->sgl); qemu_sglist_destroy(&p->sgl); - usb_packet_unmap(&p->packet); q->qh.token ^= QTD_TOKEN_DTOGGLE; q->qh.token &= ~QTD_TOKEN_ACTIVE; @@ -1547,7 +1547,7 @@ static int ehci_process_itd(EHCIState *ehci, usb_packet_map(&ehci->ipacket, &ehci->isgl); ret = usb_handle_packet(dev, &ehci->ipacket); assert(ret != USB_RET_ASYNC); - usb_packet_unmap(&ehci->ipacket); + usb_packet_unmap(&ehci->ipacket, &ehci->isgl); } else { DPRINTF("ISOCH: attempt to addess non-iso endpoint\n"); ret = USB_RET_NAK; diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c index 9871e24..86888ce 100644 --- a/hw/usb/hcd-uhci.c +++ b/hw/usb/hcd-uhci.c @@ -871,7 +871,7 @@ static int uhci_handle_td(UHCIState *s, uint32_t addr, UHCI_TD *td, done: len = uhci_complete_td(s, td, async, int_mask); - usb_packet_unmap(&async->packet); + usb_packet_unmap(&async->packet, &async->sgl); uhci_async_free(async); return len; } diff --git a/hw/usb/libhw.c b/hw/usb/libhw.c index 2462351..c0de30e 100644 --- a/hw/usb/libhw.c +++ b/hw/usb/libhw.c @@ -26,15 +26,15 @@ int usb_packet_map(USBPacket *p, QEMUSGList *sgl) { - int is_write = (p->pid == USB_TOKEN_IN); - target_phys_addr_t len; + DMADirection dir = (p->pid == USB_TOKEN_IN) ? + DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE; + dma_addr_t len; void *mem; int i; for (i = 0; i < sgl->nsg; i++) { len = sgl->sg[i].len; - mem = cpu_physical_memory_map(sgl->sg[i].base, &len, - is_write); + mem = dma_memory_map(sgl->dma, sgl->sg[i].base, &len, dir); if (!mem) { goto err; } @@ -46,18 +46,19 @@ int usb_packet_map(USBPacket *p, QEMUSGList *sgl) return 0; err: - usb_packet_unmap(p); + usb_packet_unmap(p, sgl); return -1; } -void usb_packet_unmap(USBPacket *p) +void usb_packet_unmap(USBPacket *p, QEMUSGList *sgl) { - int is_write = (p->pid == USB_TOKEN_IN); + DMADirection dir = (p->pid == USB_TOKEN_IN) ? + DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE; int i; for (i = 0; i < p->iov.niov; i++) { - cpu_physical_memory_unmap(p->iov.iov[i].iov_base, - p->iov.iov[i].iov_len, is_write, - p->iov.iov[i].iov_len); + dma_memory_unmap(sgl->dma, p->iov.iov[i].iov_base, + p->iov.iov[i].iov_len, dir, + p->iov.iov[i].iov_len); } }