Message ID: 20100629172522.GA8227@localhost
State: New
On Tue, Jun 29, 2010 at 6:25 PM, Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro> wrote:
> On the other hand, we could just leave it alone for now. Changing
> mappings during DMA is stupid anyway: I don't think the guest can
> recover the results of DMA safely, even though it might be used on
> transfers in progress you simply don't care about anymore. Paul Brook
> suggested we could update the cpu_physical_memory_map() mappings
> somehow, but I think that's kinda difficult to accomplish.

A malicious or broken guest shouldn't be able to crash or corrupt QEMU
process memory. The IOMMU can only map from bus addresses to guest
physical RAM (?) so the worst the guest can do here is corrupt itself?

Stefan
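The worst-case claim above hinges on the translate step only ever
yielding guest-physical addresses. A rough sketch of such a
bounds-checked translate hook, where lookup_page_table() and
guest_ram_size() are purely hypothetical helpers rather than existing
QEMU functions:

#include <errno.h>

/* Illustrative sketch only: refuse translations outside guest RAM so a
 * broken guest can at most corrupt its own memory, never QEMU's.
 * lookup_page_table() and guest_ram_size() are hypothetical. */
static int checked_translate(struct iommu *iommu,
                             DeviceState *dev,
                             target_phys_addr_t addr,
                             target_phys_addr_t *paddr,
                             int *len,
                             unsigned perms)
{
    target_phys_addr_t out = lookup_page_table(iommu, dev, addr, perms);

    if (out >= guest_ram_size(iommu))
        return -EPERM;      /* the result never escapes guest RAM */

    *paddr = out;
    *len = TARGET_PAGE_SIZE - (addr & ~TARGET_PAGE_MASK);
    return 0;
}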
On Wed, Jun 30, 2010 at 09:37:31AM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 29, 2010 at 6:25 PM, Eduard - Gabriel Munteanu
> <eduard.munteanu@linux360.ro> wrote:
> > On the other hand, we could just leave it alone for now. Changing
> > mappings during DMA is stupid anyway: I don't think the guest can
> > recover the results of DMA safely, even though it might be used on
> > transfers in progress you simply don't care about anymore. Paul Brook
> > suggested we could update the cpu_physical_memory_map() mappings
> > somehow, but I think that's kinda difficult to accomplish.
>
> A malicious or broken guest shouldn't be able to crash or corrupt QEMU
> process memory. The IOMMU can only map from bus addresses to guest
> physical RAM (?) so the worst the guest can do here is corrupt itself?
>
> Stefan

That's true, but it's fair to be concerned about the guest itself.
Imagine it runs some possibly malicious apps which program the hardware
to do DMA. That should be safe when an IOMMU is present.

But suddenly the guest OS changes mappings and expects the IOMMU to
enforce them as soon as invalidation commands are completed. The guest
then reclaims the old space for other uses. This leaves an opportunity
for those processes to corrupt or read sensitive data.

If the guest OS is prone to changing mappings during DMA, some process
could continually set up, e.g., IDE DMA write transfers hoping to
expose useful data it can't read otherwise. The buffer can be poisoned
to see if someone went for the bait and wrote in that space.

Actually I'm not that sure changing mappings during DMA is stupid, as
the OS might want to reassign devices (where this is possible) to
various processes. Reclaiming mappings seems normal when a process dies
during DMA, as the kernel has no way of telling whether DMA completed
(or even started).

Eduard
On Thu, Jul 1, 2010 at 8:30 PM, Eduard - Gabriel Munteanu
<eduard.munteanu@linux360.ro> wrote:
> On Wed, Jun 30, 2010 at 09:37:31AM +0100, Stefan Hajnoczi wrote:
>> On Tue, Jun 29, 2010 at 6:25 PM, Eduard - Gabriel Munteanu
>> <eduard.munteanu@linux360.ro> wrote:
>> > On the other hand, we could just leave it alone for now. Changing
>> > mappings during DMA is stupid anyway: I don't think the guest can
>> > recover the results of DMA safely, even though it might be used on
>> > transfers in progress you simply don't care about anymore. Paul Brook
>> > suggested we could update the cpu_physical_memory_map() mappings
>> > somehow, but I think that's kinda difficult to accomplish.
>>
>> A malicious or broken guest shouldn't be able to crash or corrupt QEMU
>> process memory. The IOMMU can only map from bus addresses to guest
>> physical RAM (?) so the worst the guest can do here is corrupt itself?
>>
> That's true, but it's fair to be concerned about the guest itself.
> Imagine it runs some possibly malicious apps which program the hardware
> to do DMA. That should be safe when an IOMMU is present.
>
> But suddenly the guest OS changes mappings and expects the IOMMU to
> enforce them as soon as invalidation commands are completed. The guest
> then reclaims the old space for other uses. This leaves an opportunity
> for those processes to corrupt or read sensitive data.

As long as QEMU acts in the same way as real hardware we should be
okay. Will real hardware change the mappings immediately and abort
the DMA from the device if it tries to access an invalidated address?

Stefan
On Fri, Jul 02, 2010 at 09:03:39AM +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 1, 2010 at 8:30 PM, Eduard - Gabriel Munteanu
> <eduard.munteanu@linux360.ro> wrote:
> > On Wed, Jun 30, 2010 at 09:37:31AM +0100, Stefan Hajnoczi wrote:
> >> On Tue, Jun 29, 2010 at 6:25 PM, Eduard - Gabriel Munteanu
> >> <eduard.munteanu@linux360.ro> wrote:
> >> > On the other hand, we could just leave it alone for now. Changing
> >> > mappings during DMA is stupid anyway: I don't think the guest can
> >> > recover the results of DMA safely, even though it might be used on
> >> > transfers in progress you simply don't care about anymore. Paul Brook
> >> > suggested we could update the cpu_physical_memory_map() mappings
> >> > somehow, but I think that's kinda difficult to accomplish.
> >>
> >> A malicious or broken guest shouldn't be able to crash or corrupt QEMU
> >> process memory. The IOMMU can only map from bus addresses to guest
> >> physical RAM (?) so the worst the guest can do here is corrupt itself?
> >>
> > That's true, but it's fair to be concerned about the guest itself.
> > Imagine it runs some possibly malicious apps which program the hardware
> > to do DMA. That should be safe when an IOMMU is present.
> >
> > But suddenly the guest OS changes mappings and expects the IOMMU to
> > enforce them as soon as invalidation commands are completed. The guest
> > then reclaims the old space for other uses. This leaves an opportunity
> > for those processes to corrupt or read sensitive data.

In such a case, the OS should put the device into quiescence by a
reset, like a PCI bus reset or a PCIe function-level reset.
The PCI bus reset patch hasn't been merged yet, though.
It needs clean-up/generalization.

> As long as QEMU acts in the same way as real hardware we should be
> okay. Will real hardware change the mappings immediately and abort
> the DMA from the device if it tries to access an invalidated address?
>
> Stefan
On Fri, Jul 02, 2010 at 09:03:39AM +0100, Stefan Hajnoczi wrote:
> > That's true, but it's fair to be concerned about the guest itself.
> > Imagine it runs some possibly malicious apps which program the hardware
> > to do DMA. That should be safe when an IOMMU is present.
> >
> > But suddenly the guest OS changes mappings and expects the IOMMU to
> > enforce them as soon as invalidation commands are completed. The guest
> > then reclaims the old space for other uses. This leaves an opportunity
> > for those processes to corrupt or read sensitive data.
>
> As long as QEMU acts in the same way as real hardware we should be
> okay. Will real hardware change the mappings immediately and abort
> the DMA from the device if it tries to access an invalidated address?
>
> Stefan

Real, non-IOMMU-aware hardware looks like this (simplified):

    Device <-> IOMMU <-> Memory                       (1)

Most QEMU translated devices would do exactly that. But AIO is
something like this:

    Device <---translation--- IOMMU                   (2)
      |
      +----------R/W----------> Memory

There are two reasons this form isn't reducible to the former in the
case of AIO:
1. It uses cpu_physical_memory_map().
2. The actual I/O happens on a separate thread.

Real hardware of form (1) has no problems since all access is
serialized through the IOMMU. That doesn't mean mapping updates happen
instantly, but software can wait (and be notified) for the updates to
happen.

Real hardware of form (2) is comprised of devices which have their own
IOTLB, I think. But unlike cpu_physical_memory_map() in software-land,
these mappings can be updated during I/O. (These devices are
necessarily IOMMU-aware.)

As I see it, this mapping trick is quite crucial to AIO and I'm not
sure there's a way around it. The solution I proposed is making the
IOMMU wait* for AIO to finish. Perhaps we can also break the mappings
into smaller chunks so they complete in a reasonable time.

* - Waiting here means to delay notifying the guest OS that
invalidation commands completed. This is the important part: if the
guest is told the mappings have been updated, then they really have to
be.

Eduard
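A rough sketch of that proposed wait, where AMDIOMMUState,
inflight_maps and pending_completion are invented names for this
example rather than anything from the posted patches:

#include <stdbool.h>

/* Illustrative only: defer COMPLETION_WAIT notification until every
 * outstanding cpu_physical_memory_map() mapping has been released. */
typedef struct AMDIOMMUState {
    int inflight_maps;        /* AIO transfers still using old mappings */
    bool pending_completion;  /* guest awaits a completion notification */
} AMDIOMMUState;

static void iommu_signal_completion(AMDIOMMUState *s)
{
    /* Write the semaphore / raise the interrupt the guest asked for
     * in its COMPLETION_WAIT command. */
}

/* Called from the AIO completion path once a mapping is torn down. */
static void iommu_aio_unmap_done(AMDIOMMUState *s)
{
    if (--s->inflight_maps == 0 && s->pending_completion) {
        s->pending_completion = false;
        iommu_signal_completion(s);
    }
}

/* Called when the guest queues COMPLETION_WAIT after INVALIDATE_*. */
static void iommu_completion_wait(AMDIOMMUState *s)
{
    if (s->inflight_maps) {
        s->pending_completion = true;   /* defer until AIO drains */
    } else {
        iommu_signal_completion(s);
    }
}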
On Fri, Jul 02, 2010 at 06:41:55PM +0900, Isaku Yamahata wrote:
> On Fri, Jul 02, 2010 at 09:03:39AM +0100, Stefan Hajnoczi wrote:
> > On Thu, Jul 1, 2010 at 8:30 PM, Eduard - Gabriel Munteanu
> > <eduard.munteanu@linux360.ro> wrote:
> > > But suddenly the guest OS changes mappings and expects the IOMMU to
> > > enforce them as soon as invalidation commands are completed. The guest
> > > then reclaims the old space for other uses. This leaves an opportunity
> > > for those processes to corrupt or read sensitive data.
>
> In such a case, the OS should put the device into quiescence by a
> reset, like a PCI bus reset or a PCIe function-level reset.
> The PCI bus reset patch hasn't been merged yet, though.
> It needs clean-up/generalization.
>
> --
> yamahata

I wouldn't count on that. When the IOMMU notifies software of command
completion, then that notification should be correct. So if we count on
'pci bus reset' we either don't execute INVALIDATE_* and
COMPLETION_WAIT commands, or we issue bogus notifications (e.g. they'd
be nops). That goes against the specs, and I'm not sure there's any
good reason a non-KVM/QEMU-aware OS would reset the device in _all_
cases.

For some background on this, mapping updates are followed by
INVALIDATE_* commands and then a COMPLETION_WAIT (to wait for
invalidation to finish).

Eduard
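As a sketch of that driver-side sequence, where every function below is
a hypothetical stand-in for the guest OS's IOMMU driver internals and
only the command names follow the spec's terminology:

#include <stdint.h>

extern void update_io_page_table(int domain, uint64_t iova, uint64_t pte);
extern void queue_invalidate_iommu_pages(int domain, uint64_t iova);
extern void queue_completion_wait(volatile uint64_t *sem);
extern void wait_on_semaphore(volatile uint64_t *sem);
extern void reclaim_page(uint64_t old_pte);

static void remap_and_reclaim(int domain, uint64_t iova,
                              uint64_t new_pte, uint64_t old_pte)
{
    static volatile uint64_t sem;

    update_io_page_table(domain, iova, new_pte);
    queue_invalidate_iommu_pages(domain, iova);  /* INVALIDATE_* */
    queue_completion_wait(&sem);                 /* COMPLETION_WAIT */
    wait_on_semaphore(&sem);
    /* Only past this point may the old page be reused safely. */
    reclaim_page(old_pte);
}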
diff --git a/hw/iommu.c b/hw/iommu.c
new file mode 100644
index 0000000..410c88c
--- /dev/null
+++ b/hw/iommu.c
@@ -0,0 +1,83 @@
+/*
+ * IOMMU layer
+ *
+ * Copyright (c) 2010 Eduard - Gabriel Munteanu
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <errno.h>
+
+#include "iommu.h"
+
+struct iommu *iommu_get(DeviceState *dev, DeviceState **real_dev)
+{
+    BusState *bus;
+
+    while (dev) {
+        bus = dev->parent_bus;
+        if (!bus)
+            goto out;
+
+        if (bus->iommu) {
+            *real_dev = dev;
+            return bus->iommu;
+        }
+
+        dev = bus->parent;
+    }
+
+out:
+    *real_dev = NULL;
+    return NULL;
+}
+
+int __iommu_rw(struct iommu *iommu,
+               DeviceState *dev,
+               target_phys_addr_t addr,
+               uint8_t *buf,
+               int len,
+               int is_write)
+{
+    int plen, err;
+    target_phys_addr_t paddr;
+    unsigned perms;
+
+    if (!is_write)
+        perms = IOMMU_PERM_READ;
+    else
+        perms = IOMMU_PERM_WRITE;
+
+    do {
+        err = iommu->translate(iommu, dev, addr, &paddr, &plen, perms);
+        if (err)
+            return err;
+        if (plen > len)
+            plen = len;
+
+        cpu_physical_memory_rw(paddr, buf, plen, is_write);
+
+        len -= plen;
+        addr += plen;
+        buf += plen;
+    } while (len);
+
+    return 0;
+}
+
diff --git a/hw/iommu.h b/hw/iommu.h
index cb51a6e..01978ab 100644
--- a/hw/iommu.h
+++ b/hw/iommu.h
@@ -1,8 +1,204 @@
 #ifndef QEMU_IOMMU_H
 #define QEMU_IOMMU_H
 
-#include <targphys.h>
+#include "pci.h"
+#include "targphys.h"
+#include "qdev.h"
 
-extern target_phys_addr_t iommu_translate(int bdf, target_phys_addr_t addr);
+/* Don't use directly. */
+struct iommu {
+    void *opaque;
+
+    void (*register_device)(struct iommu *iommu,
+                            DeviceState *dev);
+    int (*translate)(struct iommu *iommu,
+                     DeviceState *dev,
+                     target_phys_addr_t addr,
+                     target_phys_addr_t *paddr,
+                     int *len,
+                     unsigned perms);
+    int (*start_transaction)(struct iommu *iommu,
+                             DeviceState *dev);
+    void (*end_transaction)(struct iommu *iommu,
+                            DeviceState *dev);
+};
+
+#define IOMMU_PERM_READ     (1 << 0)
+#define IOMMU_PERM_WRITE    (1 << 1)
+
+#define IOMMU_PERM_RW       (IOMMU_PERM_READ | IOMMU_PERM_WRITE)
+
+static inline int iommu_nop_translate(struct iommu *iommu,
+                                      DeviceState *dev,
+                                      target_phys_addr_t addr,
+                                      target_phys_addr_t *paddr,
+                                      int *len,
+                                      unsigned perms)
+{
+    *paddr = addr;
+    *len = INT_MAX;
+
+    return 0;
+}
+
+static inline int iommu_nop_rw(struct iommu *iommu,
+                               DeviceState *dev,
+                               target_phys_addr_t addr,
+                               uint8_t *buf,
+                               int len,
+                               int is_write)
+{
+    cpu_physical_memory_rw(addr, buf, len, is_write);
+
+    return 0;
+}
+
+static inline int iommu_register_device(struct iommu *iommu,
+                                        DeviceState *dev)
+{
+    if (iommu && iommu->register_device)
+        iommu->register_device(iommu, dev);
+
+    return 0;
+}
+
+#ifdef CONFIG_IOMMU
+
+extern struct iommu *iommu_get(DeviceState *dev, DeviceState **real_dev);
+
+/**
+ * Translates an address for the given device and performs access checking.
+ *
+ * Defined in implementation-specific IOMMU code.
+ *
+ * @iommu   IOMMU
+ * @dev     qdev device
+ * @addr    address to be translated
+ * @paddr   translated address
+ * @len     number of bytes for which the translation is valid
+ * @perms   access permissions to check for
+ *
+ * Returns 0 iff translation and access checking succeeded.
+ */
+static inline int iommu_translate(struct iommu *iommu,
+                                  DeviceState *dev,
+                                  target_phys_addr_t addr,
+                                  target_phys_addr_t *paddr,
+                                  int *len,
+                                  unsigned perms)
+{
+    if (iommu && iommu->translate)
+        return iommu->translate(iommu, dev, addr, paddr, len, perms);
+
+    return iommu_nop_translate(iommu, dev, addr, paddr, len, perms);
+}
+
+extern int __iommu_rw(struct iommu *iommu,
+                      DeviceState *dev,
+                      target_phys_addr_t addr,
+                      uint8_t *buf,
+                      int len,
+                      int is_write);
+
+/**
+ * Performs I/O with address translation and access checking.
+ *
+ * Defined in generic IOMMU code.
+ *
+ * @iommu      IOMMU
+ * @dev        qdev device
+ * @addr       address where to perform I/O
+ * @buf        buffer to read from or write to
+ * @len        length of the operation
+ * @is_write   read or write?
+ *
+ * Returns 0 iff the I/O operation succeeded.
+ */
+static inline int iommu_rw(struct iommu *iommu,
+                           DeviceState *dev,
+                           target_phys_addr_t addr,
+                           uint8_t *buf,
+                           int len,
+                           int is_write)
+{
+    if (iommu && iommu->translate)
+        return __iommu_rw(iommu, dev, addr, buf, len, is_write);
+
+    return iommu_nop_rw(iommu, dev, addr, buf, len, is_write);
+}
+
+static inline int iommu_start_transaction(struct iommu *iommu,
+                                          DeviceState *dev)
+{
+    if (iommu && iommu->start_transaction)
+        return iommu->start_transaction(iommu, dev);
+
+    return 0;
+}
+
+static inline void iommu_end_transaction(struct iommu *iommu,
+                                         DeviceState *dev)
+{
+    if (iommu && iommu->end_transaction)
+        iommu->end_transaction(iommu, dev);
+}
+
+#else /* CONFIG_IOMMU */
+
+static inline struct iommu *iommu_get(DeviceState *dev, DeviceState **real_dev)
+{
+    return NULL;
+}
+
+static inline int iommu_translate(struct iommu *iommu,
+                                  DeviceState *dev,
+                                  target_phys_addr_t addr,
+                                  target_phys_addr_t *paddr,
+                                  int *len,
+                                  unsigned perms)
+{
+    return iommu_nop_translate(iommu, dev, addr, paddr, len, perms);
+}
+
+static inline int iommu_rw(struct iommu *iommu,
+                           DeviceState *dev,
+                           target_phys_addr_t addr,
+                           uint8_t *buf,
+                           int len,
+                           int is_write)
+{
+    return iommu_nop_rw(iommu, dev, addr, buf, len, is_write);
+}
+
+static inline int iommu_start_transaction(struct iommu *iommu,
+                                          DeviceState *dev)
+{
+    return 0;
+}
+
+static inline void iommu_end_transaction(struct iommu *iommu,
+                                         DeviceState *dev)
+{
+}
+
+#endif /* CONFIG_IOMMU */
+
+static inline int iommu_read(struct iommu *iommu,
+                             DeviceState *dev,
+                             target_phys_addr_t addr,
+                             uint8_t *buf,
+                             int len)
+{
+    return iommu_rw(iommu, dev, addr, buf, len, 0);
+}
+
+static inline int iommu_write(struct iommu *iommu,
+                              DeviceState *dev,
+                              target_phys_addr_t addr,
+                              const uint8_t *buf,
+                              int len)
+{
+    return iommu_rw(iommu, dev, addr, (uint8_t *) buf, len, 1);
+}
 
 #endif
diff --git a/hw/qdev.h b/hw/qdev.h
index a44060e..6e1e633 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -56,6 +56,8 @@ struct BusInfo {
     Property *props;
 };
 
+struct iommu;
+
 struct BusState {
     DeviceState *parent;
     BusInfo *info;
@@ -64,6 +66,10 @@ struct BusState {
     int qdev_allocated;
     QLIST_HEAD(, DeviceState) children;
     QLIST_ENTRY(BusState) sibling;
+
+#ifdef CONFIG_IOMMU
+    struct iommu *iommu;
+#endif
 };
 
 struct Property {
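To make the intended use concrete, a device model on top of this API
might look roughly like the sketch below. MyDevState and its fields are
invented for illustration; only iommu_get(), iommu_register_device()
and iommu_write() come from the patch above.

#include "iommu.h"

/* Illustrative device model, not part of the patch. */
typedef struct MyDevState {
    DeviceState qdev;
    struct iommu *iommu;      /* NULL when no IOMMU is configured */
    DeviceState *iommu_dev;   /* the device identity the IOMMU sees */
} MyDevState;

static int mydev_init(MyDevState *d)
{
    /* Walk up the qdev bus hierarchy looking for an IOMMU. */
    d->iommu = iommu_get(&d->qdev, &d->iommu_dev);
    return iommu_register_device(d->iommu, d->iommu_dev);
}

static void mydev_dma_write(MyDevState *d, target_phys_addr_t addr,
                            const uint8_t *buf, int len)
{
    /* Translated, access-checked I/O; falls back to plain
     * cpu_physical_memory_rw() when there is no IOMMU. */
    if (iommu_write(d->iommu, d->iommu_dev, addr, buf, len)) {
        /* Translation or permission failure: real hardware would
         * raise an IOMMU fault / target abort here. */
    }
}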