Message ID: 20200409191736.6233-2-jonathan.derrick@intel.com
State: New
Series: Real DMA dev DMA domain patch
Hi,

On 2020/4/10 3:17, Jon Derrick wrote:
> The PCI devices handled by intel-iommu may have a DMA requester on another
> bus, such as VMD subdevices needing to use the VMD endpoint.
>
> The real DMA device is now used for the DMA mapping, but one case was
> missed earlier: if the VMD device (and hence subdevices too) are under
> IOMMU_DOMAIN_IDENTITY, mappings do not work.
>
> Codepaths like intel_map_page() handle the IOMMU_DOMAIN_DMA case by
> creating an iommu DMA mapping, and fall back on dma_direct_map_page() for
> the IOMMU_DOMAIN_IDENTITY case. However, handling of the IDENTITY case is
> broken when intel_map_page() handles a subdevice.
>
> We observe that at iommu attach time, dmar_insert_one_dev_info() for the
> subdevices will never set dev->archdata.iommu. This is because that function

Do you mind telling why this isn't set?

> uses find_domain() to check if there is already an IOMMU for the device,
> and find_domain() then defers to the real DMA device, which does have one.
> Thus dmar_insert_one_dev_info() returns without assigning
> dev->archdata.iommu.
>
> Then, later:
>
> 1. intel_map_page() checks if an IOMMU mapping is needed by calling
>    iommu_need_mapping() on the subdevice. identity_mapping() returns
>    false because dev->archdata.iommu is NULL, so iommu_need_mapping()
>    returns true, indicating that an IOMMU mapping is needed.
> 2. __intel_map_single() is called to create the mapping.
> 3. __intel_map_single() calls find_domain(). This function now returns
>    the IDENTITY domain corresponding to the real DMA device.
> 4. __intel_map_single() calls domain_get_iommu() on this "real" domain.
>    A failure is hit and the entire operation is aborted, because this
>    codepath is not intended to handle IDENTITY mappings:
>
>        if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
>                return NULL;

This is caused by the fragile private domain implementation. We are in
process of removing it by enhancing the iommu subsystem with per-group
default domain.

https://www.spinics.net/lists/iommu/msg42976.html

So ultimately VMD subdevices should have their own per-device iommu data
and support per-device dma ops.

Best regards,
baolu
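For readers who want to trace the mismatch described in the quoted report, here is a minimal standalone model of the lookup logic. It is an illustration only, not kernel code: struct fake_dev, struct fake_domain, map_page() and friends are invented for this sketch, and only loosely mirror the real intel_map_page() / iommu_need_mapping() / find_domain() / domain_get_iommu() paths.

/*
 * Standalone model of the subdevice/IDENTITY mismatch described above.
 * All names here are invented for illustration; this is not kernel code.
 */
#include <stdio.h>
#include <stdbool.h>

enum domain_type { DOMAIN_IDENTITY, DOMAIN_DMA };

struct fake_domain {
        enum domain_type type;
};

struct fake_dev {
        const char *name;
        struct fake_dev *real_dma_dev;   /* models the VMD endpoint for subdevices */
        struct fake_domain *iommu_info;  /* models dev->archdata.iommu */
};

/* Models identity_mapping(): only looks at the device's own iommu data. */
static bool identity_mapping(const struct fake_dev *dev)
{
        return dev->iommu_info && dev->iommu_info->type == DOMAIN_IDENTITY;
}

/* Models find_domain(): defers to the real DMA device. */
static struct fake_domain *find_domain(const struct fake_dev *dev)
{
        if (dev->real_dma_dev)
                dev = dev->real_dma_dev;
        return dev->iommu_info;
}

/* Models the map path: take the IOMMU route unless identity_mapping() says no. */
static int map_page(const struct fake_dev *dev)
{
        struct fake_domain *domain;

        if (identity_mapping(dev))
                return 0;               /* direct-mapping path, fine */

        domain = find_domain(dev);      /* __intel_map_single()-like path */
        if (!domain || domain->type != DOMAIN_DMA) {
                printf("%s: WARN: non-DMA domain reached on the IOMMU mapping path\n",
                       dev->name);
                return -1;
        }
        return 0;
}

int main(void)
{
        struct fake_domain identity = { DOMAIN_IDENTITY };
        struct fake_dev vmd = { "vmd endpoint", NULL, &identity };
        /* Subdevice: its own iommu data was never set, as described above. */
        struct fake_dev sub = { "subdevice", &vmd, NULL };

        map_page(&vmd);         /* identity_mapping() is true: OK */
        map_page(&sub);         /* falls through to the IOMMU path and fails */
        return 0;
}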
Hi Jon,

Thanks for picking this up. Apologies for my absence here - I wasn't
able to work on this recently, but I'm back again now.

On Fri, Apr 10, 2020 at 3:32 AM Jon Derrick <jonathan.derrick@intel.com> wrote:
> This becomes problematic if the real DMA device and the subdevices have
> different addressing capabilities and some require translation. Instead we can
> put the real DMA dev and any subdevices on the DMA domain. This change assigns
> subdevices to the DMA domain, and moves the real DMA device to the DMA domain
> if necessary.

Have you tested this with the real DMA device in identity mode?
It is not quite working for me. (Again, I'm not using VMD here, but
have looked closely and believe we're working under the same
constraints)

First, the real DMA device gets added to the group:

pci 0000:00:17.0: Adding to iommu group 9

(it's in IDENTITY mode here)

Then later, the first subdevice comes along, and these are the results:

pci 10000:00:00.0: [8086:02d7] type 00 class 0x010601
pci 10000:00:00.0: reg 0x10: [mem 0xae1a0000-0xae1a7fff]
pci 10000:00:00.0: reg 0x14: [mem 0xae1a8000-0xae1a80ff]
pci 10000:00:00.0: reg 0x18: [io 0x3090-0x3097]
pci 10000:00:00.0: reg 0x1c: [io 0x3080-0x3083]
pci 10000:00:00.0: reg 0x20: [io 0x3060-0x307f]
pci 10000:00:00.0: reg 0x24: [mem 0xae100000-0xae103fff]
pci 10000:00:00.0: PME# supported from D3hot
pci 10000:00:00.0: Adding to iommu group 9
pci 10000:00:00.0: DMAR: Failed to get a private domain.

That final message is added by your patch and indicates that it's not working.

This is because the subdevice got added to the iommu group before the
code you added tried to change to the DMA domain.

It first gets added to the group through this call path:

intel_iommu_add_device
  -> iommu_group_get_for_dev
    -> iommu_group_add_device

Then, continuing within intel_iommu_add_device we get to the code you
added, which tries to move the real DMA dev to DMA mode instead. It
calls:

intel_iommu_request_dma_domain_for_dev
  -> iommu_request_dma_domain_for_dev
    -> request_default_domain_for_dev

Which fails here:

        /* Don't change mappings of existing devices */
        ret = -EBUSY;
        if (iommu_group_device_count(group) != 1)
                goto out;

because we already have 2 devices in the group (the real DMA dev, plus
the subdevice we're in the process of handling now).

Next I'll look into the iommu group rework that Baolu mentioned.

Thanks,
Daniel
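The ordering problem Daniel describes can be reduced to a few lines. The sketch below is purely illustrative and uses invented names (struct fake_group, group_add_device(), request_dma_domain()); the kernel pieces it stands in for are iommu_group_add_device(), request_default_domain_for_dev() and iommu_group_device_count().

/*
 * Standalone model of the ordering problem: the subdevice joins the group
 * before the real DMA device's default domain can be switched, so the
 * "only one device in the group" check refuses the switch with -EBUSY.
 * All names are invented for illustration; this is not kernel code.
 */
#include <stdio.h>
#include <errno.h>

struct fake_group {
        int device_count;
        int default_domain_is_dma;      /* 0 = identity, 1 = dma */
};

static void group_add_device(struct fake_group *grp)
{
        grp->device_count++;
}

/* Models request_default_domain_for_dev(): refuses to touch a group
 * that already has more than one member. */
static int request_dma_domain(struct fake_group *grp)
{
        /* Don't change mappings of existing devices */
        if (grp->device_count != 1)
                return -EBUSY;
        grp->default_domain_is_dma = 1;
        return 0;
}

int main(void)
{
        struct fake_group grp = { 0, 0 };
        int ret;

        group_add_device(&grp); /* real DMA device joins its group first */
        group_add_device(&grp); /* the subdevice joins the same group */

        ret = request_dma_domain(&grp); /* only now does the patch try to switch */
        printf("switch to DMA domain: %d%s\n", ret,
               ret == -EBUSY ? " (-EBUSY: group already has two members)" : "");
        return 0;
}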
On Sun, Apr 12, 2020 at 11:50:09AM +0800, Daniel Drake wrote:
> > different addressing capabilities and some require translation. Instead we can
> > put the real DMA dev and any subdevices on the DMA domain. This change assigns
> > subdevices to the DMA domain, and moves the real DMA device to the DMA domain
> > if necessary.
>
> Have you tested this with the real DMA device in identity mode?
> It is not quite working for me. (Again, I'm not using VMD here, but
> have looked closely and believe we're working under the same
> constraints)

So if you are not using VMD how does this matter for upstream?
On Fri, Apr 10, 2020 at 9:22 AM Lu Baolu <baolu.lu@linux.intel.com> wrote:
> This is caused by the fragile private domain implementation. We are in
> process of removing it by enhancing the iommu subsystem with per-group
> default domain.
>
> https://www.spinics.net/lists/iommu/msg42976.html
>
> So ultimately VMD subdevices should have their own per-device iommu data
> and support per-device dma ops.

Interesting. There's also this patchset you posted:

[PATCH 00/19] [PULL REQUEST] iommu/vt-d: patches for v5.7
https://lists.linuxfoundation.org/pipermail/iommu/2020-April/042967.html
(to be pushed out to 5.8)

In there you have:

> iommu/vt-d: Don't force 32bit devices to uses DMA domain

which seems to clash with the approach being explored in this thread.

And:

> iommu/vt-d: Apply per-device dma_ops

This effectively solves the trip point that caused me to open these
discussions, where intel_map_page() -> iommu_need_mapping() would
incorrectly determine that an intel-iommu DMA mapping was needed for a
PCI subdevice running in identity mode. After this patch, a PCI
subdevice in identity mode uses the default system dma_ops and
completely avoids intel-iommu.

So that solves the issues I was looking at. Jon, you might want to
check if the problems you see are likewise solved for you by these
patches.

I didn't try Joerg's iommu group rework yet as it conflicts with those
patches above.

Daniel
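As a rough illustration of what "uses the default system dma_ops and completely avoids intel-iommu" means in practice, here is a minimal model of per-device dma_ops dispatch. It is a sketch under the assumption that mapping simply falls back to a direct path when a device has no IOMMU-provided ops; the names (fake_dev, fake_dma_ops, map_page()) are invented and are not the kernel's API.

/*
 * Simplified model of per-device dma_ops dispatch: if a device has no
 * IOMMU-provided ops, mapping falls through to the direct path.
 * All names are invented for illustration; this is not kernel code.
 */
#include <stdio.h>
#include <stdint.h>

typedef unsigned long long fake_dma_addr_t;

struct fake_dma_ops {
        fake_dma_addr_t (*map_page)(void *vaddr);
};

struct fake_dev {
        const char *name;
        const struct fake_dma_ops *dma_ops;     /* NULL means "use the direct path" */
};

static fake_dma_addr_t direct_map_page(void *vaddr)
{
        /* identity-style mapping: the bus address is just the CPU address */
        return (fake_dma_addr_t)(uintptr_t)vaddr;
}

static fake_dma_addr_t iommu_map_page(void *vaddr)
{
        (void)vaddr;
        return 0x1000;  /* pretend this is an IOVA handed out by an IOMMU driver */
}

/* Per-device dispatch: the device's own ops win, otherwise go direct. */
static fake_dma_addr_t map_page(const struct fake_dev *dev, void *vaddr)
{
        if (dev->dma_ops)
                return dev->dma_ops->map_page(vaddr);
        return direct_map_page(vaddr);
}

int main(void)
{
        static const struct fake_dma_ops iommu_ops = { iommu_map_page };
        struct fake_dev translated = { "translated device", &iommu_ops };
        struct fake_dev identity = { "identity-mode subdevice", NULL };
        char buf[16];

        printf("%s -> %#llx\n", translated.name, map_page(&translated, buf));
        printf("%s -> %#llx\n", identity.name, map_page(&identity, buf));
        return 0;
}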
Hi Daniel,

On 2020/4/13 10:25, Daniel Drake wrote:
> On Fri, Apr 10, 2020 at 9:22 AM Lu Baolu <baolu.lu@linux.intel.com> wrote:
>> This is caused by the fragile private domain implementation. We are in
>> process of removing it by enhancing the iommu subsystem with per-group
>> default domain.
>>
>> https://www.spinics.net/lists/iommu/msg42976.html
>>
>> So ultimately VMD subdevices should have their own per-device iommu data
>> and support per-device dma ops.
>
> Interesting. There's also this patchset you posted:
> [PATCH 00/19] [PULL REQUEST] iommu/vt-d: patches for v5.7
> https://lists.linuxfoundation.org/pipermail/iommu/2020-April/042967.html
> (to be pushed out to 5.8)

Both are trying to solve the same problem.

I have sync'ed with Joerg. This patch set will be replaced with Joerg's
proposal due to a race concern between domain switching and driver
binding. I will rebase all vt-d patches in this set on top of Joerg's
change.

Best regards,
baolu

>
> In there you have:
>> iommu/vt-d: Don't force 32bit devices to uses DMA domain
> which seems to clash with the approach being explored in this thread.
>
> And:
>> iommu/vt-d: Apply per-device dma_ops
> This effectively solves the trip point that caused me to open these
> discussions, where intel_map_page() -> iommu_need_mapping() would
> incorrectly determine that an intel-iommu DMA mapping was needed for a
> PCI subdevice running in identity mode. After this patch, a PCI
> subdevice in identity mode uses the default system dma_ops and
> completely avoids intel-iommu.
>
> So that solves the issues I was looking at. Jon, you might want to
> check if the problems you see are likewise solved for you by these
> patches.
>
> I didn't try Joerg's iommu group rework yet as it conflicts with those
> patches above.
>
> Daniel
On Sun, 2020-04-12 at 11:50 +0800, Daniel Drake wrote:
> Hi Jon,
>
> Thanks for picking this up. Apologies for my absence here - I wasn't
> able to work on this recently, but I'm back again now.
>
> On Fri, Apr 10, 2020 at 3:32 AM Jon Derrick <jonathan.derrick@intel.com> wrote:
> > This becomes problematic if the real DMA device and the subdevices have
> > different addressing capabilities and some require translation. Instead we can
> > put the real DMA dev and any subdevices on the DMA domain. This change assigns
> > subdevices to the DMA domain, and moves the real DMA device to the DMA domain
> > if necessary.
>
> Have you tested this with the real DMA device in identity mode?
> It is not quite working for me. (Again, I'm not using VMD here, but
> have looked closely and believe we're working under the same
> constraints)

It does work for me when the real DMA device starts in Identity, but my
'real DMA device' doesn't do the DMA. It just provides the source-id.
Does your 'real DMA device' do DMA? I suppose that could be the reason.
You wouldn't want to change the domain on the live device using the
method I proposed.

> First, the real DMA device gets added to the group:
>
> pci 0000:00:17.0: Adding to iommu group 9
>
> (it's in IDENTITY mode here)
>
> Then later, the first subdevice comes along, and these are the results:
>
> pci 10000:00:00.0: [8086:02d7] type 00 class 0x010601
> pci 10000:00:00.0: reg 0x10: [mem 0xae1a0000-0xae1a7fff]
> pci 10000:00:00.0: reg 0x14: [mem 0xae1a8000-0xae1a80ff]
> pci 10000:00:00.0: reg 0x18: [io 0x3090-0x3097]
> pci 10000:00:00.0: reg 0x1c: [io 0x3080-0x3083]
> pci 10000:00:00.0: reg 0x20: [io 0x3060-0x307f]
> pci 10000:00:00.0: reg 0x24: [mem 0xae100000-0xae103fff]
> pci 10000:00:00.0: PME# supported from D3hot
> pci 10000:00:00.0: Adding to iommu group 9
> pci 10000:00:00.0: DMAR: Failed to get a private domain.
>
> That final message is added by your patch and indicates that it's not working.
>
> This is because the subdevice got added to the iommu group before the
> code you added tried to change to the DMA domain.
>
> It first gets added to the group through this call path:
>
> intel_iommu_add_device
>   -> iommu_group_get_for_dev
>     -> iommu_group_add_device
>
> Then, continuing within intel_iommu_add_device we get to the code you
> added, which tries to move the real DMA dev to DMA mode instead. It
> calls:
>
> intel_iommu_request_dma_domain_for_dev
>   -> iommu_request_dma_domain_for_dev
>     -> request_default_domain_for_dev
>
> Which fails here:
>
>         /* Don't change mappings of existing devices */
>         ret = -EBUSY;
>         if (iommu_group_device_count(group) != 1)
>                 goto out;
>
> because we already have 2 devices in the group (the real DMA dev, plus
> the subdevice we're in the process of handling now).

You're right. I see the message too, but it still works for me.

> Next I'll look into the iommu group rework that Baolu mentioned.
>
> Thanks,
> Daniel
On Mon, 2020-04-13 at 10:48 +0800, Lu Baolu wrote:
> Hi Daniel,
>
> On 2020/4/13 10:25, Daniel Drake wrote:
> > On Fri, Apr 10, 2020 at 9:22 AM Lu Baolu <baolu.lu@linux.intel.com> wrote:
> > > This is caused by the fragile private domain implementation. We are in
> > > process of removing it by enhancing the iommu subsystem with per-group
> > > default domain.
> > >
> > > https://www.spinics.net/lists/iommu/msg42976.html
> > >
> > > So ultimately VMD subdevices should have their own per-device iommu data
> > > and support per-device dma ops.
> >
> > Interesting. There's also this patchset you posted:
> > [PATCH 00/19] [PULL REQUEST] iommu/vt-d: patches for v5.7
> > https://lists.linuxfoundation.org/pipermail/iommu/2020-April/042967.html
> > (to be pushed out to 5.8)
>
> Both are trying to solve the same problem.
>
> I have sync'ed with Joerg. This patch set will be replaced with Joerg's
> proposal due to a race concern between domain switching and driver
> binding. I will rebase all vt-d patches in this set on top of Joerg's
> change.
>
> Best regards,
> baolu

Thanks Baolu. I'll pick this back up on top of the for-5.8 changes.

> > In there you have:
> > > iommu/vt-d: Don't force 32bit devices to uses DMA domain
> > which seems to clash with the approach being explored in this thread.
> >
> > And:
> > > iommu/vt-d: Apply per-device dma_ops
> > This effectively solves the trip point that caused me to open these
> > discussions, where intel_map_page() -> iommu_need_mapping() would
> > incorrectly determine that an intel-iommu DMA mapping was needed for a
> > PCI subdevice running in identity mode. After this patch, a PCI
> > subdevice in identity mode uses the default system dma_ops and
> > completely avoids intel-iommu.
> >
> > So that solves the issues I was looking at. Jon, you might want to
> > check if the problems you see are likewise solved for you by these
> > patches.
> >
> > I didn't try Joerg's iommu group rework yet as it conflicts with those
> > patches above.
> >
> > Daniel
On Mon, Apr 13, 2020 at 10:48:55AM +0800, Lu Baolu wrote:
> I have sync'ed with Joerg. This patch set will be replaced with Joerg's
> proposal due to a race concern between domain switching and driver
> binding. I will rebase all vt-d patches in this set on top of Joerg's
> change.

Okay, but is this patch relevant for v5.7? The other changes we are
working on will not land before v5.8.

Regards,

Joerg
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ef0a5246700e..b4844a502499 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3049,6 +3049,9 @@ static int device_def_domain_type(struct device *dev)
                 if ((iommu_identity_mapping & IDENTMAP_GFX) && IS_GFX_DEVICE(pdev))
                         return IOMMU_DOMAIN_IDENTITY;
 
+                if (pci_real_dma_dev(pdev) != pdev)
+                        return IOMMU_DOMAIN_DMA;
+
                 /*
                  * We want to start off with all devices in the 1:1 domain, and
                  * take them out later if we find they can't access all of memory.
@@ -5781,12 +5784,32 @@ static bool intel_iommu_capable(enum iommu_cap cap)
         return false;
 }
 
+static int intel_iommu_request_dma_domain_for_dev(struct device *dev,
+                                                  struct dmar_domain *domain)
+{
+        int ret;
+
+        ret = iommu_request_dma_domain_for_dev(dev);
+        if (ret) {
+                dmar_remove_one_dev_info(dev);
+                domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
+                if (!get_private_domain_for_dev(dev)) {
+                        dev_warn(dev,
+                                 "Failed to get a private domain.\n");
+                        return -ENOMEM;
+                }
+        }
+
+        return 0;
+}
+
 static int intel_iommu_add_device(struct device *dev)
 {
         struct dmar_domain *dmar_domain;
         struct iommu_domain *domain;
         struct intel_iommu *iommu;
         struct iommu_group *group;
+        struct device *real_dev = dev;
         u8 bus, devfn;
         int ret;
 
@@ -5810,6 +5833,21 @@ static int intel_iommu_add_device(struct device *dev)
 
         domain = iommu_get_domain_for_dev(dev);
         dmar_domain = to_dmar_domain(domain);
+
+        if (dev_is_pci(dev))
+                real_dev = &pci_real_dma_dev(to_pci_dev(dev))->dev;
+
+        if (real_dev != dev) {
+                domain = iommu_get_domain_for_dev(real_dev);
+                if (domain->type != IOMMU_DOMAIN_DMA) {
+                        dmar_remove_one_dev_info(real_dev);
+
+                        ret = intel_iommu_request_dma_domain_for_dev(real_dev, dmar_domain);
+                        if (ret)
+                                goto unlink;
+                }
+        }
+
         if (domain->type == IOMMU_DOMAIN_DMA) {
                 if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) {
                         ret = iommu_request_dm_for_dev(dev);
@@ -5823,20 +5861,12 @@ static int intel_iommu_add_device(struct device *dev)
                 }
         } else {
                 if (device_def_domain_type(dev) == IOMMU_DOMAIN_DMA) {
-                        ret = iommu_request_dma_domain_for_dev(dev);
-                        if (ret) {
-                                dmar_remove_one_dev_info(dev);
-                                dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
-                                if (!get_private_domain_for_dev(dev)) {
-                                        dev_warn(dev,
-                                                 "Failed to get a private domain.\n");
-                                        ret = -ENOMEM;
-                                        goto unlink;
-                                }
+                        ret = intel_iommu_request_dma_domain_for_dev(dev, dmar_domain);
+                        if (ret)
+                                goto unlink;
 
-                                dev_info(dev,
-                                         "Device uses a private dma domain.\n");
-                        }
+                        dev_info(dev,
+                                 "Device uses a private dma domain.\n");
                 }
         }