Message ID | 20171006133203.22803-1-jean-philippe.brucker@arm.com |
---|---|
Headers | show |
Series | Process management for IOMMU + SVM for SMMUv3 | expand |
On Fri, Oct 06, 2017 at 02:32:01PM +0100, Jean-Philippe Brucker wrote: > The PASID ECN to the PCIe spec added a bit in the PRI status register that > allows a Function to declare whether a PRG Response should contain the > PASID prefix or not. > > Move the helper that accesses it from amd_iommu into the PCI subsystem, > renaming it to be consistent with the current spec (PRPR - PRG Response > PASID Required). > > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> I assume this will be merged with the rest of the series, probably via an IOMMU tree. > --- > drivers/iommu/amd_iommu.c | 19 +------------------ > drivers/pci/ats.c | 17 +++++++++++++++++ > include/linux/pci-ats.h | 8 ++++++++ > include/uapi/linux/pci_regs.h | 1 + > 4 files changed, 27 insertions(+), 18 deletions(-) > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > index 51f8215877f5..45036a253d63 100644 > --- a/drivers/iommu/amd_iommu.c > +++ b/drivers/iommu/amd_iommu.c > @@ -2039,23 +2039,6 @@ static int pdev_iommuv2_enable(struct pci_dev *pdev) > return ret; > } > > -/* FIXME: Move this to PCI code */ > -#define PCI_PRI_TLP_OFF (1 << 15) > - > -static bool pci_pri_tlp_required(struct pci_dev *pdev) > -{ > - u16 status; > - int pos; > - > - pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI); > - if (!pos) > - return false; > - > - pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status); > - > - return (status & PCI_PRI_TLP_OFF) ? true : false; > -} > - > /* > * If a device is not yet associated with a domain, this function > * assigns it visible for the hardware > @@ -2084,7 +2067,7 @@ static int attach_device(struct device *dev, > > dev_data->ats.enabled = true; > dev_data->ats.qdep = pci_ats_queue_depth(pdev); > - dev_data->pri_tlp = pci_pri_tlp_required(pdev); > + dev_data->pri_tlp = pci_prg_resp_requires_prefix(pdev); > } > } else if (amd_iommu_iotlb_sup && > pci_enable_ats(pdev, PAGE_SHIFT) == 0) { > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c > index ad8ddbbbf245..f95e42df728b 100644 > --- a/drivers/pci/ats.c > +++ b/drivers/pci/ats.c > @@ -389,3 +389,20 @@ int pci_max_pasids(struct pci_dev *pdev) > } > EXPORT_SYMBOL_GPL(pci_max_pasids); > #endif /* CONFIG_PCI_PASID */ > + > +#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI) > +bool pci_prg_resp_requires_prefix(struct pci_dev *pdev) > +{ > + u16 status; > + int pos; > + > + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI); > + if (!pos) > + return false; > + > + pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status); > + > + return !!(status & PCI_PRI_STATUS_PRPR); > +} > +EXPORT_SYMBOL_GPL(pci_prg_resp_requires_prefix); > +#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */ > diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h > index 782fb8e0755f..367ea9448441 100644 > --- a/include/linux/pci-ats.h > +++ b/include/linux/pci-ats.h > @@ -67,5 +67,13 @@ static inline int pci_max_pasids(struct pci_dev *pdev) > > #endif /* CONFIG_PCI_PASID */ > > +#if defined(CONFIG_PCI_PASID) && defined(CONFIG_PCI_PRI) > +bool pci_prg_resp_requires_prefix(struct pci_dev *pdev); > +#else > +static inline bool pci_prg_resp_requires_prefix(struct pci_dev *pdev) > +{ > + return false; > +} > +#endif /* CONFIG_PCI_PASID && CONFIG_PCI_PRI */ > > #endif /* LINUX_PCI_ATS_H*/ > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h > index f8d58045926f..a0eeb16a2bfe 100644 > --- a/include/uapi/linux/pci_regs.h > +++ b/include/uapi/linux/pci_regs.h > @@ -862,6 +862,7 @@ > #define PCI_PRI_STATUS_RF 0x001 /* Response Failure */ > #define PCI_PRI_STATUS_UPRGI 0x002 /* Unexpected PRG index */ > #define PCI_PRI_STATUS_STOPPED 0x100 /* PRI Stopped */ > +#define PCI_PRI_STATUS_PRPR 0x8000 /* PRG Response requires PASID prefix */ > #define PCI_PRI_MAX_REQ 0x08 /* PRI max reqs supported */ > #define PCI_PRI_ALLOC_REQ 0x0c /* PRI max reqs allowed */ > #define PCI_EXT_CAP_PRI_SIZEOF 16 > -- > 2.13.3 > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean, On 2017/10/6 21:31, Jean-Philippe Brucker wrote: > Following discussions at plumbers and elsewhere, it seems like we need to > unify some of the Shared Virtual Memory (SVM) code, in order to define > clear semantics for the SVM API. > > My previous RFC [1] was centered on the SMMUv3, but some of this code will > need to be reused by the SMMUv2 and virtio-iommu drivers. This second > proposal focuses on abstracting a little more into the core IOMMU API, and > also trying to find common ground for all SVM-capable IOMMUs. > > SVM is, in the context of the IOMMU, sharing page tables between a process > and a device. Traditionally it requires IO Page Fault and Process Address > Space ID capabilities in device and IOMMU. > > * A device driver can bind a process to a device, with iommu_process_bind. > Internally we hold on to the mm and get notified of its activity with an > mmu_notifier. The bond is removed by exit_mm, by a call to > iommu_process_unbind or iommu_detach_device. > > * iommu_process_bind returns a 20-bit PASID (PCI terminology) to the > device driver, which programs it into the device to access the process > address space. > > * The device and the IOMMU support recoverable page faults. This can be > either ATS+PRI for PCI, or platform-specific mechanisms such as Stall > for SMMU. > > Ideally systems wanting to use SVM have to support these three features, > but in practice we'll see implementations supporting just a subset of > them, especially in validation environments. So even if this particular > patchset assumes all three capabilities, it should also be possible to > support PASID without IOPF (by pinning everything, see non-system SVM in > OpenCL) How to pin everything? If user malloc anything we should pin it. It should from user or driver? > , or IOPF without PASID (sharing the single device address space > with a process, could be useful for DPDK+VFIO). > > Implementing both these cases would enable PT sharing alone. Some people > would also like IOPF alone without SVM (covered by this series) or process > management without shared PT (not covered). Using these features > individually is also important for testing, as SVM is in its infancy and > providing easy ways to test is essential to reduce the number of quirks > down the line. > > Process management > ================== > > The first part of this series introduces boilerplate code for managing > PASIDs and processes bound to devices. It's something any IOMMU driver > that wants to support bind/unbind will have to do, and it is difficult to > get right. > > Patches > 1: iommu_process and PASID allocation, attach and release > 2: process_exit callback for device drivers > 3: iommu_process search by PASID > 4: track process changes with an MMU notifiers > 5: bind and unbind operations > > My proposal uses the following model: > > * The PASID space is system-wide. This means that a Linux process will > have a single PASID. I introduce the iommu_process structure and a > global IDR to manage this. > > * An iommu_process can be bound to multiple domains, and a domain can have > multiple iommu_process. when bind a task to device, can we create a single domain for it? I am thinking about process management without shared PT(for some device only support PASID without pri ability), it seems hard to expand if a domain have multiple iommu_process? Do you have any idea about this? > > * IOMMU groups share same PASID table. IOMMU groups are a convenient way > to cover various hardware weaknesses that prevent a group of device to > be isolated by the IOMMU (untrusted bridge, for instance). It's foolish > to assume that all PASID implementations will perfectly isolate devices > within a bus and functions within a device, so let's assume all devices > within an IOMMU group have to share PASID traffic as well. In general > there will be a single device per group. > > * It's up to the driver implementation to decide where to implement the > PASID tables. For SMMU it's more convenient to have a single PASID table > per domain. And I think the model fits better with the existing IOMMU > API: IOVA traffic is shared by all devices in a domain, so should PASID > traffic. What's the meaning of "share PASID traffic"? PASID space is system-wide, and a domain can have multiple iommu_process , so a domain can have multiple PASIDs , one PASID for a iommu_process, right? Yisheng Xie Thanks -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On 09/10/17 10:49, Yisheng Xie wrote: > Hi Jean, > > On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >> Following discussions at plumbers and elsewhere, it seems like we need to >> unify some of the Shared Virtual Memory (SVM) code, in order to define >> clear semantics for the SVM API. >> >> My previous RFC [1] was centered on the SMMUv3, but some of this code will >> need to be reused by the SMMUv2 and virtio-iommu drivers. This second >> proposal focuses on abstracting a little more into the core IOMMU API, and >> also trying to find common ground for all SVM-capable IOMMUs. >> >> SVM is, in the context of the IOMMU, sharing page tables between a process >> and a device. Traditionally it requires IO Page Fault and Process Address >> Space ID capabilities in device and IOMMU. >> >> * A device driver can bind a process to a device, with iommu_process_bind. >> Internally we hold on to the mm and get notified of its activity with an >> mmu_notifier. The bond is removed by exit_mm, by a call to >> iommu_process_unbind or iommu_detach_device. >> >> * iommu_process_bind returns a 20-bit PASID (PCI terminology) to the >> device driver, which programs it into the device to access the process >> address space. >> >> * The device and the IOMMU support recoverable page faults. This can be >> either ATS+PRI for PCI, or platform-specific mechanisms such as Stall >> for SMMU. >> >> Ideally systems wanting to use SVM have to support these three features, >> but in practice we'll see implementations supporting just a subset of >> them, especially in validation environments. So even if this particular >> patchset assumes all three capabilities, it should also be possible to >> support PASID without IOPF (by pinning everything, see non-system SVM in >> OpenCL) > How to pin everything? If user malloc anything we should pin it. It should > from user or driver? For userspace drivers, I guess it would be via a VFIO ioctl, that does the same preparatory work as VFIO_IOMMU_MAP_DMA, but doesn't call iommu_map. For things like OpenCL SVM buffers, it's the kernel driver that does the pinning, just like VFIO does it, before launching the work on a SVM buffer. >> , or IOPF without PASID (sharing the single device address space >> with a process, could be useful for DPDK+VFIO). >> >> Implementing both these cases would enable PT sharing alone. Some people >> would also like IOPF alone without SVM (covered by this series) or process >> management without shared PT (not covered). Using these features >> individually is also important for testing, as SVM is in its infancy and >> providing easy ways to test is essential to reduce the number of quirks >> down the line. >> >> Process management >> ================== >> >> The first part of this series introduces boilerplate code for managing >> PASIDs and processes bound to devices. It's something any IOMMU driver >> that wants to support bind/unbind will have to do, and it is difficult to >> get right. >> >> Patches >> 1: iommu_process and PASID allocation, attach and release >> 2: process_exit callback for device drivers >> 3: iommu_process search by PASID >> 4: track process changes with an MMU notifiers >> 5: bind and unbind operations >> >> My proposal uses the following model: >> >> * The PASID space is system-wide. This means that a Linux process will >> have a single PASID. I introduce the iommu_process structure and a >> global IDR to manage this. >> >> * An iommu_process can be bound to multiple domains, and a domain can have >> multiple iommu_process. > when bind a task to device, can we create a single domain for it? I am thinking > about process management without shared PT(for some device only support PASID > without pri ability), it seems hard to expand if a domain have multiple iommu_process? > Do you have any idea about this? A device always has to be in a domain, as far as I know. Not supporting PRI forces you to pin down all user mappings (or just the ones you use for DMA) but you should sill be able to share PT. Now if you don't support shared PT either, but only PASID, then you'll have to use io-pgtable and a new map/unmap API on an iommu_process. I don't understand your concern though, how would the link between process and domains prevent this use-case? >> * IOMMU groups share same PASID table. IOMMU groups are a convenient way >> to cover various hardware weaknesses that prevent a group of device to >> be isolated by the IOMMU (untrusted bridge, for instance). It's foolish >> to assume that all PASID implementations will perfectly isolate devices >> within a bus and functions within a device, so let's assume all devices >> within an IOMMU group have to share PASID traffic as well. In general >> there will be a single device per group. >> >> * It's up to the driver implementation to decide where to implement the >> PASID tables. For SMMU it's more convenient to have a single PASID table >> per domain. And I think the model fits better with the existing IOMMU >> API: IOVA traffic is shared by all devices in a domain, so should PASID >> traffic. > What's the meaning of "share PASID traffic"? PASID space is system-wide, > and a domain can have multiple iommu_process , so a domain can have multiple > PASIDs , one PASID for a iommu_process, right? Yes, I meant that if a device can access mappings for a specific PASID, then other devices in that same domain are also able to access them. A few reasons for this choice in the SMMU: * As all devices in an IOMMU group will be in the same domain and share the same PASID traffic, it encompasses that case. Groups are the smallest isolation granularity, then users are free to choose to put different IOMMU groups in different domains. * For architectures that can have both non-PASID and PASID traffic simultaneously, like the SMMU, it is simpler to reason about PASID tables being a domain, rather than sharing PASID0 within the domain and handling all others per device. * It's the same principle as non-PASID mappings (iommu_map/unmap is on a domain). * It implement the classic example of IOMMU architectures where multiple device descriptors point to the same PASID tables. * It may be desirable for drivers to share PASIDs within a domain, if they are actually using domains for conveniently sharing address spaces between devices. I'm not sure how much this is used as a feature. It does model a shared bus where each device can snoop DMA, so it may be useful. bind/unbind operations are done on devices and not domains, though, because it allows users to know which device supports PASID, PRI, etc. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean-Philipe, Thanks for your patches, this is definitly a step in the right direction to get generic support for IO virtual memory into the IOMMU core code. But I see an issue with the design around task_struct, please see below. On Fri, Oct 06, 2017 at 02:31:32PM +0100, Jean-Philippe Brucker wrote: > +int iommu_process_bind_device(struct device *dev, struct task_struct *task, > + int *pasid, int flags) I just took this patch as an example, it is in the other patches as well. The code is designed around 'struct task_struct' while it should really be designed around 'struct mm_struct'. You are not attaching a specific process to a device, but the address-space of one or more processes. And that should be reflected in the design. There are several bad implications of building it around task_struct, one is the life-time of the binding. If the address space is detached from the device when the process exits, only the thread that did the setup can can safely make use of the device, because if the device is accessed from another thread it will crash when the setup-thread exits. There are other benefits of using mm_struct, for example that you can use mmu_notifiers to exit-handling. Here is how I think the base API should look like: * iommu_iovm_device_init(struct device *dev); iommu_iovm_device_shutdown(struct device *dev); These two functions do the device specific setup/shutdown. For PCI this would include enabling the PRI, PASID, and ATS capabilities and setting up a PASID table for the device. * iommu_iovm_bind_mm(struct device *dev, struct mm_struct *mm, iovm_shutdown_cb *cb); iommu_iovm_unbind_mm(struct device *dev, struct mm_struct *mm); These functions add and delete the entries in the PASID table for the device and setup mmu_notifiers for the mm_struct to keep IOMMU TLBs in sync with the CPU TLBs. The shutdown_cb is invoked by the IOMMU core code when the mm_struct goes away, for example because the process segfaults. The PASID handling is best done these functions as well, unless there is a strong reason to allow device drivers to do the handling themselves. The context data can be stored directly in mm_struct, including the PASID for that mm. There is of course more functionality needed, the above only outlines the very basic needs. Regards, Joerg -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Joerg, On 11/10/17 12:33, Joerg Roedel wrote: > Hi Jean-Philipe, > > Thanks for your patches, this is definitly a step in the right direction > to get generic support for IO virtual memory into the IOMMU core code. > > But I see an issue with the design around task_struct, please see > below. > > On Fri, Oct 06, 2017 at 02:31:32PM +0100, Jean-Philippe Brucker wrote: >> +int iommu_process_bind_device(struct device *dev, struct task_struct *task, >> + int *pasid, int flags) > > I just took this patch as an example, it is in the other patches as > well. The code is designed around 'struct task_struct' while it should > really be designed around 'struct mm_struct'. You are not attaching a > specific process to a device, but the address-space of one or more > processes. And that should be reflected in the design. Agreed. The task struct is only really needed for obtaining the mm in my code. It also keeps hold of a pid, but that's wrong and easy to remove. > There are several bad implications of building it around task_struct, > one is the life-time of the binding. If the address space is detached > from the device when the process exits, only the thread that did the > setup can can safely make use of the device, because if the device is > accessed from another thread it will crash when the setup-thread exits. > > There are other benefits of using mm_struct, for example that you can > use mmu_notifiers to exit-handling. > > Here is how I think the base API should look like: > > * iommu_iovm_device_init(struct device *dev); > iommu_iovm_device_shutdown(struct device *dev); > > These two functions do the device specific setup/shutdown. For > PCI this would include enabling the PRI, PASID, and ATS > capabilities and setting up a PASID table for the device. Ok. On SMMU the PASID table also hosts the non-PASID page table pointer, so the table and capability cannot be setup later than attach_device (and we'll probably have to enable PASID in add_device). But I suppose it's an implementation detail. Some device drivers will want to use ATS alone, for accelerating IOVA traffic. Should we enable it automatically, or provide device drivers with a way to enable it manually? According to the PCI spec, PASID has to be enabled before ATS, so device_init would have to first disable ATS in that case. > * iommu_iovm_bind_mm(struct device *dev, struct mm_struct *mm, > iovm_shutdown_cb *cb); > iommu_iovm_unbind_mm(struct device *dev, struct mm_struct *mm); > > These functions add and delete the entries in the PASID table > for the device and setup mmu_notifiers for the mm_struct to > keep IOMMU TLBs in sync with the CPU TLBs. > > The shutdown_cb is invoked by the IOMMU core code when the > mm_struct goes away, for example because the process > segfaults. > > The PASID handling is best done these functions as well, unless > there is a strong reason to allow device drivers to do the > handling themselves. > > The context data can be stored directly in mm_struct, including the > PASID for that mm. Changing mm_struct probably isn't required at the moment, since the mm subsystem won't use the context data or the PASID. Outside of drivers/iommu/, only the caller of bind_mm needs the PASID in order to program it into the device. The only advantage I see would be slightly faster bind(), when finding out if a mm is already bound to devices. But we don't really need fast bind(), so I don't think we'd have enough material to argue for a change in mm_struct. We do need to allocate a separate "iommu_mm_struct" wrapper to store the mmu_notifier. Maybe bind() could return this structure (that contains the PASID), and unbind() would take this iommu_mm_struct as argument? Thanks Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean, Thanks for replying. On 2017/10/9 19:36, Jean-Philippe Brucker wrote: > Hi, > > On 09/10/17 10:49, Yisheng Xie wrote: >> Hi Jean, >> >> On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >>> Following discussions at plumbers and elsewhere, it seems like we need to >>> unify some of the Shared Virtual Memory (SVM) code, in order to define >>> clear semantics for the SVM API. >>> >>> My previous RFC [1] was centered on the SMMUv3, but some of this code will >>> need to be reused by the SMMUv2 and virtio-iommu drivers. This second >>> proposal focuses on abstracting a little more into the core IOMMU API, and >>> also trying to find common ground for all SVM-capable IOMMUs. >>> >>> SVM is, in the context of the IOMMU, sharing page tables between a process >>> and a device. Traditionally it requires IO Page Fault and Process Address >>> Space ID capabilities in device and IOMMU. >>> >>> * A device driver can bind a process to a device, with iommu_process_bind. >>> Internally we hold on to the mm and get notified of its activity with an >>> mmu_notifier. The bond is removed by exit_mm, by a call to >>> iommu_process_unbind or iommu_detach_device. >>> >>> * iommu_process_bind returns a 20-bit PASID (PCI terminology) to the >>> device driver, which programs it into the device to access the process >>> address space. >>> >>> * The device and the IOMMU support recoverable page faults. This can be >>> either ATS+PRI for PCI, or platform-specific mechanisms such as Stall >>> for SMMU. >>> >>> Ideally systems wanting to use SVM have to support these three features, >>> but in practice we'll see implementations supporting just a subset of >>> them, especially in validation environments. So even if this particular >>> patchset assumes all three capabilities, it should also be possible to >>> support PASID without IOPF (by pinning everything, see non-system SVM in >>> OpenCL) >> How to pin everything? If user malloc anything we should pin it. It should >> from user or driver? > > For userspace drivers, I guess it would be via a VFIO ioctl, that does the > same preparatory work as VFIO_IOMMU_MAP_DMA, but doesn't call iommu_map. > For things like OpenCL SVM buffers, it's the kernel driver that does the > pinning, just like VFIO does it, before launching the work on a SVM buffer. > >>> , or IOPF without PASID (sharing the single device address space >>> with a process, could be useful for DPDK+VFIO). >>> >>> Implementing both these cases would enable PT sharing alone. Some people >>> would also like IOPF alone without SVM (covered by this series) or process >>> management without shared PT (not covered). Using these features >>> individually is also important for testing, as SVM is in its infancy and >>> providing easy ways to test is essential to reduce the number of quirks >>> down the line. >>> >>> Process management >>> ================== >>> >>> The first part of this series introduces boilerplate code for managing >>> PASIDs and processes bound to devices. It's something any IOMMU driver >>> that wants to support bind/unbind will have to do, and it is difficult to >>> get right. >>> >>> Patches >>> 1: iommu_process and PASID allocation, attach and release >>> 2: process_exit callback for device drivers >>> 3: iommu_process search by PASID >>> 4: track process changes with an MMU notifiers >>> 5: bind and unbind operations >>> >>> My proposal uses the following model: >>> >>> * The PASID space is system-wide. This means that a Linux process will >>> have a single PASID. I introduce the iommu_process structure and a >>> global IDR to manage this. >>> >>> * An iommu_process can be bound to multiple domains, and a domain can have >>> multiple iommu_process. >> when bind a task to device, can we create a single domain for it? I am thinking >> about process management without shared PT(for some device only support PASID >> without pri ability), it seems hard to expand if a domain have multiple iommu_process? >> Do you have any idea about this? > > A device always has to be in a domain, as far as I know. Not supporting > PRI forces you to pin down all user mappings (or just the ones you use for > DMA) but you should sill be able to share PT. Now if you don't support > shared PT either, but only PASID, then you'll have to use io-pgtable and a > new map/unmap API on an iommu_process. I don't understand your concern > though, how would the link between process and domains prevent this use-case? > So you mean that if an iommu_process bind to multiple devices it should create multiple io-pgtables? or just share the same io-pgtable? >>> * IOMMU groups share same PASID table. IOMMU groups are a convenient way >>> to cover various hardware weaknesses that prevent a group of device to >>> be isolated by the IOMMU (untrusted bridge, for instance). It's foolish >>> to assume that all PASID implementations will perfectly isolate devices >>> within a bus and functions within a device, so let's assume all devices >>> within an IOMMU group have to share PASID traffic as well. In general >>> there will be a single device per group. >>> >>> * It's up to the driver implementation to decide where to implement the >>> PASID tables. For SMMU it's more convenient to have a single PASID table >>> per domain. And I think the model fits better with the existing IOMMU >>> API: IOVA traffic is shared by all devices in a domain, so should PASID >>> traffic. >> What's the meaning of "share PASID traffic"? PASID space is system-wide, >> and a domain can have multiple iommu_process , so a domain can have multiple >> PASIDs , one PASID for a iommu_process, right? I get what your mean now, thanks for your explain. > > Yes, I meant that if a device can access mappings for a specific PASID, > then other devices in that same domain are also able to access them. > > A few reasons for this choice in the SMMU: > * As all devices in an IOMMU group will be in the same domain and share > the same PASID traffic, it encompasses that case. Groups are the smallest > isolation granularity, then users are free to choose to put different > IOMMU groups in different domains. > * For architectures that can have both non-PASID and PASID traffic > simultaneously, like the SMMU, it is simpler to reason about PASID tables > being a domain, rather than sharing PASID0 within the domain and handling > all others per device. > * It's the same principle as non-PASID mappings (iommu_map/unmap is on a > domain). > * It implement the classic example of IOMMU architectures where multiple > device descriptors point to the same PASID tables. > * It may be desirable for drivers to share PASIDs within a domain, if they > are actually using domains for conveniently sharing address spaces between > devices. I'm not sure how much this is used as a feature. It does model a > shared bus where each device can snoop DMA, so it may be useful. > I get another question about this design, thinking about the following case: If a platform device with PASID ability, e.g. accelerator, which have multiple accelerator process units(APUs), it may create multiple virtual devices, one virtual device represent for an APU, with the same sid. They can be in different groups, however must be in the same domain as this design, for domain held the PASID table, right? So how could they be used by different guest OS? Thanks Yisheng Xie > bind/unbind operations are done on devices and not domains, though, > because it allows users to know which device supports PASID, PRI, etc. > > Thanks, > Jean > > . > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean-Philippe, On Thu, Oct 12, 2017 at 12:13:20PM +0100, Jean-Philippe Brucker wrote: > On 11/10/17 12:33, Joerg Roedel wrote: > > Here is how I think the base API should look like: > > > > * iommu_iovm_device_init(struct device *dev); > > iommu_iovm_device_shutdown(struct device *dev); > > > > These two functions do the device specific setup/shutdown. For > > PCI this would include enabling the PRI, PASID, and ATS > > capabilities and setting up a PASID table for the device. > > Ok. On SMMU the PASID table also hosts the non-PASID page table pointer, > so the table and capability cannot be setup later than attach_device (and > we'll probably have to enable PASID in add_device). But I suppose it's an > implementation detail. Right, when the capabilities are enabled is an implementation detail of the iommu-drivers. > Some device drivers will want to use ATS alone, for accelerating IOVA > traffic. Should we enable it automatically, or provide device drivers with > a way to enable it manually? According to the PCI spec, PASID has to be > enabled before ATS, so device_init would have to first disable ATS in that > case. Yes, driver can enable ATS for normal use of a device, and disable/re-enable it when the driver requests PASID/PRI functionality. That is also an implementation detail. You should probably also document that the init/shutdown functions may interrupt device operation, so that driver writers are aware of that. > > > * iommu_iovm_bind_mm(struct device *dev, struct mm_struct *mm, > > iovm_shutdown_cb *cb); > > iommu_iovm_unbind_mm(struct device *dev, struct mm_struct *mm); > > > > These functions add and delete the entries in the PASID table > > for the device and setup mmu_notifiers for the mm_struct to > > keep IOMMU TLBs in sync with the CPU TLBs. > > > > The shutdown_cb is invoked by the IOMMU core code when the > > mm_struct goes away, for example because the process > > segfaults. > > > > The PASID handling is best done these functions as well, unless > > there is a strong reason to allow device drivers to do the > > handling themselves. > > > > The context data can be stored directly in mm_struct, including the > > PASID for that mm. > > Changing mm_struct probably isn't required at the moment, since the mm > subsystem won't use the context data or the PASID. Outside of > drivers/iommu/, only the caller of bind_mm needs the PASID in order to > program it into the device. The only advantage I see would be slightly > faster bind(), when finding out if a mm is already bound to devices. But > we don't really need fast bind(), so I don't think we'd have enough > material to argue for a change in mm_struct. The idea behind storing the PASID in mm_struct is that we have a system-wide PASID-allocator and only one PASID per address space, even when accessed from multiple devices. There will be hardware implemenations where this is required, afaik. It doesn't mean that it _needs_ to be part of mm_struct, but it is certainly easier than tracking this 1-1 relation separatly. > We do need to allocate a separate "iommu_mm_struct" wrapper to store the > mmu_notifier. Maybe bind() could return this structure (that contains the > PASID), and unbind() would take this iommu_mm_struct as argument? I'd like to track this iommu_mm_struct only in iommu-code, otherwise drivers need to track the pointer somewhere. And we need to track the mm_struct->iommu_mm_struct relation anyway in core-code to handle events like segfaults, when the whole mm_struct goes away under us. So when we track this in core code, there is no need to track this again in the device drivers. Regards, Joerg -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 12/10/17 13:05, Yisheng Xie wrote: [...] >>>> * An iommu_process can be bound to multiple domains, and a domain can have >>>> multiple iommu_process. >>> when bind a task to device, can we create a single domain for it? I am thinking >>> about process management without shared PT(for some device only support PASID >>> without pri ability), it seems hard to expand if a domain have multiple iommu_process? >>> Do you have any idea about this? >> >> A device always has to be in a domain, as far as I know. Not supporting >> PRI forces you to pin down all user mappings (or just the ones you use for >> DMA) but you should sill be able to share PT. Now if you don't support >> shared PT either, but only PASID, then you'll have to use io-pgtable and a >> new map/unmap API on an iommu_process. I don't understand your concern >> though, how would the link between process and domains prevent this use-case? >> > So you mean that if an iommu_process bind to multiple devices it should create > multiple io-pgtables? or just share the same io-pgtable? I don't know to be honest, I haven't thought much about the io-pgtable case, I'm all about sharing the mm :) It really depends on what the user (GPU driver I assume) wants. I think that if you're not sharing an mm with the device, then you're trying to hide parts of the process to the device, so you'd also want the flexibility of having different io-pgtables between devices. Different devices accessing isolated parts of the process requires separate io-pgtables. >>>> * IOMMU groups share same PASID table. IOMMU groups are a convenient way >>>> to cover various hardware weaknesses that prevent a group of device to >>>> be isolated by the IOMMU (untrusted bridge, for instance). It's foolish >>>> to assume that all PASID implementations will perfectly isolate devices >>>> within a bus and functions within a device, so let's assume all devices >>>> within an IOMMU group have to share PASID traffic as well. In general >>>> there will be a single device per group. >>>> >>>> * It's up to the driver implementation to decide where to implement the >>>> PASID tables. For SMMU it's more convenient to have a single PASID table >>>> per domain. And I think the model fits better with the existing IOMMU >>>> API: IOVA traffic is shared by all devices in a domain, so should PASID >>>> traffic. >>> What's the meaning of "share PASID traffic"? PASID space is system-wide, >>> and a domain can have multiple iommu_process , so a domain can have multiple >>> PASIDs , one PASID for a iommu_process, right? > I get what your mean now, thanks for your explain. > >> >> Yes, I meant that if a device can access mappings for a specific PASID, >> then other devices in that same domain are also able to access them. >> >> A few reasons for this choice in the SMMU: >> * As all devices in an IOMMU group will be in the same domain and share >> the same PASID traffic, it encompasses that case. Groups are the smallest >> isolation granularity, then users are free to choose to put different >> IOMMU groups in different domains. >> * For architectures that can have both non-PASID and PASID traffic >> simultaneously, like the SMMU, it is simpler to reason about PASID tables >> being a domain, rather than sharing PASID0 within the domain and handling >> all others per device. >> * It's the same principle as non-PASID mappings (iommu_map/unmap is on a >> domain). >> * It implement the classic example of IOMMU architectures where multiple >> device descriptors point to the same PASID tables. >> * It may be desirable for drivers to share PASIDs within a domain, if they >> are actually using domains for conveniently sharing address spaces between >> devices. I'm not sure how much this is used as a feature. It does model a >> shared bus where each device can snoop DMA, so it may be useful. >> > > I get another question about this design, thinking about the following case: > > If a platform device with PASID ability, e.g. accelerator, which have multiple > accelerator process units(APUs), it may create multiple virtual devices, one > virtual device represent for an APU, with the same sid. > > They can be in different groups, however must be in the same domain as this > design, for domain held the PASID table, right? So how could they be used by > different guest OS? If they have the same SID, they will be in the same group as there will be a single entry in the SMMU stream table. Otherwise, if the virtual devices can be properly isolated from each other (in the same way as PCI SR-IOV), they will each have their own SID, can each be in a different IOMMU group and can be assigned to different guests. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 12, 2017 at 01:55:32PM +0100, Jean-Philippe Brucker wrote: > On 12/10/17 13:05, Yisheng Xie wrote: > [...] > >>>> * An iommu_process can be bound to multiple domains, and a domain can have > >>>> multiple iommu_process. > >>> when bind a task to device, can we create a single domain for it? I am thinking > >>> about process management without shared PT(for some device only support PASID > >>> without pri ability), it seems hard to expand if a domain have multiple iommu_process? > >>> Do you have any idea about this? > >> > >> A device always has to be in a domain, as far as I know. Not supporting > >> PRI forces you to pin down all user mappings (or just the ones you use for > >> DMA) but you should sill be able to share PT. Now if you don't support > >> shared PT either, but only PASID, then you'll have to use io-pgtable and a > >> new map/unmap API on an iommu_process. I don't understand your concern > >> though, how would the link between process and domains prevent this use-case? > >> > > So you mean that if an iommu_process bind to multiple devices it should create > > multiple io-pgtables? or just share the same io-pgtable? > > I don't know to be honest, I haven't thought much about the io-pgtable > case, I'm all about sharing the mm :) > > It really depends on what the user (GPU driver I assume) wants. I think > that if you're not sharing an mm with the device, then you're trying to > hide parts of the process to the device, so you'd also want the > flexibility of having different io-pgtables between devices. Different > devices accessing isolated parts of the process requires separate io-pgtables. In our specific Snapdragon use case the GPU is the only entity that cares about process specific io-pgtables. Everything else (display, video, camera) is happy using a global io-ptgable. The reasoning is that the GPU is programmable from user space and can be easily used to copy data whereas the other use cases have mostly fixed functions. Even if different devices did want to have a process specific io-pgtable I doubt we would share them. Every device uses the IOMMU differently and the magic needed to share a io-pgtable between (for example) a GPU and a DSP would be prohibitively complicated. Jordan
few nits below. > +/* > + * Allocate a iommu_process structure for the given task. > + * > + * Ideally we shouldn't need the domain parameter, since iommu_process is > + * system-wide, but we use it to retrieve the driver's allocation ops and a > + * PASID range. > + */ > +static struct iommu_process * > +iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task) > +{ > + int err; > + int pasid; > + struct iommu_process *process; > + > + if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free)) > + return ERR_PTR(-ENODEV); > + > + process = domain->ops->process_alloc(task); > + if (IS_ERR(process)) > + return process; > + if (!process) > + return ERR_PTR(-ENOMEM); > + > + process->pid = get_task_pid(task, PIDTYPE_PID); > + process->release = domain->ops->process_free; > + INIT_LIST_HEAD(&process->domains); > + kref_init(&process->kref); > + nit, I think you should place this check right after the pid assignment. > + if (!process->pid) { > + err = -EINVAL; > + goto err_free_process; > + } > + > + idr_preload(GFP_KERNEL); > + spin_lock(&iommu_process_lock); > + pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid, > + domain->max_pasid + 1, GFP_ATOMIC); > + process->pasid = pasid; > + spin_unlock(&iommu_process_lock); > + idr_preload_end(); > + nit, You can maybe return here if pasid is not negative. > + if (pasid < 0) { > + err = pasid; > + goto err_put_pid; > + } > + > + return process; > + > +err_put_pid: > + put_pid(process->pid); > + > +err_free_process: > + domain->ops->process_free(process); > + > + return ERR_PTR(err); > +} > + > +static void iommu_process_release(struct kref *kref) > +{ > + struct iommu_process *process; > + void (*release)(struct iommu_process *); > + > + assert_spin_locked(&iommu_process_lock); > + > + process = container_of(kref, struct iommu_process, kref); if we are concerned about things going wrong (assert above), we should also add some pointer check here (WARN) for process and release pointers as well. > + release = process->release; > + > + WARN_ON(!list_empty(&process->domains)); > + > + idr_remove(&iommu_process_idr, process->pasid); > + put_pid(process->pid); > + release(process); > +} > + > +/* > + * Returns non-zero if a reference to the process was successfully taken. > + * Returns zero if the process is being freed and should not be used. > + */ > +static int iommu_process_get_locked(struct iommu_process *process) > +{ > + assert_spin_locked(&iommu_process_lock); > + > + if (process) > + return kref_get_unless_zero(&process->kref); > + > + return 0; > +} > + > +static void iommu_process_put_locked(struct iommu_process *process) > +{ > + assert_spin_locked(&iommu_process_lock); > + > + kref_put(&process->kref, iommu_process_release); > +} > + > +static int iommu_process_attach(struct iommu_domain *domain, struct device *dev, > + struct iommu_process *process) > +{ > + int err; > + int pasid = process->pasid; > + struct iommu_context *context; > + > + if (WARN_ON(!domain->ops->process_attach || !domain->ops->process_detach)) > + return -ENODEV; > + > + if (pasid > domain->max_pasid || pasid < domain->min_pasid) > + return -ENOSPC; > + > + context = kzalloc(sizeof(*context), GFP_KERNEL); > + if (!context) > + return -ENOMEM; > + devm_kzalloc maybe? > + context->process = process; > + context->domain = domain; > + refcount_set(&context->ref, 1); > + > + spin_lock(&iommu_process_lock); > + err = domain->ops->process_attach(domain, dev, process, true); > + if (err) { > + kfree(context); > + spin_unlock(&iommu_process_lock); > + return err; > + } > + > + list_add(&context->process_head, &process->domains); > + list_add(&context->domain_head, &domain->processes); > + spin_unlock(&iommu_process_lock); > + > + return 0; > +}
Just some improvement suggestions. On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote: > + spin_lock(&iommu_process_lock); > + idr_for_each_entry(&iommu_process_idr, process, i) { > + if (process->pid != pid) > + continue; if you see this construct a lot, this could be a for_each_iommu_process. > + > + if (!iommu_process_get_locked(process)) { > + /* Process is defunct, create a new one */ > + process = NULL; > + break; > + } > + > + /* Great, is it also bound to this domain? */ > + list_for_each_entry(cur_context, &process->domains, > + process_head) { > + if (cur_context->domain != domain) > + continue; if you see this construct a lot, this could be a for_each_iommu_process_domain. > + > + context = cur_context; > + *pasid = process->pasid; > + > + /* Splendid, tell the driver and increase the ref */ > + err = iommu_process_attach_locked(context, dev); > + if (err) > + iommu_process_put_locked(process); > + > + break; > + } > + break; > + } > + spin_unlock(&iommu_process_lock); > + put_pid(pid); > + > + if (context) > + return err; I think you should make the section above a independent function and return here when the context is found.
Hi Jean, > -----Original Message----- > From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com] > Sent: Friday, October 6, 2017 9:31 PM > To: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux- > acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux- > foundation.org > Cc: joro@8bytes.org; robh+dt@kernel.org; mark.rutland@arm.com; > catalin.marinas@arm.com; will.deacon@arm.com; lorenzo.pieralisi@arm.com; > hanjun.guo@linaro.org; sudeep.holla@arm.com; rjw@rjwysocki.net; > lenb@kernel.org; robin.murphy@arm.com; bhelgaas@google.com; > alex.williamson@redhat.com; tn@semihalf.com; liubo95@huawei.com; > thunder.leizhen@huawei.com; xieyisheng1@huawei.com; > gabriele.paoloni@huawei.com; nwatters@codeaurora.org; okaya@codeaurora.org; > rfranz@cavium.com; dwmw2@infradead.org; jacob.jun.pan@linux.intel.com; Liu, Yi > L <yi.l.liu@intel.com>; Raj, Ashok <ashok.raj@intel.com>; robdclark@gmail.com > Subject: [RFCv2 PATCH 01/36] iommu: Keep track of processes and PASIDs > > IOMMU drivers need a way to bind Linux processes to devices. This is used for > Shared Virtual Memory (SVM), where devices support paging. In that mode, DMA can > directly target virtual addresses of a process. > > Introduce boilerplate code for allocating process structures and binding them to > devices. Four operations are added to IOMMU drivers: > > * process_alloc, process_free: to create an iommu_process structure and > perform architecture-specific operations required to grab the process > (for instance on ARM SMMU, pin down the CPU ASID). There is a single > iommu_process structure per Linux process. > > * process_attach: attach a process to a device. The IOMMU driver checks > that the device is capable of sharing an address space with this > process, and writes the PASID table entry to install the process page > directory. > > Some IOMMU drivers (e.g. ARM SMMU and virtio-iommu) will have a single > PASID table per domain, for convenience. Other can implement it > differently but to help these drivers, process_attach and process_detach > take a 'first' or 'last' parameter telling whether they need to > install/remove the PASID entry or only send the required TLB > invalidations. > > * process_detach: detach a process from a device. The IOMMU driver removes > the PASID table entry and invalidates the IOTLBs. > > process_attach and process_detach operations are serialized with a spinlock. At the > moment it is global, but if we try to optimize it, the core should at least prevent > concurrent attach/detach on the same domain. > (so multi-level PASID table code can allocate tables lazily without having to go > through the io-pgtable concurrency nightmare). process_alloc can sleep, but > process_free must not (because we'll have to call it from > call_srcu.) > > At the moment we use an IDR for allocating PASIDs and retrieving contexts. > We also use a single spinlock. These can be refined and optimized later (a custom > allocator will be needed for top-down PASID allocation). > > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> > --- > drivers/iommu/Kconfig | 10 ++ > drivers/iommu/Makefile | 1 + > drivers/iommu/iommu-process.c | 225 > ++++++++++++++++++++++++++++++++++++++++++ > drivers/iommu/iommu.c | 1 + > include/linux/iommu.h | 24 +++++ > 5 files changed, 261 insertions(+) > create mode 100644 drivers/iommu/iommu-process.c > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index > f3a21343e636..1ea5c90e37be 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -74,6 +74,16 @@ config IOMMU_DMA > select IOMMU_IOVA > select NEED_SG_DMA_LENGTH > > +config IOMMU_PROCESS > + bool "Process management API for the IOMMU" > + select IOMMU_API > + help > + Enable process management for the IOMMU API. In systems that support > + it, device drivers can bind processes to devices and share their page > + tables using this API. > + > + If unsure, say N here. > + > config FSL_PAMU > bool "Freescale IOMMU support" > depends on PCI > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index > b910aea813a1..a2832edbfaa2 100644 > --- a/drivers/iommu/Makefile > +++ b/drivers/iommu/Makefile > @@ -1,6 +1,7 @@ > obj-$(CONFIG_IOMMU_API) += iommu.o > obj-$(CONFIG_IOMMU_API) += iommu-traces.o > obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o > +obj-$(CONFIG_IOMMU_PROCESS) += iommu-process.o > obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o > obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o > obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o diff --git > a/drivers/iommu/iommu-process.c b/drivers/iommu/iommu-process.c new file > mode 100644 index 000000000000..a7e5a1c94305 > --- /dev/null > +++ b/drivers/iommu/iommu-process.c > @@ -0,0 +1,225 @@ > +/* > + * Track processes bound to devices > + * > + * This program is free software; you can redistribute it and/or modify > +it > + * under the terms of the GNU General Public License version 2 as > +published > + * by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 > +USA > + * > + * Copyright (C) 2017 ARM Ltd. > + * > + * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> */ > + > +#include <linux/idr.h> > +#include <linux/iommu.h> > +#include <linux/slab.h> > +#include <linux/spinlock.h> > + > +/* Link between a domain and a process */ struct iommu_context { > + struct iommu_process *process; > + struct iommu_domain *domain; > + > + struct list_head process_head; > + struct list_head domain_head; > + > + /* Number of devices that use this context */ > + refcount_t ref; > +}; > + > +/* > + * Because we're using an IDR, PASIDs are limited to 31 bits (the sign > +bit is > + * used for returning errors). In practice implementations will use at > +most 20 > + * bits, which is the PCI limit. > + */ > +static DEFINE_IDR(iommu_process_idr); > + > +/* > + * For the moment this is an all-purpose lock. It serializes > + * access/modifications to contexts (process-domain links), > +access/modifications > + * to the PASID IDR, and changes to process refcount as well. > + */ > +static DEFINE_SPINLOCK(iommu_process_lock); > + > +/* > + * Allocate a iommu_process structure for the given task. > + * > + * Ideally we shouldn't need the domain parameter, since iommu_process > +is > + * system-wide, but we use it to retrieve the driver's allocation ops > +and a > + * PASID range. > + */ > +static struct iommu_process * > +iommu_process_alloc(struct iommu_domain *domain, struct task_struct > +*task) { > + int err; > + int pasid; > + struct iommu_process *process; > + > + if (WARN_ON(!domain->ops->process_alloc || !domain->ops- > >process_free)) > + return ERR_PTR(-ENODEV); > + > + process = domain->ops->process_alloc(task); > + if (IS_ERR(process)) > + return process; > + if (!process) > + return ERR_PTR(-ENOMEM); > + > + process->pid = get_task_pid(task, PIDTYPE_PID); > + process->release = domain->ops->process_free; > + INIT_LIST_HEAD(&process->domains); > + kref_init(&process->kref); > + > + if (!process->pid) { > + err = -EINVAL; > + goto err_free_process; > + } > + > + idr_preload(GFP_KERNEL); > + spin_lock(&iommu_process_lock); > + pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid, > + domain->max_pasid + 1, GFP_ATOMIC); > + process->pasid = pasid; [Liu, Yi L] If I'm understanding well, here is managing the pasid allocation in iommu layer instead of vendor iommu driver? Is there strong reason here? I think pasid management may be better within vendor iommu driver as pasid management could differ from vendor to vendor. Regards, Yi L -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 23/10/17 12:04, Liu, Yi L wrote: >> + idr_preload(GFP_KERNEL); >> + spin_lock(&iommu_process_lock); >> + pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid, >> + domain->max_pasid + 1, GFP_ATOMIC); >> + process->pasid = pasid; > > [Liu, Yi L] If I'm understanding well, here is managing the pasid allocation in iommu > layer instead of vendor iommu driver? Is there strong reason here? I think pasid > management may be better within vendor iommu driver as pasid management > could differ from vendor to vendor. But that's the thing, we're trying to abstract PASID and process management to have it in the core, because there shouldn't be many differences from vendor to vendor. This way we have the allocation code in one place and vendor drivers don't have to copy paste it from other drivers. It's just a global number within a range, so I don't think vendors will have many different ways of designing it. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean On Mon, Oct 23, 2017 at 01:17:07PM +0100, Jean-Philippe Brucker wrote: > On 23/10/17 12:04, Liu, Yi L wrote: > >> + idr_preload(GFP_KERNEL); > >> + spin_lock(&iommu_process_lock); > >> + pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid, > >> + domain->max_pasid + 1, GFP_ATOMIC); > >> + process->pasid = pasid; > > > > [Liu, Yi L] If I'm understanding well, here is managing the pasid allocation in iommu > > layer instead of vendor iommu driver? Is there strong reason here? I think pasid > > management may be better within vendor iommu driver as pasid management > > could differ from vendor to vendor. > > But that's the thing, we're trying to abstract PASID and process > management to have it in the core, because there shouldn't be many > differences from vendor to vendor. This way we have the allocation code in > one place and vendor drivers don't have to copy paste it from other drivers. I think this makes sense for the native case and also in the vIOMMU if the PASID tables and allocation are completely managed by the guest. If the vIOMMU requires any co-ordination in how the PASID's are allocated for guest devices there might need to be some control on how these are allocated that ultimately need to be managed by VMM/Physical IOMMU. For instance if the PASID space is sparse for e.g if we make the PASID allocation as one of the ops, the IOMMU implementation will choose the default function, or if it choose a differnt mechanism it would have that flexibility. Does this make sense? Cheers, Ashok > > It's just a global number within a range, so I don't think vendors will > have many different ways of designing it. > > Thanks, > Jean > > > > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 25/10/17 19:05, Raj, Ashok wrote: > Hi Jean > > On Mon, Oct 23, 2017 at 01:17:07PM +0100, Jean-Philippe Brucker wrote: >> On 23/10/17 12:04, Liu, Yi L wrote: >>>> + idr_preload(GFP_KERNEL); >>>> + spin_lock(&iommu_process_lock); >>>> + pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid, >>>> + domain->max_pasid + 1, GFP_ATOMIC); >>>> + process->pasid = pasid; >>> >>> [Liu, Yi L] If I'm understanding well, here is managing the pasid allocation in iommu >>> layer instead of vendor iommu driver? Is there strong reason here? I think pasid >>> management may be better within vendor iommu driver as pasid management >>> could differ from vendor to vendor. >> >> But that's the thing, we're trying to abstract PASID and process >> management to have it in the core, because there shouldn't be many >> differences from vendor to vendor. This way we have the allocation code in >> one place and vendor drivers don't have to copy paste it from other drivers. > > I think this makes sense for the native case and also in the vIOMMU > if the PASID tables and allocation are completely managed by the guest. > > If the vIOMMU requires any co-ordination in how the PASID's are allocated > for guest devices there might need to be some control on how these are > allocated that ultimately need to be managed by VMM/Physical IOMMU. For > instance if the PASID space is sparse for e.g (I don't have your example) > if we make the PASID allocation as one of the ops, the IOMMU implementation > will choose the default function, or if it choose a differnt mechanism it would > have that flexibility. > > Does this make sense? If the PASID space is sparse, maybe we can add a firmware or probe mechanism to declare reserved PASIDs, like we have for reserved IOVAs, that feeds into the core IOMMU driver. But I agree that we can always let vendor drivers implement their own allocator if they need one in the future. For the moment it can stay generic. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Sinan, Sorry for the delay, thanks for reviewing this! On 21/10/17 00:32, Sinan Kaya wrote: > few nits below. > >> +/* >> + * Allocate a iommu_process structure for the given task. >> + * >> + * Ideally we shouldn't need the domain parameter, since iommu_process is >> + * system-wide, but we use it to retrieve the driver's allocation ops and a >> + * PASID range. >> + */ >> +static struct iommu_process * >> +iommu_process_alloc(struct iommu_domain *domain, struct task_struct *task) >> +{ >> + int err; >> + int pasid; >> + struct iommu_process *process; >> + >> + if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free)) >> + return ERR_PTR(-ENODEV); >> + >> + process = domain->ops->process_alloc(task); >> + if (IS_ERR(process)) >> + return process; >> + if (!process) >> + return ERR_PTR(-ENOMEM); >> + >> + process->pid = get_task_pid(task, PIDTYPE_PID); >> + process->release = domain->ops->process_free; >> + INIT_LIST_HEAD(&process->domains); >> + kref_init(&process->kref); >> + > nit, I think you should place this check right after the pid assignment. Sure >> + if (!process->pid) { >> + err = -EINVAL; >> + goto err_free_process; >> + } >> + >> + idr_preload(GFP_KERNEL); >> + spin_lock(&iommu_process_lock); >> + pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid, >> + domain->max_pasid + 1, GFP_ATOMIC); >> + process->pasid = pasid; >> + spin_unlock(&iommu_process_lock); >> + idr_preload_end(); >> + > > nit, You can maybe return here if pasid is not negative. Ok >> + if (pasid < 0) { >> + err = pasid; >> + goto err_put_pid; >> + } >> + >> + return process; >> + >> +err_put_pid: >> + put_pid(process->pid); >> + >> +err_free_process: >> + domain->ops->process_free(process); >> + >> + return ERR_PTR(err); >> +} >> + >> +static void iommu_process_release(struct kref *kref) >> +{ >> + struct iommu_process *process; >> + void (*release)(struct iommu_process *); >> + >> + assert_spin_locked(&iommu_process_lock); >> + >> + process = container_of(kref, struct iommu_process, kref); > > if we are concerned about things going wrong (assert above), we should > also add some pointer check here (WARN) for process and release pointers as well. We can probably get rid of this assert, as any external users releasing the process will have to go through iommu_process_put() which takes the lock. process_alloc() ensures that release isn't NULL, and process should always be valid here since we're being called from kref_put, but I should check the value in process_put. >> + release = process->release; >> + >> + WARN_ON(!list_empty(&process->domains)); >> + >> + idr_remove(&iommu_process_idr, process->pasid); >> + put_pid(process->pid); >> + release(process); >> +} >> + >> +/* >> + * Returns non-zero if a reference to the process was successfully taken. >> + * Returns zero if the process is being freed and should not be used. >> + */ >> +static int iommu_process_get_locked(struct iommu_process *process) >> +{ >> + assert_spin_locked(&iommu_process_lock); >> + >> + if (process) >> + return kref_get_unless_zero(&process->kref); >> + >> + return 0; >> +} >> + >> +static void iommu_process_put_locked(struct iommu_process *process) >> +{ >> + assert_spin_locked(&iommu_process_lock); >> + >> + kref_put(&process->kref, iommu_process_release); >> +} >> + >> +static int iommu_process_attach(struct iommu_domain *domain, struct device *dev, >> + struct iommu_process *process) >> +{ >> + int err; >> + int pasid = process->pasid; >> + struct iommu_context *context; >> + >> + if (WARN_ON(!domain->ops->process_attach || !domain->ops->process_detach)) >> + return -ENODEV; >> + >> + if (pasid > domain->max_pasid || pasid < domain->min_pasid) >> + return -ENOSPC; >> + >> + context = kzalloc(sizeof(*context), GFP_KERNEL); >> + if (!context) >> + return -ENOMEM; >> + > > devm_kzalloc maybe? I don't think we can ever leak contexts. Before the device is released, it has to detach() from the domain, which will unbind from any process (call unbind_dev_all()). Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 21/10/17 16:47, Sinan Kaya wrote: > Just some improvement suggestions. > > On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote: >> + spin_lock(&iommu_process_lock); >> + idr_for_each_entry(&iommu_process_idr, process, i) { >> + if (process->pid != pid) >> + continue; > if you see this construct a lot, this could be a for_each_iommu_process. > >> + >> + if (!iommu_process_get_locked(process)) { >> + /* Process is defunct, create a new one */ >> + process = NULL; >> + break; >> + } >> + >> + /* Great, is it also bound to this domain? */ >> + list_for_each_entry(cur_context, &process->domains, >> + process_head) { >> + if (cur_context->domain != domain) >> + continue; > if you see this construct a lot, this could be a for_each_iommu_process_domain. > >> + >> + context = cur_context; >> + *pasid = process->pasid; >> + >> + /* Splendid, tell the driver and increase the ref */ >> + err = iommu_process_attach_locked(context, dev); >> + if (err) >> + iommu_process_put_locked(process); >> + >> + break; >> + } >> + break; >> + } >> + spin_unlock(&iommu_process_lock); >> + put_pid(pid); >> + >> + if (context) >> + return err; > > I think you should make the section above a independent function and return here when the > context is found. Hopefully this code will only be needed for bind(), but moving it to a separate function should look better. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean, On 2017/11/3 1:02, Shameerali Kolothum Thodi wrote: > > >> -----Original Message----- >> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker@arm.com] >> Sent: Thursday, November 02, 2017 3:52 PM >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> >> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux- >> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux- >> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A) >> <xieyisheng1@huawei.com>; Gabriele Paoloni >> <gabriele.paoloni@huawei.com>; Catalin Marinas >> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>; >> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi >> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com; >> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org; >> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com; >> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>; >> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU) >> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com; >> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin >> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm >> <linuxarm@huawei.com> >> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for >> Substream IDs >> >> Hi Shameer, >> >> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi wrote: >>> We had a go with this series on HiSIlicon D05 platform which doesn't have >>> support for ssids/ATS/PRI, to make sure it generally works. >>> >>> But observed the below crash on boot, >>> >>> [ 16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883 >> __alloc_pages_nodemask+0x19c/0xc48 >>> [ 16.026797] Modules linked in: >>> [ 16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1- >> 159539-ge42aca3 #236 >>> [...] >>> [ 16.068206] Workqueue: events deferred_probe_work_func >>> [ 16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000 >>> [ 16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48 >>> [ 16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48 >>> [ 16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48 >>> [ 16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc >>> [ 16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38 >>> [ 16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190 >>> [ 16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204 >>> [ 16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0 >>> [ 16.539575] [<ffff000008568884>] >> arm_smmu_domain_finalise_s1+0x60/0x248 >>> [ 16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300 >>> [ 16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c >>> [ 16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4 >>> [ 16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8 >>> [ 16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418 >>> [ 16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c >>> [ 16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70 >>> [ 16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4 >>> [ 16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc >>> [ 16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94 >>> [ 16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c >>> [ 16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18 >>> [ 16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98 >>> >>> After a bit of debug it looks like on platforms where ssid is not supported, >>> s1_cfg.num_contexts is set to zero and it eventually results in this crash >>> in, >>> arm_smmu_domain_finalise_s1() -->arm_smmu_alloc_cd_tables()--> >>> arm_smmu_alloc_cd_leaf_table() as num_leaf_entries is zero. >>> >>> With the below fix, it works on D05 now, >>> >>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c >>> index 8ad90e2..51f5821 100644 >>> --- a/drivers/iommu/arm-smmu-v3.c >>> +++ b/drivers/iommu/arm-smmu-v3.c >>> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct >> iommu_domain *domain, >>> domain->min_pasid = 1; >>> domain->max_pasid = master->num_ssids - 1; >>> smmu_domain->s1_cfg.num_contexts = master->num_ssids; >>> + } else { >>> + smmu_domain->s1_cfg.num_contexts = 1; >>> } >>> + >>> smmu_domain->s1_cfg.can_stall = master->ste.can_stall; >>> break; >>> case ARM_SMMU_DOMAIN_NESTED: >>> >>> >>> I am not sure this is right place do this. Please take a look. >> >> Thanks for testing the series and reporting the bug. I added the >> following patch to branch svm/current, does it work for you? > > Yes, it does. > > Thanks, > Shameer > >> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c >> index 42c8378624ed..edda466adc81 100644 >> --- a/drivers/iommu/arm-smmu-v3.c >> +++ b/drivers/iommu/arm-smmu-v3.c >> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev) >> } >> } >> >> - if (smmu->ssid_bits) >> - master->num_ssids = 1 << min(smmu->ssid_bits, >> - fwspec->num_pasid_bits); >> + master->num_ssids = 1 << min(smmu->ssid_bits, fwspec- >>> num_pasid_bits); If fwspec->num_pasid_bits = 0, then master have _one_ num_ssids ? It seems Shameerali's fix is better ? >> >> if (fwspec->can_stall && smmu->features & ARM_SMMU_FEAT_STALLS) { >> master->can_fault = true; > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/11/17 05:45, Yisheng Xie wrote: > Hi Jean, > > On 2017/11/3 1:02, Shameerali Kolothum Thodi wrote: >> >> >>> -----Original Message----- >>> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker@arm.com] >>> Sent: Thursday, November 02, 2017 3:52 PM >>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> >>> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux- >>> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux- >>> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A) >>> <xieyisheng1@huawei.com>; Gabriele Paoloni >>> <gabriele.paoloni@huawei.com>; Catalin Marinas >>> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>; >>> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi >>> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com; >>> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org; >>> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com; >>> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>; >>> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU) >>> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com; >>> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin >>> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm >>> <linuxarm@huawei.com> >>> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for >>> Substream IDs >>> >>> Hi Shameer, >>> >>> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi wrote: >>>> We had a go with this series on HiSIlicon D05 platform which doesn't have >>>> support for ssids/ATS/PRI, to make sure it generally works. >>>> >>>> But observed the below crash on boot, >>>> >>>> [ 16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883 >>> __alloc_pages_nodemask+0x19c/0xc48 >>>> [ 16.026797] Modules linked in: >>>> [ 16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1- >>> 159539-ge42aca3 #236 >>>> [...] >>>> [ 16.068206] Workqueue: events deferred_probe_work_func >>>> [ 16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000 >>>> [ 16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48 >>>> [ 16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48 >>>> [ 16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48 >>>> [ 16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc >>>> [ 16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38 >>>> [ 16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190 >>>> [ 16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204 >>>> [ 16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0 >>>> [ 16.539575] [<ffff000008568884>] >>> arm_smmu_domain_finalise_s1+0x60/0x248 >>>> [ 16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300 >>>> [ 16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c >>>> [ 16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4 >>>> [ 16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8 >>>> [ 16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418 >>>> [ 16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c >>>> [ 16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70 >>>> [ 16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4 >>>> [ 16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc >>>> [ 16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94 >>>> [ 16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c >>>> [ 16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18 >>>> [ 16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98 >>>> >>>> After a bit of debug it looks like on platforms where ssid is not supported, >>>> s1_cfg.num_contexts is set to zero and it eventually results in this crash >>>> in, >>>> arm_smmu_domain_finalise_s1() -->arm_smmu_alloc_cd_tables()--> >>>> arm_smmu_alloc_cd_leaf_table() as num_leaf_entries is zero. >>>> >>>> With the below fix, it works on D05 now, >>>> >>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c >>>> index 8ad90e2..51f5821 100644 >>>> --- a/drivers/iommu/arm-smmu-v3.c >>>> +++ b/drivers/iommu/arm-smmu-v3.c >>>> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct >>> iommu_domain *domain, >>>> domain->min_pasid = 1; >>>> domain->max_pasid = master->num_ssids - 1; >>>> smmu_domain->s1_cfg.num_contexts = master->num_ssids; >>>> + } else { >>>> + smmu_domain->s1_cfg.num_contexts = 1; >>>> } >>>> + >>>> smmu_domain->s1_cfg.can_stall = master->ste.can_stall; >>>> break; >>>> case ARM_SMMU_DOMAIN_NESTED: >>>> >>>> >>>> I am not sure this is right place do this. Please take a look. >>> >>> Thanks for testing the series and reporting the bug. I added the >>> following patch to branch svm/current, does it work for you? >> >> Yes, it does. >> >> Thanks, >> Shameer >> >>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c >>> index 42c8378624ed..edda466adc81 100644 >>> --- a/drivers/iommu/arm-smmu-v3.c >>> +++ b/drivers/iommu/arm-smmu-v3.c >>> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev) >>> } >>> } >>> >>> - if (smmu->ssid_bits) >>> - master->num_ssids = 1 << min(smmu->ssid_bits, >>> - fwspec->num_pasid_bits); >>> + master->num_ssids = 1 << min(smmu->ssid_bits, fwspec- >>>> num_pasid_bits); > > If fwspec->num_pasid_bits = 0, then master have _one_ num_ssids ? Yes, the context table allocator always needs to allocate at least one entry, even if the master or SMMU doesn't support SSID. I think an earlier version called this field "num_contexts", maybe we should got back to that name for clarity? Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com] > Sent: Friday, November 03, 2017 9:37 AM > To: xieyisheng (A) <xieyisheng1@huawei.com>; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com> > Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux- > acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux- > foundation.org; Mark Rutland <Mark.Rutland@arm.com>; Gabriele Paoloni > <gabriele.paoloni@huawei.com>; Catalin Marinas > <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>; > okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi > <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com; > joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org; > jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com; > robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>; > bhelgaas@google.com; dwmw2@infradead.org; liubo (CU) > <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com; > hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin > Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm > <linuxarm@huawei.com> > Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for > Substream IDs > > On 03/11/17 05:45, Yisheng Xie wrote: > > Hi Jean, > > > > On 2017/11/3 1:02, Shameerali Kolothum Thodi wrote: > >> > >> > >>> -----Original Message----- > >>> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker@arm.com] > >>> Sent: Thursday, November 02, 2017 3:52 PM > >>> To: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com> > >>> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux- > >>> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux- > >>> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A) > >>> <xieyisheng1@huawei.com>; Gabriele Paoloni > >>> <gabriele.paoloni@huawei.com>; Catalin Marinas > >>> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>; > >>> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi > >>> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com; > >>> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org; > >>> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com; > >>> robh+dt@kernel.org; Leizhen (ThunderTown) > <thunder.leizhen@huawei.com>; > >>> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU) > >>> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com; > >>> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin > >>> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm > >>> <linuxarm@huawei.com> > >>> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for > >>> Substream IDs > >>> > >>> Hi Shameer, > >>> > >>> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi > wrote: > >>>> We had a go with this series on HiSIlicon D05 platform which doesn't have > >>>> support for ssids/ATS/PRI, to make sure it generally works. > >>>> > >>>> But observed the below crash on boot, > >>>> > >>>> [ 16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883 > >>> __alloc_pages_nodemask+0x19c/0xc48 > >>>> [ 16.026797] Modules linked in: > >>>> [ 16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0- > rc1- > >>> 159539-ge42aca3 #236 > >>>> [...] > >>>> [ 16.068206] Workqueue: events deferred_probe_work_func > >>>> [ 16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000 > >>>> [ 16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48 > >>>> [ 16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48 > >>>> [ 16.469220] [<ffff000008186b94>] > __alloc_pages_nodemask+0x19c/0xc48 > >>>> [ 16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc > >>>> [ 16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38 > >>>> [ 16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190 > >>>> [ 16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204 > >>>> [ 16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0 > >>>> [ 16.539575] [<ffff000008568884>] > >>> arm_smmu_domain_finalise_s1+0x60/0x248 > >>>> [ 16.552909] [<ffff00000856c104>] > arm_smmu_attach_dev+0x264/0x300 > >>>> [ 16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c > >>>> [ 16.577117] [<ffff00000855e698>] > iommu_group_add_device+0x144/0x3a4 > >>>> [ 16.589746] [<ffff00000855ed18>] > iommu_group_get_for_dev+0x70/0xf8 > >>>> [ 16.602201] [<ffff00000856a314>] > arm_smmu_add_device+0x1a4/0x418 > >>>> [ 16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c > >>>> [ 16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70 > >>>> [ 16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4 > >>>> [ 16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc > >>>> [ 16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94 > >>>> [ 16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c > >>>> [ 16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18 > >>>> [ 16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98 > >>>> > >>>> After a bit of debug it looks like on platforms where ssid is not supported, > >>>> s1_cfg.num_contexts is set to zero and it eventually results in this crash > >>>> in, > >>>> arm_smmu_domain_finalise_s1() -->arm_smmu_alloc_cd_tables()--> > >>>> arm_smmu_alloc_cd_leaf_table() as num_leaf_entries is zero. > >>>> > >>>> With the below fix, it works on D05 now, > >>>> > >>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu- > v3.c > >>>> index 8ad90e2..51f5821 100644 > >>>> --- a/drivers/iommu/arm-smmu-v3.c > >>>> +++ b/drivers/iommu/arm-smmu-v3.c > >>>> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct > >>> iommu_domain *domain, > >>>> domain->min_pasid = 1; > >>>> domain->max_pasid = master->num_ssids - 1; > >>>> smmu_domain->s1_cfg.num_contexts = master- > >num_ssids; > >>>> + } else { > >>>> + smmu_domain->s1_cfg.num_contexts = 1; > >>>> } > >>>> + > >>>> smmu_domain->s1_cfg.can_stall = master->ste.can_stall; > >>>> break; > >>>> case ARM_SMMU_DOMAIN_NESTED: > >>>> > >>>> > >>>> I am not sure this is right place do this. Please take a look. > >>> > >>> Thanks for testing the series and reporting the bug. I added the > >>> following patch to branch svm/current, does it work for you? > >> > >> Yes, it does. > >> > >> Thanks, > >> Shameer > >> > >>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu- > v3.c > >>> index 42c8378624ed..edda466adc81 100644 > >>> --- a/drivers/iommu/arm-smmu-v3.c > >>> +++ b/drivers/iommu/arm-smmu-v3.c > >>> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device > *dev) > >>> } > >>> } > >>> > >>> - if (smmu->ssid_bits) > >>> - master->num_ssids = 1 << min(smmu->ssid_bits, > >>> - fwspec->num_pasid_bits); > >>> + master->num_ssids = 1 << min(smmu->ssid_bits, fwspec- > >>>> num_pasid_bits); > > > > If fwspec->num_pasid_bits = 0, then master have _one_ num_ssids ? > > Yes, the context table allocator always needs to allocate at least one > entry, even if the master or SMMU doesn't support SSID. I think an earlier > version called this field "num_contexts", maybe we should got back to that > name for clarity? +1 for that. As ssid can be zero as per the spec, num_ssids=1 will be slightly misleading. Thanks, Shameer
On 2017/11/3 17:37, Jean-Philippe Brucker wrote: > On 03/11/17 05:45, Yisheng Xie wrote: >> Hi Jean, >> >> On 2017/11/3 1:02, Shameerali Kolothum Thodi wrote: >>> >>> >>>> -----Original Message----- >>>> From: Jean-Philippe Brucker [mailto:Jean-Philippe.Brucker@arm.com] >>>> Sent: Thursday, November 02, 2017 3:52 PM >>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> >>>> Cc: linux-arm-kernel@lists.infradead.org; linux-pci@vger.kernel.org; linux- >>>> acpi@vger.kernel.org; devicetree@vger.kernel.org; iommu@lists.linux- >>>> foundation.org; Mark Rutland <Mark.Rutland@arm.com>; xieyisheng (A) >>>> <xieyisheng1@huawei.com>; Gabriele Paoloni >>>> <gabriele.paoloni@huawei.com>; Catalin Marinas >>>> <Catalin.Marinas@arm.com>; Will Deacon <Will.Deacon@arm.com>; >>>> okaya@codeaurora.org; yi.l.liu@intel.com; Lorenzo Pieralisi >>>> <Lorenzo.Pieralisi@arm.com>; ashok.raj@intel.com; tn@semihalf.com; >>>> joro@8bytes.org; rfranz@cavium.com; lenb@kernel.org; >>>> jacob.jun.pan@linux.intel.com; alex.williamson@redhat.com; >>>> robh+dt@kernel.org; Leizhen (ThunderTown) <thunder.leizhen@huawei.com>; >>>> bhelgaas@google.com; dwmw2@infradead.org; liubo (CU) >>>> <liubo95@huawei.com>; rjw@rjwysocki.net; robdclark@gmail.com; >>>> hanjun.guo@linaro.org; Sudeep Holla <Sudeep.Holla@arm.com>; Robin >>>> Murphy <Robin.Murphy@arm.com>; nwatters@codeaurora.org; Linuxarm >>>> <linuxarm@huawei.com> >>>> Subject: Re: [RFCv2 PATCH 14/36] iommu/arm-smmu-v3: Add support for >>>> Substream IDs >>>> >>>> Hi Shameer, >>>> >>>> On Thu, Nov 02, 2017 at 12:49:32PM +0000, Shameerali Kolothum Thodi wrote: >>>>> We had a go with this series on HiSIlicon D05 platform which doesn't have >>>>> support for ssids/ATS/PRI, to make sure it generally works. >>>>> >>>>> But observed the below crash on boot, >>>>> >>>>> [ 16.009084] WARNING: CPU: 59 PID: 391 at mm/page_alloc.c:3883 >>>> __alloc_pages_nodemask+0x19c/0xc48 >>>>> [ 16.026797] Modules linked in: >>>>> [ 16.032944] CPU: 59 PID: 391 Comm: kworker/59:1 Not tainted 4.14.0-rc1- >>>> 159539-ge42aca3 #236 >>>>> [...] >>>>> [ 16.068206] Workqueue: events deferred_probe_work_func >>>>> [ 16.078557] task: ffff8017d38a0000 task.stack: ffff00000b198000 >>>>> [ 16.090486] PC is at __alloc_pages_nodemask+0x19c/0xc48 >>>>> [ 16.101013] LR is at __alloc_pages_nodemask+0xe0/0xc48 >>>>> [ 16.469220] [<ffff000008186b94>] __alloc_pages_nodemask+0x19c/0xc48 >>>>> [ 16.481854] [<ffff0000081d65b0>] alloc_pages_current+0x80/0xcc >>>>> [ 16.493607] [<ffff000008182be8>] __get_free_pages+0xc/0x38 >>>>> [ 16.504661] [<ffff0000083c4d58>] swiotlb_alloc_coherent+0x64/0x190 >>>>> [ 16.517117] [<ffff00000809824c>] __dma_alloc+0x110/0x204 >>>>> [ 16.527820] [<ffff00000858e850>] dmam_alloc_coherent+0x88/0xf0 >>>>> [ 16.539575] [<ffff000008568884>] >>>> arm_smmu_domain_finalise_s1+0x60/0x248 >>>>> [ 16.552909] [<ffff00000856c104>] arm_smmu_attach_dev+0x264/0x300 >>>>> [ 16.565013] [<ffff00000855d40c>] __iommu_attach_device+0x48/0x5c >>>>> [ 16.577117] [<ffff00000855e698>] iommu_group_add_device+0x144/0x3a4 >>>>> [ 16.589746] [<ffff00000855ed18>] iommu_group_get_for_dev+0x70/0xf8 >>>>> [ 16.602201] [<ffff00000856a314>] arm_smmu_add_device+0x1a4/0x418 >>>>> [ 16.614308] [<ffff00000849dfcc>] iort_iommu_configure+0xf0/0x16c >>>>> [ 16.626416] [<ffff000008468c50>] acpi_dma_configure+0x30/0x70 >>>>> [ 16.637994] [<ffff00000858f00c>] dma_configure+0xa8/0xd4 >>>>> [ 16.648695] [<ffff00000857706c>] driver_probe_device+0x1a4/0x2dc >>>>> [ 16.673081] [<ffff0000085752c8>] bus_for_each_drv+0x54/0x94 >>>>> [ 16.684307] [<ffff000008576db0>] __device_attach+0xc4/0x12c >>>>> [ 16.695533] [<ffff000008577350>] device_initial_probe+0x10/0x18 >>>>> [ 16.707462] [<ffff0000085762b4>] bus_probe_device+0x90/0x98 >>>>> >>>>> After a bit of debug it looks like on platforms where ssid is not supported, >>>>> s1_cfg.num_contexts is set to zero and it eventually results in this crash >>>>> in, >>>>> arm_smmu_domain_finalise_s1() -->arm_smmu_alloc_cd_tables()--> >>>>> arm_smmu_alloc_cd_leaf_table() as num_leaf_entries is zero. >>>>> >>>>> With the below fix, it works on D05 now, >>>>> >>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c >>>>> index 8ad90e2..51f5821 100644 >>>>> --- a/drivers/iommu/arm-smmu-v3.c >>>>> +++ b/drivers/iommu/arm-smmu-v3.c >>>>> @@ -2433,7 +2433,10 @@ static int arm_smmu_domain_finalise(struct >>>> iommu_domain *domain, >>>>> domain->min_pasid = 1; >>>>> domain->max_pasid = master->num_ssids - 1; >>>>> smmu_domain->s1_cfg.num_contexts = master->num_ssids; >>>>> + } else { >>>>> + smmu_domain->s1_cfg.num_contexts = 1; >>>>> } >>>>> + >>>>> smmu_domain->s1_cfg.can_stall = master->ste.can_stall; >>>>> break; >>>>> case ARM_SMMU_DOMAIN_NESTED: >>>>> >>>>> >>>>> I am not sure this is right place do this. Please take a look. >>>> >>>> Thanks for testing the series and reporting the bug. I added the >>>> following patch to branch svm/current, does it work for you? >>> >>> Yes, it does. >>> >>> Thanks, >>> Shameer >>> >>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c >>>> index 42c8378624ed..edda466adc81 100644 >>>> --- a/drivers/iommu/arm-smmu-v3.c >>>> +++ b/drivers/iommu/arm-smmu-v3.c >>>> @@ -3169,9 +3169,7 @@ static int arm_smmu_add_device(struct device *dev) >>>> } >>>> } >>>> >>>> - if (smmu->ssid_bits) >>>> - master->num_ssids = 1 << min(smmu->ssid_bits, >>>> - fwspec->num_pasid_bits); >>>> + master->num_ssids = 1 << min(smmu->ssid_bits, fwspec- >>>>> num_pasid_bits); >> >> If fwspec->num_pasid_bits = 0, then master have _one_ num_ssids ? > > Yes, the context table allocator always needs to allocate at least one > entry, even if the master or SMMU doesn't support SSID. I think an earlier > version called this field "num_contexts", maybe we should got back to that > name for clarity? > Yes, it will be more clear. Thanks Yisheng > Thanks, > Jean > > . > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean, On 2017/10/12 20:55, Jean-Philippe Brucker wrote: > On 12/10/17 13:05, Yisheng Xie wrote: > [...] >>>>> * An iommu_process can be bound to multiple domains, and a domain can have >>>>> multiple iommu_process. >>>> when bind a task to device, can we create a single domain for it? I am thinking >>>> about process management without shared PT(for some device only support PASID >>>> without pri ability), it seems hard to expand if a domain have multiple iommu_process? >>>> Do you have any idea about this? >>> >>> A device always has to be in a domain, as far as I know. Not supporting >>> PRI forces you to pin down all user mappings (or just the ones you use for >>> DMA) but you should sill be able to share PT. Now if you don't support >>> shared PT either, but only PASID, then you'll have to use io-pgtable and a >>> new map/unmap API on an iommu_process. I don't understand your concern >>> though, how would the link between process and domains prevent this use-case? >>> >> So you mean that if an iommu_process bind to multiple devices it should create >> multiple io-pgtables? or just share the same io-pgtable? > > I don't know to be honest, I haven't thought much about the io-pgtable > case, I'm all about sharing the mm :) > Sorry to get back to this thread, but traditional DMA_MAP use case may also want to enable Substreamid/PASID. As a general framework, you may also consider SubStreamid/Pasid support for dma map/io-pgtable. We're considering make io-pgtables per SubStreamid/Pasid, but haven't decide put all io-pgtables into a single domain or iommu_process. Thanks, Liubo > It really depends on what the user (GPU driver I assume) wants. I think > that if you're not sharing an mm with the device, then you're trying to > hide parts of the process to the device, so you'd also want the > flexibility of having different io-pgtables between devices. Different > devices accessing isolated parts of the process requires separate io-pgtables. > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Liubo, On 08/11/17 01:21, Bob Liu wrote: > Hi Jean, > > On 2017/10/12 20:55, Jean-Philippe Brucker wrote: >> On 12/10/17 13:05, Yisheng Xie wrote: >> [...] >>>>>> * An iommu_process can be bound to multiple domains, and a domain can have >>>>>> multiple iommu_process. >>>>> when bind a task to device, can we create a single domain for it? I am thinking >>>>> about process management without shared PT(for some device only support PASID >>>>> without pri ability), it seems hard to expand if a domain have multiple iommu_process? >>>>> Do you have any idea about this? >>>> >>>> A device always has to be in a domain, as far as I know. Not supporting >>>> PRI forces you to pin down all user mappings (or just the ones you use for >>>> DMA) but you should sill be able to share PT. Now if you don't support >>>> shared PT either, but only PASID, then you'll have to use io-pgtable and a >>>> new map/unmap API on an iommu_process. I don't understand your concern >>>> though, how would the link between process and domains prevent this use-case? >>>> >>> So you mean that if an iommu_process bind to multiple devices it should create >>> multiple io-pgtables? or just share the same io-pgtable? >> >> I don't know to be honest, I haven't thought much about the io-pgtable >> case, I'm all about sharing the mm :) >> > > Sorry to get back to this thread, but traditional DMA_MAP use case may also want to > enable Substreamid/PASID. > As a general framework, you may also consider SubStreamid/Pasid support for dma map/io-pgtable. > > We're considering make io-pgtables per SubStreamid/Pasid, but haven't decide put all > io-pgtables into a single domain or iommu_process. Yes they should be in a single domain, see also my other reply here: http://www.spinics.net/lists/arm-kernel/msg613586.html I've only been thinking about the IOMMU API for the moment, but I guess the VFIO API would use this extension? I suppose it would be a new PASID field to DMA_MAP along with a flag. The PASID would probably be allocated with BIND + some special flag. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean, +static struct iommu_process * +iommu_process_alloc(struct iommu_domain *domain, struct task_struct +*task) { + int err; + int pasid; + struct iommu_process *process; + + if (WARN_ON(!domain->ops->process_alloc || !domain->ops->process_free)) + return ERR_PTR(-ENODEV); + + process = domain->ops->process_alloc(task); + if (IS_ERR(process)) + return process; + if (!process) + return ERR_PTR(-ENOMEM); + + process->pid = get_task_pid(task, PIDTYPE_PID); + process->release = domain->ops->process_free; + INIT_LIST_HEAD(&process->domains); + kref_init(&process->kref); + + if (!process->pid) { + err = -EINVAL; + goto err_free_process; + } + + idr_preload(GFP_KERNEL); + spin_lock(&iommu_process_lock); + pasid = idr_alloc_cyclic(&iommu_process_idr, process, domain->min_pasid, + domain->max_pasid + 1, GFP_ATOMIC); If EP supports only one pasid; domain->min_pasid=1 and domain->max_pasid=0. When idr_alloc_cyclic is called it invokes idr_get_free_cmn function where we have following condition. (Based on kernel 4.14-rc6) if (!radix_tree_tagged(root, IDR_FREE)) start = max(start, maxindex + 1); if (start > max) return ERR_PTR(-ENOSPC); Here max is being assigned zero by the time this function is invoked, this value is based on domain->max_pasid. This condition fails and ENOSPC is returned. In this case even though hardware supports PASID, BIND flow fails. Any reason why pasid allocation moved to idr allocations rather than bitmap allocations as in v1 patches ? + process->pasid = pasid; Regards, Bharat -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean, On 2017/10/6 21:31, Jean-Philippe Brucker wrote: > Hook process operations to support PASID and page table sharing with the > SMMUv3: > > + > +static void arm_smmu_process_exit(struct iommu_domain *domain, > + struct iommu_process *process) > +{ > + struct arm_smmu_master_data *master; > + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > + > + if (!domain->process_exit) > + return; If domain do not set process_exit handler, just return? smmu do not need invalid ATC, clear cd entry, etc.? Maybe you should check when call domain->process_exit? > + > + spin_lock(&smmu_domain->devices_lock); > + list_for_each_entry(master, &smmu_domain->devices, list) { > + if (!master->processes) > + continue; > + > + master->processes--; Add if (domain->process_exit) here? > + domain->process_exit(domain, master->dev, process->pasid, > + domain->process_exit_token); > + > + /* TODO: inval ATC */ > + } > + spin_unlock(&smmu_domain->devices_lock); > + > + arm_smmu_write_ctx_desc(smmu_domain, process->pasid, NULL); > + > + /* TODO: Invalidate all mappings if not DVM */ > +} > + Thanks Yisheng Xie -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On 09/11/17 03:32, Yisheng Xie wrote: > Hi Jean, > > On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >> Hook process operations to support PASID and page table sharing with the >> SMMUv3: >> >> + >> +static void arm_smmu_process_exit(struct iommu_domain *domain, >> + struct iommu_process *process) >> +{ >> + struct arm_smmu_master_data *master; >> + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); >> + >> + if (!domain->process_exit) >> + return; > > If domain do not set process_exit handler, just return? smmu do not > need invalid ATC, clear cd entry, etc.? Maybe you should check when > call domain->process_exit? Indeed, that doesn't make sense. I'll move the check below. Thanks, Jean >> + >> + spin_lock(&smmu_domain->devices_lock); >> + list_for_each_entry(master, &smmu_domain->devices, list) { >> + if (!master->processes) >> + continue; >> + >> + master->processes--; > Add > if (domain->process_exit) > here? >> + domain->process_exit(domain, master->dev, process->pasid, >> + domain->process_exit_token); >> + >> + /* TODO: inval ATC */ >> + } >> + spin_unlock(&smmu_domain->devices_lock); >> + >> + arm_smmu_write_ctx_desc(smmu_domain, process->pasid, NULL); >> + >> + /* TODO: Invalidate all mappings if not DVM */ >> +} >> + > Thanks > Yisheng Xie > > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Bharat, On 11/08/2017 05:50 PM, Bharat Kumar Gogada wrote: > Hi Jean, > > +static struct iommu_process * > +iommu_process_alloc(struct iommu_domain *domain, struct task_struct > +*task) { > + int err; > + int pasid; > + struct iommu_process *process; > + > + if (WARN_ON(!domain->ops->process_alloc || > !domain->ops->process_free)) > + return ERR_PTR(-ENODEV); > + > + process = domain->ops->process_alloc(task); > + if (IS_ERR(process)) > + return process; > + if (!process) > + return ERR_PTR(-ENOMEM); > + > + process->pid = get_task_pid(task, PIDTYPE_PID); > + process->release = domain->ops->process_free; > + INIT_LIST_HEAD(&process->domains); > + kref_init(&process->kref); > + > + if (!process->pid) { > + err = -EINVAL; > + goto err_free_process; > + } > + > + idr_preload(GFP_KERNEL); > + spin_lock(&iommu_process_lock); > + pasid = idr_alloc_cyclic(&iommu_process_idr, process, > domain->min_pasid, > + domain->max_pasid + 1, GFP_ATOMIC); > If EP supports only one pasid; domain->min_pasid=1 and domain->max_pasid=0. > When idr_alloc_cyclic is called it invokes idr_get_free_cmn function > where we have following condition. (Based on kernel 4.14-rc6) > if (!radix_tree_tagged(root, IDR_FREE)) > start = max(start, maxindex + 1); > if (start > max) > return ERR_PTR(-ENOSPC); > Here max is being assigned zero by the time this function is invoked, > this value is based on domain->max_pasid. > This condition fails and ENOSPC is returned. > > In this case even though hardware supports PASID, BIND flow fails. It should fail, since we're reserving PASID 0 for non-PASID transactions with S1DSS=0b10. In addition, the SMMUv3 specification does not allow using PASID with a single entry. See the description of S1CDMax in 5.2 Stream Table Entry: "when this field is 0, the substreams of the STE are disabled and one CD is available. (The minimum useful number of substreams is 2.) Any transaction with a SubstreamID will be terminated with an abort and a C_BAD_SUBSTREAMID event recorded." > Any reason why pasid allocation moved to idr allocations rather than > bitmap allocations as in v1 patches ? Yes, idr provides a convenient way to quickly retrieve the context associated with a PASID, when handling a fault. v1 had the allocation in a bitmap and storing in a rb-tree. By using an idr we combine both and rely on a well-tested infrastructure. Note that in the future we might need to go back to handcrafting the PASID allocation, but it will probably still be based on idr. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Bharat, On 11/08/2017 05:50 PM, Bharat Kumar Gogada wrote: > Hi Jean, > > +static struct iommu_process * > +iommu_process_alloc(struct iommu_domain *domain, struct task_struct > +*task) { > + int err; > + int pasid; > + struct iommu_process *process; > + > + if (WARN_ON(!domain->ops->process_alloc || > !domain->ops->process_free)) > + return ERR_PTR(-ENODEV); > + > + process = domain->ops->process_alloc(task); > + if (IS_ERR(process)) > + return process; > + if (!process) > + return ERR_PTR(-ENOMEM); > + > + process->pid = get_task_pid(task, PIDTYPE_PID); > + process->release = domain->ops->process_free; > + INIT_LIST_HEAD(&process->domains); > + kref_init(&process->kref); > + > + if (!process->pid) { > + err = -EINVAL; > + goto err_free_process; > + } > + > + idr_preload(GFP_KERNEL); > + spin_lock(&iommu_process_lock); > + pasid = idr_alloc_cyclic(&iommu_process_idr, process, > domain->min_pasid, > + domain->max_pasid + 1, GFP_ATOMIC); > If EP supports only one pasid; domain->min_pasid=1 and domain->max_pasid=0. > When idr_alloc_cyclic is called it invokes idr_get_free_cmn function > where we have following condition. (Based on kernel 4.14-rc6) > if (!radix_tree_tagged(root, IDR_FREE)) > start = max(start, maxindex + 1); > if (start > max) > return ERR_PTR(-ENOSPC); > Here max is being assigned zero by the time this function is invoked, > this value is based on domain->max_pasid. > This condition fails and ENOSPC is returned. > > In this case even though hardware supports PASID, BIND flow fails. It should fail, since we're reserving PASID 0 for non-PASID transactions with S1DSS=0b10. In addition, the SMMUv3 specification does not allow using PASID with a single entry. See the description of S1CDMax in 5.2 Stream Table Entry: "when this field is 0, the substreams of the STE are disabled and one CD is available. (The minimum useful number of substreams is 2.) Any transaction with a SubstreamID will be terminated with an abort and a C_BAD_SUBSTREAMID event recorded." > Any reason why pasid allocation moved to idr allocations rather than > bitmap allocations as in v1 patches ? Yes, idr provides a convenient way to quickly retrieve the context associated with a PASID, when handling a fault. v1 had the allocation in a bitmap and storing in a rb-tree. By using an idr we combine both and rely on a well-tested infrastructure. Note that in the future we might need to go back to handcrafting the PASID allocation, but it will probably still be based on idr. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > In this case even though hardware supports PASID, BIND flow fails. It should fail, since we're reserving PASID 0 for non-PASID transactions with S1DSS=0b10. In addition, the SMMUv3 specification does not allow using PASID with a single entry. See the description of S1CDMax in 5.2 Stream Table Entry: "when this field is 0, the substreams of the STE are disabled and one CD is available. (The minimum useful number of substreams is 2.) Any transaction with a SubstreamID will be terminated with an abort and a C_BAD_SUBSTREAMID event recorded." > Any reason why pasid allocation moved to idr allocations rather than > bitmap allocations as in v1 patches ? Yes, idr provides a convenient way to quickly retrieve the context associated with a PASID, when handling a fault. v1 had the allocation in a bitmap and storing in a rb-tree. By using an idr we combine both and rely on a well-tested infrastructure. Note that in the future we might need to go back to handcrafting the PASID allocation, but it will probably still be based on idr. Thanks for the clarification, Jean. -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean, +static size_t +arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, + unsigned long iova, size_t size) +{ + unsigned long flags; + struct arm_smmu_cmdq_ent cmd; + struct arm_smmu_master_data *master; + + arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd); + + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + list_for_each_entry(master, &smmu_domain->devices, list) + arm_smmu_atc_inv_master(master, &cmd); + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + + return size; +} + /* IOMMU API */ static bool arm_smmu_capable(enum iommu_cap cap) { @@ -2361,6 +2506,8 @@ static void arm_smmu_detach_dev(struct device *dev) __iommu_process_unbind_dev_all(&smmu_domain->domain, dev); if (smmu_domain) { + arm_smmu_atc_inv_master_all(master, 0); + In BIND flow, when VFIO_IOMMU_UNBIND is invoked invalidation is sent on allocated PASID for this application. When vfio group fd is closed after UNBIND, arm_smmu_detach_dev is invoked now invalidation is being sent on ssid zero. Why invalidation needs to be sent on ssid zero ? Regards, Bharat -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Bharat, On 16/11/17 14:19, Bharat Kumar Gogada wrote: > Hi Jean, > > +static size_t > +arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, > + unsigned long iova, size_t size) > +{ > + unsigned long flags; > + struct arm_smmu_cmdq_ent cmd; > + struct arm_smmu_master_data *master; > + > + arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd); > + > + spin_lock_irqsave(&smmu_domain->devices_lock, flags); > + list_for_each_entry(master, &smmu_domain->devices, list) > + arm_smmu_atc_inv_master(master, &cmd); > + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); > + > + return size; > +} > + > /* IOMMU API */ > static bool arm_smmu_capable(enum iommu_cap cap) { @@ -2361,6 +2506,8 @@ static void arm_smmu_detach_dev(struct device *dev) > __iommu_process_unbind_dev_all(&smmu_domain->domain, dev); > > if (smmu_domain) { > + arm_smmu_atc_inv_master_all(master, 0); > + > In BIND flow, when VFIO_IOMMU_UNBIND is invoked invalidation is sent on allocated PASID for this application. > When vfio group fd is closed after UNBIND, arm_smmu_detach_dev is invoked now invalidation is being sent on ssid zero. > Why invalidation needs to be sent on ssid zero ? It's possible to use bind/unbind and map/unmap APIs at the same time on a domain. map/unmap modifies non-ssid mappings as usual, for context descriptor 0. Note that SSID 0 is converted to "no SSID" by arm_smmu_atc_inv_to_cmd. That said, I think VFIO cleans all DMA mappings when the VFIO group fd is closed, which will send individual ATC invalidation for each mapping. But VFIO may not be the only user of this API, and future users may be less careful. It's safer to send this global invalidation whenever we detach the domain, to ensure we're not leaving stale ATC entries for the next user. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> + > + spin_lock_irqsave(&smmu_domain->devices_lock, flags); > + list_for_each_entry(master, &smmu_domain->devices, list) > + arm_smmu_atc_inv_master(master, &cmd); > + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); > + > + return size; > +} > + > /* IOMMU API */ > static bool arm_smmu_capable(enum iommu_cap cap) { @@ -2361,6 +2506,8 @@ static void arm_smmu_detach_dev(struct device *dev) > __iommu_process_unbind_dev_all(&smmu_domain->domain, dev); > > if (smmu_domain) { > + arm_smmu_atc_inv_master_all(master, 0); > + > In BIND flow, when VFIO_IOMMU_UNBIND is invoked invalidation is sent on allocated PASID for this application. > When vfio group fd is closed after UNBIND, arm_smmu_detach_dev is invoked now invalidation is being sent on ssid zero. > Why invalidation needs to be sent on ssid zero ? It's possible to use bind/unbind and map/unmap APIs at the same time on a domain. map/unmap modifies non-ssid mappings as usual, for context descriptor 0. Note that SSID 0 is converted to "no SSID" by arm_smmu_atc_inv_to_cmd. That said, I think VFIO cleans all DMA mappings when the VFIO group fd is closed, which will send individual ATC invalidation for each mapping. But VFIO may not be the only user of this API, and future users may be less careful. It's safer to send this global invalidation whenever we detach the domain, to ensure we're not leaving stale ATC entries for the next user. Thanks Jean, I see that currently vfio_group_fops_open does not allow multiple instances. If a device supports multiple PASID there might be different applications running parallel. So why is multiple instances restricted ? Regards, Bharat -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17/11/17 06:11, Bharat Kumar Gogada wrote: [...] > Thanks Jean, I see that currently vfio_group_fops_open does not allow multiple instances. > If a device supports multiple PASID there might be different applications running parallel. > So why is multiple instances restricted ? You can't have multiple processes owning the same PCI device, it's unmanageable. For using multiple PASIDs, my idea was that the userspace driver ("the server"), that owns the device, would have a way to partition it into smaller frames. It forks to create "clients" and assigns a PASID to each of them (by issuing VFIO_BIND(client_pid) -> pasid, then writing the PASID into a privileged MMIO frame that defines the partition properties). Each client accesses an unprivileged MMIO frame to use a device partition (or sends commands to the server via IPC), and can perform DMA on its own virtual memory. This is complete speculation of course, we have very little information on how PASID-capable devices will be designed, so I'm trying to imagine likely scenarios. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hey Jean, On 2017/10/6 21:31, Jean-Philippe Brucker wrote: > IOMMU drivers need a way to bind Linux processes to devices. This is used > for Shared Virtual Memory (SVM), where devices support paging. In that > mode, DMA can directly target virtual addresses of a process. > > Introduce boilerplate code for allocating process structures and binding > them to devices. Four operations are added to IOMMU drivers: > > * process_alloc, process_free: to create an iommu_process structure and > perform architecture-specific operations required to grab the process > (for instance on ARM SMMU, pin down the CPU ASID). There is a single > iommu_process structure per Linux process. > I'm a bit confused here. The original meaning of iommu_domain is a virtual addrspace defined by a set of io page table. (fix me if I misunderstood). Then what's the meaning of iommu_domain and iommu_process after introducing iommu_process? Could you consider document these concepts? Thanks, Liubo -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 22/11/17 03:15, Bob Liu wrote: > Hey Jean, > > On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >> IOMMU drivers need a way to bind Linux processes to devices. This is used >> for Shared Virtual Memory (SVM), where devices support paging. In that >> mode, DMA can directly target virtual addresses of a process. >> >> Introduce boilerplate code for allocating process structures and binding >> them to devices. Four operations are added to IOMMU drivers: >> >> * process_alloc, process_free: to create an iommu_process structure and >> perform architecture-specific operations required to grab the process >> (for instance on ARM SMMU, pin down the CPU ASID). There is a single >> iommu_process structure per Linux process. >> > > I'm a bit confused here. > The original meaning of iommu_domain is a virtual addrspace defined by a set of io page table. > (fix me if I misunderstood). iommu_domain can also be seen as a logical partition of devices that share the same address spaces (the concept comes from AMD and Intel IOMMU domains, I believe). Without PASIDs it was a single address space, with PASIDs it can have multiple address spaces. > Then what's the meaning of iommu_domain and iommu_process after introducing iommu_process? > Could you consider document these concepts? iommu_process is used to keep track of Linux process address spaces. I'll rename it to io_mm in next version, to make it clear that it doesn't represent a Linux task but an mm_struct instead. However the implementation stays pretty much identical. A domain can be associated to multiple io_mm, and an io_mm can be associated to multiple domains. In the IOMMU architectures I know, PASID is implemented like this. You have the device tables (stream tables on SMMU), pointing to PASID tables (context descriptor tables on SMMU). In the following diagram, .->+--------+ / 0 | |------ io_pgtable / +--------+ / 1 | |------ io_mm->mm X +--------+ / +--------+ 0 | A |-' 2 | |-. +--------+ +--------+ \ 1 | | 3 | | \ +--------+ +--------+ -- io_mm->mm Y 2 | B |--. PASID tables / +--------+ \ | 3 | B |----+--->+--------+ | +--------+ / 0 | |- | -- io_pgtable 4 | B |--' +--------+ | +--------+ 1 | | | Device tables +--------+ | 2 | |--' +--------+ 3 | |------ io_mm->priv io_pgtable +--------+ PASID tables * Device 0 (e.g. PCI 0000:00:00.0) is in domain A. * Devices 2, 3 and 4 are in domain B. * Domain A has the top set of PASID tables. * Domain B has the bottom set of PASID tables. * Domain A is bound to process address space X. -> Device 0 can access X with PASID 1. * Both domains A and B are bound to process address space Y. -> Devices 0, 2, 3 and 4 can access Y with PASID 2 * PASID 0 is special on Arm SMMU (with S1DSS=0b10). It will always be reserved for classic DMA map/unmap. Even for hypothetical devices that don't support non-pasid transactions, I'd like to keep this convention. It should be quite useful for device drivers to have PASID 0 available with DMA map/unmap. * When introducing "private" PASID address spaces (that many are asking for), which are backed by a set of io-pgtable and map/unmap ops, I suppose they would reuse the io_mm structure. In this example PASID 3 is associated to a private address space and not backed by an mm. Since the PASID space is global, PASID 3 won't be available for any other domain. Does this clarify the current design, or is it just more confusing? Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2017/11/22 21:04, Jean-Philippe Brucker wrote: > On 22/11/17 03:15, Bob Liu wrote: >> Hey Jean, >> >> On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >>> IOMMU drivers need a way to bind Linux processes to devices. This is used >>> for Shared Virtual Memory (SVM), where devices support paging. In that >>> mode, DMA can directly target virtual addresses of a process. >>> >>> Introduce boilerplate code for allocating process structures and binding >>> them to devices. Four operations are added to IOMMU drivers: >>> >>> * process_alloc, process_free: to create an iommu_process structure and >>> perform architecture-specific operations required to grab the process >>> (for instance on ARM SMMU, pin down the CPU ASID). There is a single >>> iommu_process structure per Linux process. >>> >> >> I'm a bit confused here. >> The original meaning of iommu_domain is a virtual addrspace defined by a set of io page table. >> (fix me if I misunderstood). > > iommu_domain can also be seen as a logical partition of devices that share > the same address spaces (the concept comes from AMD and Intel IOMMU > domains, I believe). Without PASIDs it was a single address space, with > PASIDs it can have multiple address spaces. > >> Then what's the meaning of iommu_domain and iommu_process after introducing iommu_process? >> Could you consider document these concepts? > > iommu_process is used to keep track of Linux process address spaces. I'll > rename it to io_mm in next version, to make it clear that it doesn't > represent a Linux task but an mm_struct instead. However the > implementation stays pretty much identical. A domain can be associated to > multiple io_mm, and an io_mm can be associated to multiple domains. > > In the IOMMU architectures I know, PASID is implemented like this. You > have the device tables (stream tables on SMMU), pointing to PASID tables > (context descriptor tables on SMMU). In the following diagram, > > .->+--------+ > / 0 | |------ io_pgtable > / +--------+ > / 1 | |------ io_mm->mm X > +--------+ / +--------+ > 0 | A |-' 2 | |-. > +--------+ +--------+ \ > 1 | | 3 | | \ > +--------+ +--------+ -- io_mm->mm Y > 2 | B |--. PASID tables / > +--------+ \ | > 3 | B |----+--->+--------+ | > +--------+ / 0 | |- | -- io_pgtable > 4 | B |--' +--------+ | > +--------+ 1 | | | > Device tables +--------+ | > 2 | |--' > +--------+ > 3 | |------ io_mm->priv io_pgtable > +--------+ > PASID tables > > * Device 0 (e.g. PCI 0000:00:00.0) is in domain A. > * Devices 2, 3 and 4 are in domain B. > * Domain A has the top set of PASID tables. > * Domain B has the bottom set of PASID tables. > > * Domain A is bound to process address space X. > -> Device 0 can access X with PASID 1. > * Both domains A and B are bound to process address space Y. > -> Devices 0, 2, 3 and 4 can access Y with PASID 2 > > * PASID 0 is special on Arm SMMU (with S1DSS=0b10). It will always be > reserved for classic DMA map/unmap. Even for hypothetical devices that > don't support non-pasid transactions, I'd like to keep this convention. > It should be quite useful for device drivers to have PASID 0 available > with DMA map/unmap. > > * When introducing "private" PASID address spaces (that many are asking > for), which are backed by a set of io-pgtable and map/unmap ops, I > suppose they would reuse the io_mm structure. In this example PASID 3 is > associated to a private address space and not backed by an mm. Since the > PASID space is global, PASID 3 won't be available for any other domain. > > Does this clarify the current design, or is it just more confusing? > It's very helpful, thank you very much! Regards, Liubo -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2017/10/6 21:31, Jean-Philippe Brucker wrote: > Add two new ioctl for VFIO containers. VFIO_DEVICE_BIND_PROCESS creates a > bond between a container and a process address space, identified by a > device-specific ID named PASID. This allows the device to target DMA > transactions at the process virtual addresses without a need for mapping > and unmapping buffers explicitly in the IOMMU. The process page tables are > shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to > handle faults. VFIO_DEVICE_UNBIND_PROCESS removed a bond identified by a > PASID. > How about hide bind/unbind into ioctl(VFIO_SET_IOMMU)? e.g always bind to current process in SET_IOMMU. Not sure about the real use case. > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> > --- > drivers/vfio/vfio_iommu_type1.c | 243 +++++++++++++++++++++++++++++++++++++++- > include/uapi/linux/vfio.h | 69 ++++++++++++ > 2 files changed, 311 insertions(+), 1 deletion(-) > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24/11/17 08:23, Bob Liu wrote: > On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >> Add two new ioctl for VFIO containers. VFIO_DEVICE_BIND_PROCESS creates a >> bond between a container and a process address space, identified by a >> device-specific ID named PASID. This allows the device to target DMA >> transactions at the process virtual addresses without a need for mapping >> and unmapping buffers explicitly in the IOMMU. The process page tables are >> shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to >> handle faults. VFIO_DEVICE_UNBIND_PROCESS removed a bond identified by a >> PASID. >> > > How about hide bind/unbind into ioctl(VFIO_SET_IOMMU)? > e.g always bind to current process in SET_IOMMU. > > Not sure about the real use case. I guess you could introduce a new VFIO IOMMU type for this. I think this would be useful for SVA without PASID: if the device supports I/O page faults, use SET_IOMMU with a VFIO_SVA_IOMMU type (for example) and the process is bound automatically to the default translation context of the device. This requires a new IOMMU type because the MAP/UNMAP ioctl won't work anymore. I'm not keen on introducing loads of new features in the APIs at the moment, because I only have the IOMMU point of view, not many endpoint users. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2017/10/6 21:31, Jean-Philippe Brucker wrote: > +int iommu_process_bind_device(struct device *dev, struct task_struct *task, > + int *pasid, int flags) > +{ [..] > + err = iommu_process_attach_locked(context, dev); > + if (err) > + iommu_process_put_locked(process); one ref for a context is enough right? so also need call iommu_process_put_locked() if attach ok, or will be leak if user call bind twice for the same device and task. -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean, On 2017/10/6 21:31, Jean-Philippe Brucker wrote: > - if (domain->ext_handler) { > + if (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) { > + fault->flags |= IOMMU_FAULT_ATOMIC; Why remove the condition of domain->ext_handler? should it be much better like: if ((domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) && domain->ext_handler) If domain->ext_handler is NULL, and (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) is true. It will oops, right? > ret = domain->ext_handler(domain, dev, fault, > domain->handler_token); Thanks Yisheng Xie -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On 29/11/17 06:15, Yisheng Xie wrote: > Hi Jean, > > On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >> - if (domain->ext_handler) { >> + if (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) { >> + fault->flags |= IOMMU_FAULT_ATOMIC; > > Why remove the condition of domain->ext_handler? should it be much better like: > if ((domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) && domain->ext_handler) > > If domain->ext_handler is NULL, and (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) > is true. It will oops, right? I removed the check because ext_handler shouldn't be NULL if handler_flags has a bit set (as per iommu_set_ext_fault_handler). But you're right that this is fragile, and I overlooked the case where users could call set_ext_fault_handler to clear the fault handler. (Note that this ext_handler will most likely be replaced by the fault infrastructure that Jacob is working on: https://patchwork.kernel.org/patch/10063385/ to which we should add the atomic/blocking flags) Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 29/11/17 06:08, Yisheng Xie wrote: > > > On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >> +int iommu_process_bind_device(struct device *dev, struct task_struct *task, >> + int *pasid, int flags) >> +{ > [..] >> + err = iommu_process_attach_locked(context, dev); >> + if (err) >> + iommu_process_put_locked(process); > one ref for a context is enough right? so also need call iommu_process_put_locked() > if attach ok, or will be leak if user call bind twice for the same device and task. I wasn't sure, I think I prefer taking one ref for each bind. If user calls bind twice, it should call unbind twice as well (in case of leak we free the context on process exit). Also with this implementation, user can call bind for two devices in the same domain, which will share the same context structure. So we have to take as many refs as bind() calls. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Jean, On 2017/11/29 23:01, Jean-Philippe Brucker wrote: > On 29/11/17 06:08, Yisheng Xie wrote: >> >> >> On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task, >>> + int *pasid, int flags) >>> +{ >> [..] >>> + err = iommu_process_attach_locked(context, dev); >>> + if (err) >>> + iommu_process_put_locked(process); >> one ref for a context is enough right? so also need call iommu_process_put_locked() >> if attach ok, or will be leak if user call bind twice for the same device and task. > > I wasn't sure, I think I prefer taking one ref for each bind. If user > calls bind twice, it should call unbind twice as well (in case of leak we > free the context on process exit). > > Also with this implementation, user can call bind for two devices in the > same domain, which will share the same context structure. So we have to > take as many refs as bind() calls. hmm, it has two ref, one for _context_ and the other for *process* (or maybe mm for your next version), right? For each bind it will take a ref of context as present design. but why also process ref need be taken for each bind? I mean it seems does not break _user can call bind for two devices in the same domain_. And if you really want to take a ref of *process* for echo bind, you should put it when unbind, right? I just not find where you put the ref of process when unbind. But just put the process ref when free context. Maybe I just miss something. Thanks Yisheng Xie > > Thanks, > Jean > > . > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
hi jean, On 2017/11/29 23:01, Jean-Philippe Brucker wrote: > Hello, > > On 29/11/17 06:15, Yisheng Xie wrote: >> Hi Jean, >> >> On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >>> - if (domain->ext_handler) { >>> + if (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) { >>> + fault->flags |= IOMMU_FAULT_ATOMIC; >> >> Why remove the condition of domain->ext_handler? should it be much better like: >> if ((domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) && domain->ext_handler) >> >> If domain->ext_handler is NULL, and (domain->handler_flags & IOMMU_FAULT_HANDLER_ATOMIC) >> is true. It will oops, right? > > I removed the check because ext_handler shouldn't be NULL if handler_flags > has a bit set (as per iommu_set_ext_fault_handler). But you're right that > this is fragile, and I overlooked the case where users could call > set_ext_fault_handler to clear the fault handler. > > (Note that this ext_handler will most likely be replaced by the fault > infrastructure that Jacob is working on: > https://patchwork.kernel.org/patch/10063385/ to which we should add the > atomic/blocking flags) > Get it, thanks for your explanation. Thanks Yisheng Xie > Thanks, > Jean > > . > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 30/11/17 01:11, Yisheng Xie wrote: > Hi, Jean, > > On 2017/11/29 23:01, Jean-Philippe Brucker wrote: >> On 29/11/17 06:08, Yisheng Xie wrote: >>> >>> >>> On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >>>> +int iommu_process_bind_device(struct device *dev, struct task_struct *task, >>>> + int *pasid, int flags) >>>> +{ >>> [..] >>>> + err = iommu_process_attach_locked(context, dev); >>>> + if (err) >>>> + iommu_process_put_locked(process); >>> one ref for a context is enough right? so also need call iommu_process_put_locked() >>> if attach ok, or will be leak if user call bind twice for the same device and task. >> >> I wasn't sure, I think I prefer taking one ref for each bind. If user >> calls bind twice, it should call unbind twice as well (in case of leak we >> free the context on process exit). >> >> Also with this implementation, user can call bind for two devices in the >> same domain, which will share the same context structure. So we have to >> take as many refs as bind() calls. > > > hmm, it has two ref, one for _context_ and the other for *process* (or maybe mm for > your next version), right? For each bind it will take a ref of context as present > design. but why also process ref need be taken for each bind? I mean it seems does > not break _user can call bind for two devices in the same domain_. > > And if you really want to take a ref of *process* for echo bind, you should put it when > unbind, right? I just not find where you put the ref of process when unbind. But just put > the process ref when free context. > > Maybe I just miss something. No you're right I misunderstood, sorry about that. Each context has a single ref to a process, so we do need to drop the process ref here as you pointed out. I thought I exercised this path though, I'll update my test suite. Also attach_locked shouldn't take a context ref if attach fails... Thanks a lot, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean, On 2017/10/6 21:31, Jean-Philippe Brucker wrote: > If the SMMU supports it and the kernel was built with HTTU support, enable > + if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && (reg & (IDR0_HA | IDR0_HD))) { > + smmu->features |= ARM_SMMU_FEAT_HA; > + if (reg & IDR0_HD) > + smmu->features |= ARM_SMMU_FEAT_HD; > + } What is relationship of armv8.1 HW_AFDBM and SMMUv3 HTTU? I mean why we need IS_ENABLED(CONFIG_ARM64_HW_AFDBM) ? If CONFIG_ARM64_HW_AFDBM=y but the process do not support ARMv8.1, should it also enable related feature for SMMUv3? Thanks Yisheng Xie > + > /* > * If the CPU is using VHE, but the SMMU doesn't support it, the SMMU > * will create TLB entries for NH-EL1 world and will miss the > -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/12/17 06:51, Yisheng Xie wrote: > Hi Jean, > > On 2017/10/6 21:31, Jean-Philippe Brucker wrote: >> If the SMMU supports it and the kernel was built with HTTU support, enable >> + if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && (reg & (IDR0_HA | IDR0_HD))) { >> + smmu->features |= ARM_SMMU_FEAT_HA; >> + if (reg & IDR0_HD) >> + smmu->features |= ARM_SMMU_FEAT_HD; >> + } > > What is relationship of armv8.1 HW_AFDBM and SMMUv3 HTTU? I mean why we need > IS_ENABLED(CONFIG_ARM64_HW_AFDBM) ? I think the reason we needed this was that, without CONFIG_ARM64_HW_AFDBM, the CPU wouldn't update the pte atomically and pte_dirty() wouldn't check the DBM bit 51 (only the SW dirty bit 55). Since af29678fe785 ("arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative code paths") removed lots of #ifdefs, I'll see if we can remove the above IS_ENABLED as well. > If CONFIG_ARM64_HW_AFDBM=y but the process do not support ARMv8.1, should it also > enable related feature for SMMUv3? Yes I think we can enable HTTU in the SMMU even if the CPU doesn't support it (though HTTU is only useful when sharing process address spaces). The mm code checks for both HW and SW bits even when the CPU doesn't support ARMv8.1 Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jean-Philippe, On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote: > /** > + * iommu_process_bind_device - Bind a process address space to a device > + * @dev: the device > + * @task: the process to bind > + * @pasid: valid address where the PASID will be stored > + * @flags: bond properties (IOMMU_PROCESS_BIND_*) > + * > + * Create a bond between device and task, allowing the device to access the > + * process address space using the returned PASID. > + * > + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error > + * is returned. > + */ > +int iommu_process_bind_device(struct device *dev, struct task_struct *task, > + int *pasid, int flags) This API doesn't play nice with endpoint device drivers that have PASID limitations. The AMD driver seems to have PASID limitations per product that are not being advertised in the PCI capability. device_iommu_pasid_init() { pasid_limit = min_t(unsigned int, (unsigned int)(1 << kfd->device_info->max_pasid_bits), iommu_info.max_pasids); /* * last pasid is used for kernel queues doorbells * in the future the last pasid might be used for a kernel thread. */ pasid_limit = min_t(unsigned int, pasid_limit, kfd->doorbell_process_limit - 1); } kfd->device_info->max_pasid_bits seems to contain per device limitations. Would you be willing to extend the API so that the requester can impose some limit on the PASID value that is getting allocated. Sinan
Hi Sinan, On 19/01/18 04:52, Sinan Kaya wrote: > Hi Jean-Philippe, > > On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote: >> /** >> + * iommu_process_bind_device - Bind a process address space to a device >> + * @dev: the device >> + * @task: the process to bind >> + * @pasid: valid address where the PASID will be stored >> + * @flags: bond properties (IOMMU_PROCESS_BIND_*) >> + * >> + * Create a bond between device and task, allowing the device to access the >> + * process address space using the returned PASID. >> + * >> + * On success, 0 is returned and @pasid contains a valid ID. Otherwise, an error >> + * is returned. >> + */ >> +int iommu_process_bind_device(struct device *dev, struct task_struct *task, >> + int *pasid, int flags) > > This API doesn't play nice with endpoint device drivers that have PASID limitations. > > The AMD driver seems to have PASID limitations per product that are not being > advertised in the PCI capability. > > device_iommu_pasid_init() > { > pasid_limit = min_t(unsigned int, > (unsigned int)(1 << kfd->device_info->max_pasid_bits), > iommu_info.max_pasids); > /* > * last pasid is used for kernel queues doorbells > * in the future the last pasid might be used for a kernel thread. > */ > pasid_limit = min_t(unsigned int, > pasid_limit, > kfd->doorbell_process_limit - 1); > } > > kfd->device_info->max_pasid_bits seems to contain per device limitations. > > Would you be willing to extend the API so that the requester can impose some limit > on the PASID value that is getting allocated. Good point. Following the feedback for this series, next version adds another public function: int iommu_sva_device_init(struct device *dev, int features); that has to be called by the device driver before any bind(). The intent is to let some IOMMU drivers initialize PASID tables and other features lazily, only if the device driver actually intends to use them. Maybe I could change this function to: int iommu_sva_device_init(struct device *dev, int features, unsigned int max_pasid); @features is a bitmask telling what the device driver needs (PASID and/or page faults). If features has IOMMU_SVA_FEAT_PASID set, then device driver can set a max_pasid limit, that we'd store in our private device-iommu data. If max_pasid is 0, then we'd use the PCI limit. Thanks, Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2018-01-19 05:27, Jean-Philippe Brucker wrote: > Hi Sinan, > > On 19/01/18 04:52, Sinan Kaya wrote: >> Hi Jean-Philippe, >> >> On 10/6/2017 9:31 AM, Jean-Philippe Brucker wrote: >>> /** >>> + * iommu_process_bind_device - Bind a process address space to a >>> device >>> + * @dev: the device >>> + * @task: the process to bind >>> + * @pasid: valid address where the PASID will be stored >>> + * @flags: bond properties (IOMMU_PROCESS_BIND_*) >>> + * >>> + * Create a bond between device and task, allowing the device to >>> access the >>> + * process address space using the returned PASID. >>> + * >>> + * On success, 0 is returned and @pasid contains a valid ID. >>> Otherwise, an error >>> + * is returned. >>> + */ >>> +int iommu_process_bind_device(struct device *dev, struct task_struct >>> *task, >>> + int *pasid, int flags) >> >> This API doesn't play nice with endpoint device drivers that have >> PASID limitations. >> >> The AMD driver seems to have PASID limitations per product that are >> not being >> advertised in the PCI capability. >> >> device_iommu_pasid_init() >> { >> pasid_limit = min_t(unsigned int, >> (unsigned int)(1 << >> kfd->device_info->max_pasid_bits), >> iommu_info.max_pasids); >> /* >> * last pasid is used for kernel queues doorbells >> * in the future the last pasid might be used for a kernel >> thread. >> */ >> pasid_limit = min_t(unsigned int, >> pasid_limit, >> kfd->doorbell_process_limit - 1); >> } >> >> kfd->device_info->max_pasid_bits seems to contain per device >> limitations. >> >> Would you be willing to extend the API so that the requester can >> impose some limit >> on the PASID value that is getting allocated. > > Good point. Following the feedback for this series, next version adds > another public function: > > int iommu_sva_device_init(struct device *dev, int features); > > that has to be called by the device driver before any bind(). The > intent > is to let some IOMMU drivers initialize PASID tables and other features > lazily, only if the device driver actually intends to use them. Maybe I > could change this function to: > > int iommu_sva_device_init(struct device *dev, int features, unsigned > int > max_pasid); > > @features is a bitmask telling what the device driver needs (PASID > and/or > page faults). If features has IOMMU_SVA_FEAT_PASID set, then device > driver > can set a max_pasid limit, that we'd store in our private device-iommu > data. If max_pasid is 0, then we'd use the PCI limit. Yes, this should work. > > Thanks, > Jean -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html