Message ID | 20181116112430.31248-1-vivek.gautam@codeaurora.org |
---|---|
Series | iommu/arm-smmu: Add runtime pm/sleep support |
On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote:
> From: Sricharan R <sricharan@codeaurora.org>
>
> The smmu device probe/remove and add/remove master device callbacks
> gets called when the smmu is not linked to its master, that is without
> the context of the master device. So calling runtime apis in those places
> separately.
> Global locks are also initialized before enabling runtime pm as the
> runtime_resume() calls device_reset() which does tlb_sync_global()
> that ultimately requires locks to be initialized.
>
> Signed-off-by: Sricharan R <sricharan@codeaurora.org>
> [vivek: Cleanup pm runtime calls]
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 91 insertions(+), 10 deletions(-)

Given that you're doing the get/put in the TLBI ops unconditionally:

>  static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
>  {
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>
> -	if (smmu_domain->tlb_ops)
> +	if (smmu_domain->tlb_ops) {
> +		arm_smmu_rpm_get(smmu);
>  		smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
> +		arm_smmu_rpm_put(smmu);
> +	}
>  }
>
>  static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
>  {
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>
> -	if (smmu_domain->tlb_ops)
> +	if (smmu_domain->tlb_ops) {
> +		arm_smmu_rpm_get(smmu);
>  		smmu_domain->tlb_ops->tlb_sync(smmu_domain);
> +		arm_smmu_rpm_put(smmu);
> +	}

Why do you need them around the map/unmap calls as well?

Will
[+Thor]

On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
> clock and power requirements.
> On msm8996, multiple cores, viz. mdss, video, etc. use this
> smmu. On sdm845, this smmu is used with gpu.
> Add bindings for the same.
>
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> Reviewed-by: Rob Herring <robh@kernel.org>
> Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 2098c3141f5f..d315ca637097 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -120,6 +120,7 @@ enum arm_smmu_implementation {
>  	GENERIC_SMMU,
>  	ARM_MMU500,
>  	CAVIUM_SMMUV2,
> +	QCOM_SMMUV2,
>  };
>
>  struct arm_smmu_s2cr {
> @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
>  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
>
> +static const char * const qcom_smmuv2_clks[] = {
> +	"bus", "iface",
> +};
> +
> +static const struct arm_smmu_match_data qcom_smmuv2 = {
> +	.version = ARM_SMMU_V2,
> +	.model = QCOM_SMMUV2,
> +	.clks = qcom_smmuv2_clks,
> +	.num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> +};

These seem redundant if we go down the route proposed by Thor, where we just pull all of the clocks out of the device-tree. In which case, why do we need this match_data at all?

Will
Hi Will, On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote: > > On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote: > > From: Sricharan R <sricharan@codeaurora.org> > > > > The smmu device probe/remove and add/remove master device callbacks > > gets called when the smmu is not linked to its master, that is without > > the context of the master device. So calling runtime apis in those places > > separately. > > Global locks are also initialized before enabling runtime pm as the > > runtime_resume() calls device_reset() which does tlb_sync_global() > > that ultimately requires locks to be initialized. > > > > Signed-off-by: Sricharan R <sricharan@codeaurora.org> > > [vivek: Cleanup pm runtime calls] > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> > > Reviewed-by: Tomasz Figa <tfiga@chromium.org> > > Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> > > Reviewed-by: Robin Murphy <robin.murphy@arm.com> > > --- > > drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++----- > > 1 file changed, 91 insertions(+), 10 deletions(-) > > Given that you're doing the get/put in the TLBI ops unconditionally: > > > static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain) > > { > > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > > + struct arm_smmu_device *smmu = smmu_domain->smmu; > > > > - if (smmu_domain->tlb_ops) > > + if (smmu_domain->tlb_ops) { > > + arm_smmu_rpm_get(smmu); > > smmu_domain->tlb_ops->tlb_flush_all(smmu_domain); > > + arm_smmu_rpm_put(smmu); > > + } > > } > > > > static void arm_smmu_iotlb_sync(struct iommu_domain *domain) > > { > > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > > + struct arm_smmu_device *smmu = smmu_domain->smmu; > > > > - if (smmu_domain->tlb_ops) > > + if (smmu_domain->tlb_ops) { > > + arm_smmu_rpm_get(smmu); > > smmu_domain->tlb_ops->tlb_sync(smmu_domain); > > + arm_smmu_rpm_put(smmu); > > + } > > Why do you need them 
around the map/unmap calls as well?

We still have .tlb_add_flush path?

Thanks
Vivek

>
> Will
Hi Will, On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote: > > [+Thor] > > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote: > > qcom,smmu-v2 is an arm,smmu-v2 implementation with specific > > clock and power requirements. > > On msm8996, multiple cores, viz. mdss, video, etc. use this > > smmu. On sdm845, this smmu is used with gpu. > > Add bindings for the same. > > > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> > > Reviewed-by: Rob Herring <robh@kernel.org> > > Reviewed-by: Tomasz Figa <tfiga@chromium.org> > > Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> > > Reviewed-by: Robin Murphy <robin.murphy@arm.com> > > --- > > drivers/iommu/arm-smmu.c | 13 +++++++++++++ > > 1 file changed, 13 insertions(+) > > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > > index 2098c3141f5f..d315ca637097 100644 > > --- a/drivers/iommu/arm-smmu.c > > +++ b/drivers/iommu/arm-smmu.c > > @@ -120,6 +120,7 @@ enum arm_smmu_implementation { > > GENERIC_SMMU, > > ARM_MMU500, > > CAVIUM_SMMUV2, > > + QCOM_SMMUV2, > > }; > > > > struct arm_smmu_s2cr { > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); > > ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); > > ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); > > > > +static const char * const qcom_smmuv2_clks[] = { > > + "bus", "iface", > > +}; > > + > > +static const struct arm_smmu_match_data qcom_smmuv2 = { > > + .version = ARM_SMMU_V2, > > + .model = QCOM_SMMUV2, > > + .clks = qcom_smmuv2_clks, > > + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks), > > +}; > > These seems redundant if we go down the route proposed by Thor, where we > just pull all of the clocks out of the device-tree. In which case, why > do we need this match_data at all? Which is better? 
The driver relying solely on the device tree to tell which clocks are required to be enabled, or the driver deciding itself, based on the platform's match data, that it should have clocks X, Y, and Z supplied from the device tree?

Thanks
Vivek

>
> Will
Hi Vivek, Will, On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam <vivek.gautam@codeaurora.org> wrote: > > Hi Will, > > On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote: > > > > [+Thor] > > > > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote: > > > qcom,smmu-v2 is an arm,smmu-v2 implementation with specific > > > clock and power requirements. > > > On msm8996, multiple cores, viz. mdss, video, etc. use this > > > smmu. On sdm845, this smmu is used with gpu. > > > Add bindings for the same. > > > > > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> > > > Reviewed-by: Rob Herring <robh@kernel.org> > > > Reviewed-by: Tomasz Figa <tfiga@chromium.org> > > > Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> > > > Reviewed-by: Robin Murphy <robin.murphy@arm.com> > > > --- > > > drivers/iommu/arm-smmu.c | 13 +++++++++++++ > > > 1 file changed, 13 insertions(+) > > > > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > > > index 2098c3141f5f..d315ca637097 100644 > > > --- a/drivers/iommu/arm-smmu.c > > > +++ b/drivers/iommu/arm-smmu.c > > > @@ -120,6 +120,7 @@ enum arm_smmu_implementation { > > > GENERIC_SMMU, > > > ARM_MMU500, > > > CAVIUM_SMMUV2, > > > + QCOM_SMMUV2, > > > }; > > > > > > struct arm_smmu_s2cr { > > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); > > > ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); > > > ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); > > > > > > +static const char * const qcom_smmuv2_clks[] = { > > > + "bus", "iface", > > > +}; > > > + > > > +static const struct arm_smmu_match_data qcom_smmuv2 = { > > > + .version = ARM_SMMU_V2, > > > + .model = QCOM_SMMUV2, > > > + .clks = qcom_smmuv2_clks, > > > + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks), > > > +}; > > > > These seems redundant if we go down the route proposed by Thor, where we > > just pull all of the clocks out of the device-tree. 
In which case, why
> > do we need this match_data at all?
>
> Which is better? Driver relying solely on the device tree to tell
> which all clocks
> are required to be enabled,
> or, the driver deciding itself based on the platform's match data,
> that it should
> have X, Y, & Z clocks that should be supplied from the device tree.

The former would simplify the driver, but it would also make it impossible to spot mistakes in the DT, which would ultimately surface as very hard-to-debug bugs (likely complete system lockups).

For qcom_smmuv2, I believe we're eventually going to end up with platform-specific quirks anyway, so specifying the clocks too wouldn't hurt. Given that, I'd recommend sticking to the latter, i.e. what this patch does.

Best regards,
Tomasz
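Tomasz's argument for keeping the clock names in match data can be illustrated with a small userspace sketch. This is not driver code: `struct dt_node`, `struct smmu_match_data` and `first_missing_clock()` are hypothetical names made up for the illustration, and only the "bus"/"iface" list is taken from the patch. Assuming the driver knows which clocks it expects, it can report a missing DT entry by name instead of locking up later:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the clock names a DT node provides. */
struct dt_node {
	const char *const *clock_names;
	size_t num_clocks;
};

/* Reduced version of the match data shape used in the patch. */
struct smmu_match_data {
	const char *const *clks;
	size_t num_clks;
};

static const char *const qcom_smmuv2_clks[] = { "bus", "iface" };

static const struct smmu_match_data qcom_smmuv2 = {
	.clks = qcom_smmuv2_clks,
	.num_clks = sizeof(qcom_smmuv2_clks) / sizeof(qcom_smmuv2_clks[0]),
};

/*
 * Return the first clock the driver expects but the DT node does not
 * list, or NULL when every expected clock is present.
 */
static const char *first_missing_clock(const struct smmu_match_data *md,
				       const struct dt_node *np)
{
	assert(md && np);

	for (size_t i = 0; i < md->num_clks; i++) {
		int found = 0;

		for (size_t j = 0; j < np->num_clocks; j++) {
			if (strcmp(md->clks[i], np->clock_names[j]) == 0) {
				found = 1;
				break;
			}
		}
		if (!found)
			return md->clks[i];
	}
	return NULL;
}
```

A probe routine built this way could fail early with a message naming the missing clock, which is the kind of DT mistake that would otherwise go unnoticed until the hardware is accessed unclocked.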
Hi Tomasz, On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote: > > Hi Vivek, Will, > > On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam > <vivek.gautam@codeaurora.org> wrote: > > > > Hi Will, > > > > On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote: > > > > > > [+Thor] > > > > > > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote: > > > > qcom,smmu-v2 is an arm,smmu-v2 implementation with specific > > > > clock and power requirements. > > > > On msm8996, multiple cores, viz. mdss, video, etc. use this > > > > smmu. On sdm845, this smmu is used with gpu. > > > > Add bindings for the same. > > > > > > > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> > > > > Reviewed-by: Rob Herring <robh@kernel.org> > > > > Reviewed-by: Tomasz Figa <tfiga@chromium.org> > > > > Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> > > > > Reviewed-by: Robin Murphy <robin.murphy@arm.com> > > > > --- > > > > drivers/iommu/arm-smmu.c | 13 +++++++++++++ > > > > 1 file changed, 13 insertions(+) > > > > > > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > > > > index 2098c3141f5f..d315ca637097 100644 > > > > --- a/drivers/iommu/arm-smmu.c > > > > +++ b/drivers/iommu/arm-smmu.c > > > > @@ -120,6 +120,7 @@ enum arm_smmu_implementation { > > > > GENERIC_SMMU, > > > > ARM_MMU500, > > > > CAVIUM_SMMUV2, > > > > + QCOM_SMMUV2, > > > > }; > > > > > > > > struct arm_smmu_s2cr { > > > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); > > > > ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); > > > > ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); > > > > > > > > +static const char * const qcom_smmuv2_clks[] = { > > > > + "bus", "iface", > > > > +}; > > > > + > > > > +static const struct arm_smmu_match_data qcom_smmuv2 = { > > > > + .version = ARM_SMMU_V2, > > > > + .model = QCOM_SMMUV2, > > > > + .clks = qcom_smmuv2_clks, > > > > + .num_clks = 
ARRAY_SIZE(qcom_smmuv2_clks),
> > > > +};
> > >
> > > These seems redundant if we go down the route proposed by Thor, where we
> > > just pull all of the clocks out of the device-tree. In which case, why
> > > do we need this match_data at all?
> >
> > Which is better? Driver relying solely on the device tree to tell
> > which all clocks
> > are required to be enabled,
> > or, the driver deciding itself based on the platform's match data,
> > that it should
> > have X, Y, & Z clocks that should be supplied from the device tree.
>
> The former would simplify the driver, but would also make it
> impossible to spot mistakes in DT, which would ultimately surface out
> as very hard to debug bugs (likely complete system lockups).

Thanks.
Yea, this is how I understand things presently. Relying on device tree puts the things out of driver's control.

Hi Will,
Am I unable to understand the intentions here for Thor's clock-fetch design change?

>
> For qcom_smmuv2, I believe we're eventually going to end up with
> platform-specific quirks anyway, so specifying the clocks too wouldn't
> hurt. Given that, I'd recommend sticking to the latter, i.e. what this
> patch does.
>
> Best regards,
> Tomasz

Best regards
Vivek
On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote:
> On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote:
> > On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
> > <vivek.gautam@codeaurora.org> wrote:
> > > On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
> > > > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> > > > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
> > > > >  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
> > > > >  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
> > > > >
> > > > > +static const char * const qcom_smmuv2_clks[] = {
> > > > > +	"bus", "iface",
> > > > > +};
> > > > > +
> > > > > +static const struct arm_smmu_match_data qcom_smmuv2 = {
> > > > > +	.version = ARM_SMMU_V2,
> > > > > +	.model = QCOM_SMMUV2,
> > > > > +	.clks = qcom_smmuv2_clks,
> > > > > +	.num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> > > > > +};
> > > >
> > > > These seems redundant if we go down the route proposed by Thor, where we
> > > > just pull all of the clocks out of the device-tree. In which case, why
> > > > do we need this match_data at all?
> > >
> > > Which is better? Driver relying solely on the device tree to tell
> > > which all clocks
> > > are required to be enabled,
> > > or, the driver deciding itself based on the platform's match data,
> > > that it should
> > > have X, Y, & Z clocks that should be supplied from the device tree.
> >
> > The former would simplify the driver, but would also make it
> > impossible to spot mistakes in DT, which would ultimately surface out
> > as very hard to debug bugs (likely complete system lockups).
>
> Thanks.
> Yea, this is how I understand things presently. Relying on device tree
> puts the things out of driver's control.

But it also has the undesirable effect of having to update the driver code whenever we want to add support for a new SMMU implementation. If we do this all in the DT, as Thor is trying to do, then older kernels will work well with new hardware.

> Hi Will,
> Am I unable to understand the intentions here for Thor's clock-fetch
> design change?

I'm having trouble parsing your question, sorry. Please work with Thor so that we have a single way to get the clock information. My preference is to take it from the firmware, for the reason I stated above.

Will
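Will's preference, taking the whole clock list from the firmware description, can be sketched in userspace as follows. This is a model under stated assumptions rather than the driver: `struct mock_clk` and `enable_all()` are hypothetical, and the `broken` flag only simulates a clock that fails to enable. In the kernel, bulk clock helpers such as `devm_clk_bulk_get_all()` and `clk_bulk_prepare_enable()` would play the corresponding roles. The point of the sketch is that the code holds no per-SoC name table, so a node listing a third clock needs no code change:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one clock handed over by the firmware description. */
struct mock_clk {
	const char *name;
	int broken;	/* simulate a clock that fails to enable */
	int enabled;
};

/*
 * Enable every clock in the list, whatever its name or count.
 * On failure, disable the clocks already enabled and return -1.
 */
static int enable_all(struct mock_clk *clks, size_t n)
{
	assert(clks || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (clks[i].broken) {
			/* Roll back in reverse order. */
			while (i-- > 0)
				clks[i].enabled = 0;
			return -1;
		}
		clks[i].enabled = 1;
	}
	return 0;
}
```

The trade-off raised earlier in the thread still applies: with no expected list, this code cannot tell a complete clock set from one that is missing an entry.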
On Thu, Nov 22, 2018 at 05:32:24PM +0530, Vivek Gautam wrote: > Hi Will, > > On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote: > > > > On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote: > > > From: Sricharan R <sricharan@codeaurora.org> > > > > > > The smmu device probe/remove and add/remove master device callbacks > > > gets called when the smmu is not linked to its master, that is without > > > the context of the master device. So calling runtime apis in those places > > > separately. > > > Global locks are also initialized before enabling runtime pm as the > > > runtime_resume() calls device_reset() which does tlb_sync_global() > > > that ultimately requires locks to be initialized. > > > > > > Signed-off-by: Sricharan R <sricharan@codeaurora.org> > > > [vivek: Cleanup pm runtime calls] > > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> > > > Reviewed-by: Tomasz Figa <tfiga@chromium.org> > > > Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> > > > Reviewed-by: Robin Murphy <robin.murphy@arm.com> > > > --- > > > drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++----- > > > 1 file changed, 91 insertions(+), 10 deletions(-) > > > > Given that you're doing the get/put in the TLBI ops unconditionally: > > > > > static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain) > > > { > > > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > > > + struct arm_smmu_device *smmu = smmu_domain->smmu; > > > > > > - if (smmu_domain->tlb_ops) > > > + if (smmu_domain->tlb_ops) { > > > + arm_smmu_rpm_get(smmu); > > > smmu_domain->tlb_ops->tlb_flush_all(smmu_domain); > > > + arm_smmu_rpm_put(smmu); > > > + } > > > } > > > > > > static void arm_smmu_iotlb_sync(struct iommu_domain *domain) > > > { > > > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); > > > + struct arm_smmu_device *smmu = smmu_domain->smmu; > > > > > > - if (smmu_domain->tlb_ops) > > > + if 
(smmu_domain->tlb_ops) {
> > > +		arm_smmu_rpm_get(smmu);
> > >  		smmu_domain->tlb_ops->tlb_sync(smmu_domain);
> > > +		arm_smmu_rpm_put(smmu);
> > > +	}
> >
> > Why do you need them around the map/unmap calls as well?
>
> We still have .tlb_add_flush path?

Ok, so we could add the ops around that as well. Right now, we've got the runtime pm hooks crossing two parts of the API.

Will
On Sat, Nov 24, 2018 at 3:34 AM Will Deacon <will.deacon@arm.com> wrote: > > On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote: > > On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote: > > > On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam > > > <vivek.gautam@codeaurora.org> wrote: > > > > On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote: > > > > > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote: > > > > > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); > > > > > > ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); > > > > > > ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); > > > > > > > > > > > > +static const char * const qcom_smmuv2_clks[] = { > > > > > > + "bus", "iface", > > > > > > +}; > > > > > > + > > > > > > +static const struct arm_smmu_match_data qcom_smmuv2 = { > > > > > > + .version = ARM_SMMU_V2, > > > > > > + .model = QCOM_SMMUV2, > > > > > > + .clks = qcom_smmuv2_clks, > > > > > > + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks), > > > > > > +}; > > > > > > > > > > These seems redundant if we go down the route proposed by Thor, where we > > > > > just pull all of the clocks out of the device-tree. In which case, why > > > > > do we need this match_data at all? > > > > > > > > Which is better? Driver relying solely on the device tree to tell > > > > which all clocks > > > > are required to be enabled, > > > > or, the driver deciding itself based on the platform's match data, > > > > that it should > > > > have X, Y, & Z clocks that should be supplied from the device tree. > > > > > > The former would simplify the driver, but would also make it > > > impossible to spot mistakes in DT, which would ultimately surface out > > > as very hard to debug bugs (likely complete system lockups). > > > > Thanks. > > Yea, this is how I understand things presently. Relying on device tree > > puts the things out of driver's control. 
>
> But it also has the undesirable effect of having to update the driver
> code whenever we want to add support for a new SMMU implementation. If
> we do this all in the DT, as Thor is trying to do, then older kernels
> will work well with new hardware.

Fair enough, if you're okay with that. Obviously one would still have to change the DT bindings to list the exact set of clocks for the new hardware variant, unless the convention changed recently.

Best regards,
Tomasz
On 11/24/2018 12:06 AM, Will Deacon wrote: > On Thu, Nov 22, 2018 at 05:32:24PM +0530, Vivek Gautam wrote: >> Hi Will, >> >> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote: >>> On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote: >>>> From: Sricharan R <sricharan@codeaurora.org> >>>> >>>> The smmu device probe/remove and add/remove master device callbacks >>>> gets called when the smmu is not linked to its master, that is without >>>> the context of the master device. So calling runtime apis in those places >>>> separately. >>>> Global locks are also initialized before enabling runtime pm as the >>>> runtime_resume() calls device_reset() which does tlb_sync_global() >>>> that ultimately requires locks to be initialized. >>>> >>>> Signed-off-by: Sricharan R <sricharan@codeaurora.org> >>>> [vivek: Cleanup pm runtime calls] >>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> >>>> Reviewed-by: Tomasz Figa <tfiga@chromium.org> >>>> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> >>>> Reviewed-by: Robin Murphy <robin.murphy@arm.com> >>>> --- >>>> drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++----- >>>> 1 file changed, 91 insertions(+), 10 deletions(-) >>> Given that you're doing the get/put in the TLBI ops unconditionally: >>> >>>> static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain) >>>> { >>>> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); >>>> + struct arm_smmu_device *smmu = smmu_domain->smmu; >>>> >>>> - if (smmu_domain->tlb_ops) >>>> + if (smmu_domain->tlb_ops) { >>>> + arm_smmu_rpm_get(smmu); >>>> smmu_domain->tlb_ops->tlb_flush_all(smmu_domain); >>>> + arm_smmu_rpm_put(smmu); >>>> + } >>>> } >>>> >>>> static void arm_smmu_iotlb_sync(struct iommu_domain *domain) >>>> { >>>> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); >>>> + struct arm_smmu_device *smmu = smmu_domain->smmu; >>>> >>>> - if (smmu_domain->tlb_ops) >>>> + if 
(smmu_domain->tlb_ops) {
>>>> +		arm_smmu_rpm_get(smmu);
>>>>  		smmu_domain->tlb_ops->tlb_sync(smmu_domain);
>>>> +		arm_smmu_rpm_put(smmu);
>>>> +	}
>>> Why do you need them around the map/unmap calls as well?
>> We still have .tlb_add_flush path?
> Ok, so we could add the ops around that as well. Right now, we've got
> the runtime pm hooks crossing two parts of the API.

Sure, will do that then, and remove the runtime pm hooks from map/unmap.

Thanks
Vivek

>
> Will
On 11/24/2018 12:04 AM, Will Deacon wrote: > On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote: >> On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote: >>> On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam >>> <vivek.gautam@codeaurora.org> wrote: >>>> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote: >>>>> On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote: >>>>>> @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); >>>>>> ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); >>>>>> ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); >>>>>> >>>>>> +static const char * const qcom_smmuv2_clks[] = { >>>>>> + "bus", "iface", >>>>>> +}; >>>>>> + >>>>>> +static const struct arm_smmu_match_data qcom_smmuv2 = { >>>>>> + .version = ARM_SMMU_V2, >>>>>> + .model = QCOM_SMMUV2, >>>>>> + .clks = qcom_smmuv2_clks, >>>>>> + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks), >>>>>> +}; >>>>> These seems redundant if we go down the route proposed by Thor, where we >>>>> just pull all of the clocks out of the device-tree. In which case, why >>>>> do we need this match_data at all? >>>> Which is better? Driver relying solely on the device tree to tell >>>> which all clocks >>>> are required to be enabled, >>>> or, the driver deciding itself based on the platform's match data, >>>> that it should >>>> have X, Y, & Z clocks that should be supplied from the device tree. >>> The former would simplify the driver, but would also make it >>> impossible to spot mistakes in DT, which would ultimately surface out >>> as very hard to debug bugs (likely complete system lockups). >> Thanks. >> Yea, this is how I understand things presently. Relying on device tree >> puts the things out of driver's control. > But it also has the undesirable effect of having to update the driver > code whenever we want to add support for a new SMMU implementation. 
If
> we do this all in the DT, as Thor is trying to do, then older kernels
> will work well with new hardware.
>
>> Hi Will,
>> Am I unable to understand the intentions here for Thor's clock-fetch
>> design change?
> I'm having trouble parsing your question, sorry. Please work with Thor
> so that we have a single way to get the clock information. My preference
> is to take it from the firmware, for the reason I stated above.

Hi Will,

Sure, thanks. I will work with Thor to get this going.

Hi Thor,
Does it sound okay to you to squash your patch [1] into my patch [2] with your 'Signed-off-by' tag? I will update the commit log to include the information about getting clock details from device tree.

[1] https://patchwork.kernel.org/patch/10628725/
[2] https://patchwork.kernel.org/patch/10686061/

Best regards
Vivek

>
> Will
On 11/26/2018 11:33 AM, Vivek Gautam wrote: > > > On 11/24/2018 12:06 AM, Will Deacon wrote: >> On Thu, Nov 22, 2018 at 05:32:24PM +0530, Vivek Gautam wrote: >>> Hi Will, >>> >>> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> >>> wrote: >>>> On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote: >>>>> From: Sricharan R <sricharan@codeaurora.org> >>>>> >>>>> The smmu device probe/remove and add/remove master device callbacks >>>>> gets called when the smmu is not linked to its master, that is >>>>> without >>>>> the context of the master device. So calling runtime apis in those >>>>> places >>>>> separately. >>>>> Global locks are also initialized before enabling runtime pm as the >>>>> runtime_resume() calls device_reset() which does tlb_sync_global() >>>>> that ultimately requires locks to be initialized. >>>>> >>>>> Signed-off-by: Sricharan R <sricharan@codeaurora.org> >>>>> [vivek: Cleanup pm runtime calls] >>>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> >>>>> Reviewed-by: Tomasz Figa <tfiga@chromium.org> >>>>> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> >>>>> Reviewed-by: Robin Murphy <robin.murphy@arm.com> >>>>> --- >>>>> drivers/iommu/arm-smmu.c | 101 >>>>> ++++++++++++++++++++++++++++++++++++++++++----- >>>>> 1 file changed, 91 insertions(+), 10 deletions(-) >>>> Given that you're doing the get/put in the TLBI ops unconditionally: >>>> >>>>> static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain) >>>>> { >>>>> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); >>>>> + struct arm_smmu_device *smmu = smmu_domain->smmu; >>>>> >>>>> - if (smmu_domain->tlb_ops) >>>>> + if (smmu_domain->tlb_ops) { >>>>> + arm_smmu_rpm_get(smmu); >>>>> smmu_domain->tlb_ops->tlb_flush_all(smmu_domain); >>>>> + arm_smmu_rpm_put(smmu); >>>>> + } >>>>> } >>>>> >>>>> static void arm_smmu_iotlb_sync(struct iommu_domain *domain) >>>>> { >>>>> struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); 
>>>>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>>
>>>>> -	if (smmu_domain->tlb_ops)
>>>>> +	if (smmu_domain->tlb_ops) {
>>>>> +		arm_smmu_rpm_get(smmu);
>>>>>  		smmu_domain->tlb_ops->tlb_sync(smmu_domain);
>>>>> +		arm_smmu_rpm_put(smmu);
>>>>> +	}
>>>> Why do you need them around the map/unmap calls as well?
>>> We still have .tlb_add_flush path?
>> Ok, so we could add the ops around that as well. Right now, we've got
>> the runtime pm hooks crossing two parts of the API.
>
> Sure, will do that then, and remove the runtime pm hooks from map/unmap.

I missed this earlier: we are adding the runtime pm hooks in the 'iommu_ops' callbacks, not in 'tlb_ops'. So how are the runtime pm hooks crossing paths? The '.map/.unmap' iommu_ops don't call the '.flush_iotlb_all' or '.iotlb_sync' iommu_ops anywhere. E.g., the only callers of domain->ops->flush_iotlb_all() are iommu_dma_flush_iotlb_all() and iommu_flush_tlb_all(), which are not in the map/unmap paths.

Regards
Vivek

>
> Thanks
> Vivek
>>
>> Will
>
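The layering Vivek describes can be modelled in a few lines of userspace C. Everything here is a hypothetical stand-in: `struct smmu`, `rpm_get()`/`rpm_put()` (a plain counter in place of the kernel's runtime PM reference counting) and the function names only mirror the callbacks discussed above. The sketch assumes, as the thread suggests, that the unmap path reaches `.tlb_add_flush` internally, while `.flush_iotlb_all` is a separate entry point:

```c
#include <assert.h>

/* Hypothetical model: a usage counter stands in for runtime PM state. */
struct smmu {
	int rpm_usage;	/* > 0 means the device is powered */
	int tlb_writes;	/* register accesses performed */
};

static void rpm_get(struct smmu *s)
{
	s->rpm_usage++;
}

static void rpm_put(struct smmu *s)
{
	assert(s->rpm_usage > 0);
	s->rpm_usage--;
}

/* tlb_ops level: touches registers, so the device must be powered. */
static void tlb_add_flush(struct smmu *s)
{
	assert(s->rpm_usage > 0);
	s->tlb_writes++;
}

static void tlb_flush_all(struct smmu *s)
{
	assert(s->rpm_usage > 0);
	s->tlb_writes++;
}

/* iommu_ops level: each entry point takes its own reference. */
static void iommu_unmap(struct smmu *s)
{
	rpm_get(s);
	tlb_add_flush(s);	/* reached from inside the unmap path */
	rpm_put(s);
}

static void iommu_flush_iotlb_all(struct smmu *s)
{
	rpm_get(s);
	tlb_flush_all(s);
	rpm_put(s);
}
```

In this model each iommu_ops entry point is balanced on its own and neither calls the other, which is the situation Vivek's question describes. Moving the get/put down into the tlb_ops functions instead would be the alternative Will suggests.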
Hi Vivek, On 11/26/18 4:55 AM, Vivek Gautam wrote: > > On 11/24/2018 12:04 AM, Will Deacon wrote: >> On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote: >>> On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote: >>>> On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam >>>> <vivek.gautam@codeaurora.org> wrote: >>>>> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> >>>>> wrote: >>>>>> On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote: >>>>>>> @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, >>>>>>> ARM_SMMU_V1_64K, GENERIC_SMMU); >>>>>>> ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); >>>>>>> ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); >>>>>>> >>>>>>> +static const char * const qcom_smmuv2_clks[] = { >>>>>>> + "bus", "iface", >>>>>>> +}; >>>>>>> + >>>>>>> +static const struct arm_smmu_match_data qcom_smmuv2 = { >>>>>>> + .version = ARM_SMMU_V2, >>>>>>> + .model = QCOM_SMMUV2, >>>>>>> + .clks = qcom_smmuv2_clks, >>>>>>> + .num_clks = ARRAY_SIZE(qcom_smmuv2_clks), >>>>>>> +}; >>>>>> These seems redundant if we go down the route proposed by Thor, >>>>>> where we >>>>>> just pull all of the clocks out of the device-tree. In which case, >>>>>> why >>>>>> do we need this match_data at all? >>>>> Which is better? Driver relying solely on the device tree to tell >>>>> which all clocks >>>>> are required to be enabled, >>>>> or, the driver deciding itself based on the platform's match data, >>>>> that it should >>>>> have X, Y, & Z clocks that should be supplied from the device tree. >>>> The former would simplify the driver, but would also make it >>>> impossible to spot mistakes in DT, which would ultimately surface out >>>> as very hard to debug bugs (likely complete system lockups). >>> Thanks. >>> Yea, this is how I understand things presently. Relying on device tree >>> puts the things out of driver's control. 
>> But it also has the undesirable effect of having to update the driver
>> code whenever we want to add support for a new SMMU implementation. If
>> we do this all in the DT, as Thor is trying to do, then older kernels
>> will work well with new hardware.
>>
>>> Hi Will,
>>> Am I unable to understand the intentions here for Thor's clock-fetch
>>> design change?
>> I'm having trouble parsing your question, sorry. Please work with Thor
>> so that we have a single way to get the clock information. My preference
>> is to take it from the firmware, for the reason I stated above.
> Hi Will,
>
> Sure, thanks. I will work with Thor to get this going.
>
> Hi Thor,
> Does it sound okay to you to squash your patch [1] into my patch [2] with
> your 'Signed-off-by' tag?
> I will update the commit log to include the information about getting
> clock details from device tree.
>
> [1] https://patchwork.kernel.org/patch/10628725/
> [2] https://patchwork.kernel.org/patch/10686061/
>

Yes, that would be great and easier to understand than my patch on top of yours.

Additionally, can you remove the "Error:" as Will requested as part of the squash?

Thank you!

Thor

> Best regards
> Vivek
>>
>> Will
>
>
Hi Thor,

On 11/26/2018 8:11 PM, Thor Thayer wrote:
> Hi Vivek,
>
> On 11/26/18 4:55 AM, Vivek Gautam wrote:
>>
>> On 11/24/2018 12:04 AM, Will Deacon wrote:
>>> On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote:
>>>> On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote:
>>>>> On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
>>>>> <vivek.gautam@codeaurora.org> wrote:
>>>>>> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon
>>>>>> <will.deacon@arm.com> wrote:
>>>>>>> On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
>>>>>>>> @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
>>>>>>>>  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>>>>>>>>  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
>>>>>>>>
>>>>>>>> +static const char * const qcom_smmuv2_clks[] = {
>>>>>>>> +	"bus", "iface",
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +static const struct arm_smmu_match_data qcom_smmuv2 = {
>>>>>>>> +	.version = ARM_SMMU_V2,
>>>>>>>> +	.model = QCOM_SMMUV2,
>>>>>>>> +	.clks = qcom_smmuv2_clks,
>>>>>>>> +	.num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
>>>>>>>> +};
>>>>>>> These seems redundant if we go down the route proposed by Thor, where we
>>>>>>> just pull all of the clocks out of the device-tree. In which case, why
>>>>>>> do we need this match_data at all?
>>>>>> Which is better? Driver relying solely on the device tree to tell
>>>>>> which all clocks are required to be enabled,
>>>>>> or, the driver deciding itself based on the platform's match data,
>>>>>> that it should have X, Y, & Z clocks that should be supplied from
>>>>>> the device tree.
>>>>> The former would simplify the driver, but would also make it
>>>>> impossible to spot mistakes in DT, which would ultimately surface out
>>>>> as very hard to debug bugs (likely complete system lockups).
>>>> Thanks.
>>>> Yea, this is how I understand things presently. Relying on device tree
>>>> puts the things out of driver's control.
>>> But it also has the undesirable effect of having to update the driver
>>> code whenever we want to add support for a new SMMU implementation. If
>>> we do this all in the DT, as Thor is trying to do, then older kernels
>>> will work well with new hardware.
>>>
>>>> Hi Will,
>>>> Am I unable to understand the intentions here for Thor's clock-fetch
>>>> design change?
>>> I'm having trouble parsing your question, sorry. Please work with Thor
>>> so that we have a single way to get the clock information. My preference
>>> is to take it from the firmware, for the reason I stated above.
>> Hi Will,
>>
>> Sure, thanks. I will work with Thor to get this going.
>>
>> Hi Thor,
>> Does it sound okay to you to squash your patch [1] into my patch [2] with
>> your 'Signed-off-by' tag?
>> I will update the commit log to include the information about getting
>> clock details from device tree.
>>
>> [1] https://patchwork.kernel.org/patch/10628725/
>> [2] https://patchwork.kernel.org/patch/10686061/
>>
>
> Yes, that would be great and easier to understand than my patch on top
> of yours.
>
> Additionally, can you remove the "Error:" as Will requested as part of
> the squash?

Thanks for your consent. I have reworked the patch today, and have
addressed Will's comment.
I will give a try on the board and post it by tomorrow.

Best regards
Vivek

>
> Thank you!
>
> Thor
>
>> Best regards
>> Vivek
>>>
>>> Will
>>
>>
>
On Mon, Nov 26, 2018 at 04:56:42PM +0530, Vivek Gautam wrote:
> On 11/26/2018 11:33 AM, Vivek Gautam wrote:
> >On 11/24/2018 12:06 AM, Will Deacon wrote:
> >>On Thu, Nov 22, 2018 at 05:32:24PM +0530, Vivek Gautam wrote:
> >>>On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com>
> >>>wrote:
> >>>>On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote:
> >>>>>From: Sricharan R <sricharan@codeaurora.org>
> >>>>>
> >>>>>The smmu device probe/remove and add/remove master device callbacks
> >>>>>gets called when the smmu is not linked to its master, that is without
> >>>>>the context of the master device. So calling runtime apis in those
> >>>>>places separately.
> >>>>>Global locks are also initialized before enabling runtime pm as the
> >>>>>runtime_resume() calls device_reset() which does tlb_sync_global()
> >>>>>that ultimately requires locks to be initialized.
> >>>>>
> >>>>>Signed-off-by: Sricharan R <sricharan@codeaurora.org>
> >>>>>[vivek: Cleanup pm runtime calls]
> >>>>>Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> >>>>>Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> >>>>>Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> >>>>>Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> >>>>>---
> >>>>> drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++-----
> >>>>> 1 file changed, 91 insertions(+), 10 deletions(-)
> >>>>Given that you're doing the get/put in the TLBI ops unconditionally:
> >>>>
> >>>>> static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> >>>>> {
> >>>>> 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >>>>>+	struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>>>>
> >>>>>-	if (smmu_domain->tlb_ops)
> >>>>>+	if (smmu_domain->tlb_ops) {
> >>>>>+		arm_smmu_rpm_get(smmu);
> >>>>> 		smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
> >>>>>+		arm_smmu_rpm_put(smmu);
> >>>>>+	}
> >>>>> }
> >>>>>
> >>>>> static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> >>>>> {
> >>>>> 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >>>>>+	struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>>>>
> >>>>>-	if (smmu_domain->tlb_ops)
> >>>>>+	if (smmu_domain->tlb_ops) {
> >>>>>+		arm_smmu_rpm_get(smmu);
> >>>>> 		smmu_domain->tlb_ops->tlb_sync(smmu_domain);
> >>>>>+		arm_smmu_rpm_put(smmu);
> >>>>>+	}
> >>>>Why do you need them around the map/unmap calls as well?
> >>>We still have .tlb_add_flush path?
> >>Ok, so we could add the ops around that as well. Right now, we've got
> >>the runtime pm hooks crossing two parts of the API.
> >
> >Sure, will do that then, and remove the runtime pm hooks from map/unmap.
>
> I missed this earlier -
> We are adding runtime pm hooks in the 'iommu_ops' callbacks and not really
> to 'tlb_ops'. So how the runtime pm hooks crossing the paths?
> '.map/.unmap' iommu_ops don't call '.flush_iotlb_all' or '.iotlb_sync'
> iommu_ops anywhere.
>
> E.g., only callers to domain->ops->flush_iotlb_all() are:
> iommu_dma_flush_iotlb_all(), or iommu_flush_tlb_all() which are not in
> map/unmap paths.

Yes, sorry, I got confused here and completely misled you. In which case,
your original patch is ok because it intercepts the core IOMMU API via
iommu_ops. Apologies.

At that level, should we also annotate arm_smmu_iova_to_phys_hard() for the
iova_to_phys() implementation?

With that detail and clock bits sorted out, we should be able to get this
queued at last.

Will
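
Editorial sketch of the get/put bracketing pattern under discussion, as a userspace C model. It is not the driver's code: `struct mock_smmu` and all `mock_*` functions are hypothetical stand-ins, and the usage counter only approximates what the kernel's runtime PM core does (real runtime PM can also suspend asynchronously or after a delay). The point is that any iommu_ops-level callback that touches hardware, including a hardware-assisted iova_to_phys(), takes a reference before the register access and drops it afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an SMMU with a runtime-PM usage counter. */
struct mock_smmu {
	int usage_count;	/* outstanding get() references */
	bool powered;		/* true while clocks/power are on */
	int resume_count;	/* how many times the device was resumed */
};

/* Take a reference; power the device up on the 0 -> 1 transition. */
static void mock_rpm_get(struct mock_smmu *smmu)
{
	if (smmu->usage_count++ == 0) {
		smmu->powered = true;
		smmu->resume_count++;
	}
}

/* Drop a reference; power the device down on the 1 -> 0 transition. */
static void mock_rpm_put(struct mock_smmu *smmu)
{
	assert(smmu->usage_count > 0);
	if (--smmu->usage_count == 0)
		smmu->powered = false;
}

/* Stand-in for a register access; only valid while powered. */
static uint64_t mock_hw_translate(const struct mock_smmu *smmu, uint64_t iova)
{
	assert(smmu->powered);
	return iova + 0x1000;	/* arbitrary fixed offset for the model */
}

/* The callback brackets the hardware access with get/put. */
static uint64_t mock_iova_to_phys_hard(struct mock_smmu *smmu, uint64_t iova)
{
	uint64_t phys;

	mock_rpm_get(smmu);
	phys = mock_hw_translate(smmu, iova);
	mock_rpm_put(smmu);
	return phys;
}
```

Because the bracketing is reference counted, calling the wrapped callback while another caller already holds a reference does not resume the device a second time, and the device stays powered until the last reference is dropped.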