[RESEND,v17,0/5] iommu/arm-smmu: Add runtime pm/sleep support

Message ID 20181116112430.31248-1-vivek.gautam@codeaurora.org
Series iommu/arm-smmu: Add runtime pm/sleep support

Message

Vivek Gautam Nov. 16, 2018, 11:24 a.m. UTC
Hi Will,
I am resending this series now that the comments [1,2] on v16 of this
patch series have been resolved and the follow-up patch [3] has been posted.
Please merge this series.

Thanks
Vivek

The previous version of this patch series is at [4].
Also refer to [4] for the change logs of earlier versions.

[1] https://lore.kernel.org/patchwork/patch/979430/
[2] https://lore.kernel.org/patchwork/patch/979433/
[3] https://lore.kernel.org/patchwork/patch/994194/
[4] https://lore.kernel.org/patchwork/cover/979429/

Sricharan R (3):
  iommu/arm-smmu: Add pm_runtime/sleep ops
  iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
  iommu/arm-smmu: Add the device_link between masters and smmu

Vivek Gautam (2):
  dt-bindings: arm-smmu: Add bindings for qcom,smmu-v2
  iommu/arm-smmu: Add support for qcom,smmu-v2 variant

 .../devicetree/bindings/iommu/arm,smmu.txt         |  39 +++++
 drivers/iommu/arm-smmu.c                           | 192 +++++++++++++++++++--
 2 files changed, 219 insertions(+), 12 deletions(-)

Comments

Will Deacon Nov. 21, 2018, 5:37 p.m. UTC | #1
On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote:
> From: Sricharan R <sricharan@codeaurora.org>
> 
> The smmu device probe/remove and add/remove master device callbacks
> gets called when the smmu is not linked to its master, that is without
> the context of the master device. So calling runtime apis in those places
> separately.
> Global locks are also initialized before enabling runtime pm as the
> runtime_resume() calls device_reset() which does tlb_sync_global()
> that ultimately requires locks to be initialized.
> 
> Signed-off-by: Sricharan R <sricharan@codeaurora.org>
> [vivek: Cleanup pm runtime calls]
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 91 insertions(+), 10 deletions(-)

Given that you're doing the get/put in the TLBI ops unconditionally:

>  static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
>  {
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  
> -	if (smmu_domain->tlb_ops)
> +	if (smmu_domain->tlb_ops) {
> +		arm_smmu_rpm_get(smmu);
>  		smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
> +		arm_smmu_rpm_put(smmu);
> +	}
>  }
>  
>  static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
>  {
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  
> -	if (smmu_domain->tlb_ops)
> +	if (smmu_domain->tlb_ops) {
> +		arm_smmu_rpm_get(smmu);
>  		smmu_domain->tlb_ops->tlb_sync(smmu_domain);
> +		arm_smmu_rpm_put(smmu);
> +	}

Why do you need them around the map/unmap calls as well?

Will
Will Deacon Nov. 21, 2018, 5:38 p.m. UTC | #2
[+Thor]

On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
> clock and power requirements.
> On msm8996, multiple cores, viz. mdss, video, etc. use this
> smmu. On sdm845, this smmu is used with gpu.
> Add bindings for the same.
> 
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> Reviewed-by: Rob Herring <robh@kernel.org>
> Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 2098c3141f5f..d315ca637097 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -120,6 +120,7 @@ enum arm_smmu_implementation {
>  	GENERIC_SMMU,
>  	ARM_MMU500,
>  	CAVIUM_SMMUV2,
> +	QCOM_SMMUV2,
>  };
>  
>  struct arm_smmu_s2cr {
> @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
>  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
>  
> +static const char * const qcom_smmuv2_clks[] = {
> +	"bus", "iface",
> +};
> +
> +static const struct arm_smmu_match_data qcom_smmuv2 = {
> +	.version = ARM_SMMU_V2,
> +	.model = QCOM_SMMUV2,
> +	.clks = qcom_smmuv2_clks,
> +	.num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> +};

This seems redundant if we go down the route proposed by Thor, where we
just pull all of the clocks out of the device-tree. In which case, why
do we need this match_data at all?

Will
Vivek Gautam Nov. 22, 2018, 12:02 p.m. UTC | #3
Hi Will,

On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
>
> On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote:
> > From: Sricharan R <sricharan@codeaurora.org>
> >
> > The smmu device probe/remove and add/remove master device callbacks
> > gets called when the smmu is not linked to its master, that is without
> > the context of the master device. So calling runtime apis in those places
> > separately.
> > Global locks are also initialized before enabling runtime pm as the
> > runtime_resume() calls device_reset() which does tlb_sync_global()
> > that ultimately requires locks to be initialized.
> >
> > Signed-off-by: Sricharan R <sricharan@codeaurora.org>
> > [vivek: Cleanup pm runtime calls]
> > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> > Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> > Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> >  drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 91 insertions(+), 10 deletions(-)
>
> Given that you're doing the get/put in the TLBI ops unconditionally:
>
> >  static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> >  {
> >       struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > +     struct arm_smmu_device *smmu = smmu_domain->smmu;
> >
> > -     if (smmu_domain->tlb_ops)
> > +     if (smmu_domain->tlb_ops) {
> > +             arm_smmu_rpm_get(smmu);
> >               smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
> > +             arm_smmu_rpm_put(smmu);
> > +     }
> >  }
> >
> >  static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> >  {
> >       struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > +     struct arm_smmu_device *smmu = smmu_domain->smmu;
> >
> > -     if (smmu_domain->tlb_ops)
> > +     if (smmu_domain->tlb_ops) {
> > +             arm_smmu_rpm_get(smmu);
> >               smmu_domain->tlb_ops->tlb_sync(smmu_domain);
> > +             arm_smmu_rpm_put(smmu);
> > +     }
>
> Why do you need them around the map/unmap calls as well?

Don't we still have the .tlb_add_flush path to cover?

Thanks
Vivek
>
> Will
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
Vivek Gautam Nov. 23, 2018, 9:13 a.m. UTC | #4
Hi Will,

On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
>
> [+Thor]
>
> On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> > qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
> > clock and power requirements.
> > On msm8996, multiple cores, viz. mdss, video, etc. use this
> > smmu. On sdm845, this smmu is used with gpu.
> > Add bindings for the same.
> >
> > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > Reviewed-by: Rob Herring <robh@kernel.org>
> > Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> > Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> > Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> >  drivers/iommu/arm-smmu.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 2098c3141f5f..d315ca637097 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -120,6 +120,7 @@ enum arm_smmu_implementation {
> >       GENERIC_SMMU,
> >       ARM_MMU500,
> >       CAVIUM_SMMUV2,
> > +     QCOM_SMMUV2,
> >  };
> >
> >  struct arm_smmu_s2cr {
> > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
> >  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
> >  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
> >
> > +static const char * const qcom_smmuv2_clks[] = {
> > +     "bus", "iface",
> > +};
> > +
> > +static const struct arm_smmu_match_data qcom_smmuv2 = {
> > +     .version = ARM_SMMU_V2,
> > +     .model = QCOM_SMMUV2,
> > +     .clks = qcom_smmuv2_clks,
> > +     .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> > +};
>
> These seems redundant if we go down the route proposed by Thor, where we
> just pull all of the clocks out of the device-tree. In which case, why
> do we need this match_data at all?

Which is better? The driver relying solely on the device tree to tell it
which clocks need to be enabled, or the driver deciding for itself, based
on the platform's match data, that it needs clocks X, Y and Z, which the
device tree must then supply?

Thanks
Vivek

>
> Will
Tomasz Figa Nov. 23, 2018, 9:22 a.m. UTC | #5
Hi Vivek, Will,

On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
<vivek.gautam@codeaurora.org> wrote:
>
> Hi Will,
>
> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
> >
> > [+Thor]
> >
> > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> > > qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
> > > clock and power requirements.
> > > On msm8996, multiple cores, viz. mdss, video, etc. use this
> > > smmu. On sdm845, this smmu is used with gpu.
> > > Add bindings for the same.
> > >
> > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > > Reviewed-by: Rob Herring <robh@kernel.org>
> > > Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> > > Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> > > Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > >  drivers/iommu/arm-smmu.c | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > index 2098c3141f5f..d315ca637097 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -120,6 +120,7 @@ enum arm_smmu_implementation {
> > >       GENERIC_SMMU,
> > >       ARM_MMU500,
> > >       CAVIUM_SMMUV2,
> > > +     QCOM_SMMUV2,
> > >  };
> > >
> > >  struct arm_smmu_s2cr {
> > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
> > >  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
> > >  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
> > >
> > > +static const char * const qcom_smmuv2_clks[] = {
> > > +     "bus", "iface",
> > > +};
> > > +
> > > +static const struct arm_smmu_match_data qcom_smmuv2 = {
> > > +     .version = ARM_SMMU_V2,
> > > +     .model = QCOM_SMMUV2,
> > > +     .clks = qcom_smmuv2_clks,
> > > +     .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> > > +};
> >
> > These seems redundant if we go down the route proposed by Thor, where we
> > just pull all of the clocks out of the device-tree. In which case, why
> > do we need this match_data at all?
>
> Which is better? Driver relying solely on the device tree to tell
> which all clocks
> are required to be enabled,
> or, the driver deciding itself based on the platform's match data,
> that it should
> have X, Y, & Z clocks that should be supplied from the device tree.

The former would simplify the driver, but would also make it
impossible to spot mistakes in DT, which would ultimately surface
as very hard-to-debug bugs (likely complete system lockups).

For qcom_smmuv2, I believe we're eventually going to end up with
platform-specific quirks anyway, so specifying the clocks too wouldn't
hurt. Given that, I'd recommend sticking to the latter, i.e. what this
patch does.

Best regards,
Tomasz
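
As a rough illustration of the trade-off described above, the following
user-space sketch contrasts the two approaches. It is not kernel code and is
not taken from the patch: struct dt_node, dt_has_clock(),
get_clks_from_match_data() and get_clks_from_dt() are hypothetical names, and
the device-tree node is modelled as a plain array of clock names. Only the
clock names "bus" and "iface" come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a device-tree node: just its "clock-names" list. */
struct dt_node {
	const char *const *clock_names;
	size_t num_clocks;
};

/* Return 1 if the node lists a clock called @name, 0 otherwise. */
static int dt_has_clock(const struct dt_node *np, const char *name)
{
	size_t i;

	for (i = 0; i < np->num_clocks; i++)
		if (strcmp(np->clock_names[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Match-data approach: the driver knows which clocks it needs and fails
 * early if the node omits one. Returns the clock count, or -1 on error.
 */
int get_clks_from_match_data(const struct dt_node *np,
			     const char *const *required, size_t num_required)
{
	size_t i;

	assert(np && required);
	for (i = 0; i < num_required; i++)
		if (!dt_has_clock(np, required[i]))
			return -1;
	return (int)num_required;
}

/*
 * DT-only approach: take whatever the node lists. Simpler, but an
 * incomplete node is accepted without complaint.
 */
int get_clks_from_dt(const struct dt_node *np)
{
	assert(np);
	return (int)np->num_clocks;
}
```

With a node that lists only "bus", the match-data variant reports an error up
front, while the DT-only variant accepts the node and the missing "iface"
clock goes unnoticed until the hardware is actually touched.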
Vivek Gautam Nov. 23, 2018, 9:36 a.m. UTC | #6
Hi Tomasz,

On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote:
>
> Hi Vivek, Will,
>
> On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
> <vivek.gautam@codeaurora.org> wrote:
> >
> > Hi Will,
> >
> > On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
> > >
> > > [+Thor]
> > >
> > > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> > > > qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
> > > > clock and power requirements.
> > > > On msm8996, multiple cores, viz. mdss, video, etc. use this
> > > > smmu. On sdm845, this smmu is used with gpu.
> > > > Add bindings for the same.
> > > >
> > > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > > > Reviewed-by: Rob Herring <robh@kernel.org>
> > > > Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> > > > Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> > > > Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> > > > ---
> > > >  drivers/iommu/arm-smmu.c | 13 +++++++++++++
> > > >  1 file changed, 13 insertions(+)
> > > >
> > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > > index 2098c3141f5f..d315ca637097 100644
> > > > --- a/drivers/iommu/arm-smmu.c
> > > > +++ b/drivers/iommu/arm-smmu.c
> > > > @@ -120,6 +120,7 @@ enum arm_smmu_implementation {
> > > >       GENERIC_SMMU,
> > > >       ARM_MMU500,
> > > >       CAVIUM_SMMUV2,
> > > > +     QCOM_SMMUV2,
> > > >  };
> > > >
> > > >  struct arm_smmu_s2cr {
> > > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
> > > >  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
> > > >  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
> > > >
> > > > +static const char * const qcom_smmuv2_clks[] = {
> > > > +     "bus", "iface",
> > > > +};
> > > > +
> > > > +static const struct arm_smmu_match_data qcom_smmuv2 = {
> > > > +     .version = ARM_SMMU_V2,
> > > > +     .model = QCOM_SMMUV2,
> > > > +     .clks = qcom_smmuv2_clks,
> > > > +     .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> > > > +};
> > >
> > > These seems redundant if we go down the route proposed by Thor, where we
> > > just pull all of the clocks out of the device-tree. In which case, why
> > > do we need this match_data at all?
> >
> > Which is better? Driver relying solely on the device tree to tell
> > which all clocks
> > are required to be enabled,
> > or, the driver deciding itself based on the platform's match data,
> > that it should
> > have X, Y, & Z clocks that should be supplied from the device tree.
>
> The former would simplify the driver, but would also make it
> impossible to spot mistakes in DT, which would ultimately surface out
> as very hard to debug bugs (likely complete system lockups).

Thanks.
Yes, this is how I understand things at present. Relying on the device tree
takes things out of the driver's control.

Hi Will,
Am I unable to understand the intentions here for Thor's clock-fetch
design change?

>
> For qcom_smmuv2, I believe we're eventually going to end up with
> platform-specific quirks anyway, so specifying the clocks too wouldn't
> hurt. Given that, I'd recommend sticking to the latter, i.e. what this
> patch does.
>
> Best regards,
> Tomasz


Best regards
Vivek

Will Deacon Nov. 23, 2018, 6:34 p.m. UTC | #7
On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote:
> On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote:
> > On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
> > <vivek.gautam@codeaurora.org> wrote:
> > > On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
> > > > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> > > > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
> > > > >  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
> > > > >  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
> > > > >
> > > > > +static const char * const qcom_smmuv2_clks[] = {
> > > > > +     "bus", "iface",
> > > > > +};
> > > > > +
> > > > > +static const struct arm_smmu_match_data qcom_smmuv2 = {
> > > > > +     .version = ARM_SMMU_V2,
> > > > > +     .model = QCOM_SMMUV2,
> > > > > +     .clks = qcom_smmuv2_clks,
> > > > > +     .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> > > > > +};
> > > >
> > > > These seems redundant if we go down the route proposed by Thor, where we
> > > > just pull all of the clocks out of the device-tree. In which case, why
> > > > do we need this match_data at all?
> > >
> > > Which is better? Driver relying solely on the device tree to tell
> > > which all clocks
> > > are required to be enabled,
> > > or, the driver deciding itself based on the platform's match data,
> > > that it should
> > > have X, Y, & Z clocks that should be supplied from the device tree.
> >
> > The former would simplify the driver, but would also make it
> > impossible to spot mistakes in DT, which would ultimately surface out
> > as very hard to debug bugs (likely complete system lockups).
> 
> Thanks.
> Yea, this is how I understand things presently. Relying on device tree
> puts the things out of driver's control.

But it also has the undesirable effect of having to update the driver
code whenever we want to add support for a new SMMU implementation. If
we do this all in the DT, as Thor is trying to do, then older kernels
will work well with new hardware.

> Hi Will,
> Am I unable to understand the intentions here for Thor's clock-fetch
> design change?

I'm having trouble parsing your question, sorry. Please work with Thor
so that we have a single way to get the clock information. My preference
is to take it from the firmware, for the reason I stated above.

Will
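
The point about older kernels and new hardware can be sketched in user space
as follows. This is only an illustration under stated assumptions, not the
driver's code: struct variant_clks, known_variants[], num_clks_from_table()
and num_clks_from_firmware() are hypothetical names, and the firmware
description is modelled as a NULL-terminated list of clock names. Only the
"qcom,smmu-v2" compatible and its "bus" and "iface" clocks come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-variant clock table compiled into the driver. */
struct variant_clks {
	const char *compatible;
	const char *const *clks;
	size_t num_clks;
};

static const char *const qcom_smmuv2_clks[] = { "bus", "iface" };

static const struct variant_clks known_variants[] = {
	{ "qcom,smmu-v2", qcom_smmuv2_clks, 2 },
};

/*
 * Table-driven lookup: only variants known when the kernel was built
 * resolve. Returns the clock count, or -1 for an unknown compatible.
 */
int num_clks_from_table(const char *compatible)
{
	size_t i;

	assert(compatible);
	for (i = 0; i < sizeof(known_variants) / sizeof(known_variants[0]); i++)
		if (strcmp(known_variants[i].compatible, compatible) == 0)
			return (int)known_variants[i].num_clks;
	return -1;
}

/*
 * Firmware-driven lookup: count whatever clocks the (NULL-terminated)
 * firmware list provides, with no per-variant knowledge in the driver.
 */
int num_clks_from_firmware(const char *const *clock_names)
{
	int n = 0;

	assert(clock_names);
	while (clock_names[n])
		n++;
	return n;
}
```

A later variant with an extra clock would need a new table entry in the first
scheme, whereas the second scheme picks up the extra clock from the firmware
description without any driver change.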
Will Deacon Nov. 23, 2018, 6:36 p.m. UTC | #8
On Thu, Nov 22, 2018 at 05:32:24PM +0530, Vivek Gautam wrote:
> Hi Will,
> 
> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
> >
> > On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote:
> > > From: Sricharan R <sricharan@codeaurora.org>
> > >
> > > The smmu device probe/remove and add/remove master device callbacks
> > > gets called when the smmu is not linked to its master, that is without
> > > the context of the master device. So calling runtime apis in those places
> > > separately.
> > > Global locks are also initialized before enabling runtime pm as the
> > > runtime_resume() calls device_reset() which does tlb_sync_global()
> > > that ultimately requires locks to be initialized.
> > >
> > > Signed-off-by: Sricharan R <sricharan@codeaurora.org>
> > > [vivek: Cleanup pm runtime calls]
> > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > > Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> > > Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> > > Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > >  drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 91 insertions(+), 10 deletions(-)
> >
> > Given that you're doing the get/put in the TLBI ops unconditionally:
> >
> > >  static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> > >  {
> > >       struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > > +     struct arm_smmu_device *smmu = smmu_domain->smmu;
> > >
> > > -     if (smmu_domain->tlb_ops)
> > > +     if (smmu_domain->tlb_ops) {
> > > +             arm_smmu_rpm_get(smmu);
> > >               smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
> > > +             arm_smmu_rpm_put(smmu);
> > > +     }
> > >  }
> > >
> > >  static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> > >  {
> > >       struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > > +     struct arm_smmu_device *smmu = smmu_domain->smmu;
> > >
> > > -     if (smmu_domain->tlb_ops)
> > > +     if (smmu_domain->tlb_ops) {
> > > +             arm_smmu_rpm_get(smmu);
> > >               smmu_domain->tlb_ops->tlb_sync(smmu_domain);
> > > +             arm_smmu_rpm_put(smmu);
> > > +     }
> >
> > Why do you need them around the map/unmap calls as well?
> 
> We still have .tlb_add_flush path?

Ok, so we could add the ops around that as well. Right now, we've got
the runtime pm hooks crossing two parts of the API.

Will
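
To make the get/put bracketing concrete, here is a small user-space model of
wrapping a TLB operation in a runtime-PM style reference count. It is a sketch
only: struct mock_smmu and the mock_* functions are hypothetical stand-ins and
do not reproduce arm_smmu_rpm_get()/arm_smmu_rpm_put() or the real
.tlb_add_flush callback.

```c
#include <assert.h>

/* Hypothetical stand-in for the SMMU's power state. */
struct mock_smmu {
	int usage_count;	/* runtime-PM style reference count */
	int powered;		/* 1 while clocks/power are on */
	int tlb_flushes;	/* flushes that actually reached the hardware */
};

/* Take a reference; power up on the 0 -> 1 transition. */
static void mock_rpm_get(struct mock_smmu *smmu)
{
	if (smmu->usage_count++ == 0)
		smmu->powered = 1;
}

/* Drop a reference; power down on the 1 -> 0 transition. */
static void mock_rpm_put(struct mock_smmu *smmu)
{
	assert(smmu->usage_count > 0);
	if (--smmu->usage_count == 0)
		smmu->powered = 0;
}

/*
 * Raw TLB operation: register access is only valid while powered.
 * Returns 0 on success, -1 if the device is powered down.
 */
int mock_tlb_add_flush(struct mock_smmu *smmu)
{
	if (!smmu->powered)
		return -1;
	smmu->tlb_flushes++;
	return 0;
}

/* The same operation bracketed by get/put, so it is safe from any state. */
int mock_tlb_add_flush_guarded(struct mock_smmu *smmu)
{
	int ret;

	mock_rpm_get(smmu);
	ret = mock_tlb_add_flush(smmu);
	mock_rpm_put(smmu);
	return ret;
}
```

Because the references balance, the guarded call leaves the device in whatever
power state it found it, which is what allows the bracketing to sit at a
single layer of the API.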
Tomasz Figa Nov. 26, 2018, 4:02 a.m. UTC | #9
On Sat, Nov 24, 2018 at 3:34 AM Will Deacon <will.deacon@arm.com> wrote:
>
> On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote:
> > On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote:
> > > On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
> > > <vivek.gautam@codeaurora.org> wrote:
> > > > On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
> > > > > On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
> > > > > > @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
> > > > > >  ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
> > > > > >  ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
> > > > > >
> > > > > > +static const char * const qcom_smmuv2_clks[] = {
> > > > > > +     "bus", "iface",
> > > > > > +};
> > > > > > +
> > > > > > +static const struct arm_smmu_match_data qcom_smmuv2 = {
> > > > > > +     .version = ARM_SMMU_V2,
> > > > > > +     .model = QCOM_SMMUV2,
> > > > > > +     .clks = qcom_smmuv2_clks,
> > > > > > +     .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
> > > > > > +};
> > > > >
> > > > > These seems redundant if we go down the route proposed by Thor, where we
> > > > > just pull all of the clocks out of the device-tree. In which case, why
> > > > > do we need this match_data at all?
> > > >
> > > > Which is better? Driver relying solely on the device tree to tell
> > > > which all clocks
> > > > are required to be enabled,
> > > > or, the driver deciding itself based on the platform's match data,
> > > > that it should
> > > > have X, Y, & Z clocks that should be supplied from the device tree.
> > >
> > > The former would simplify the driver, but would also make it
> > > impossible to spot mistakes in DT, which would ultimately surface out
> > > as very hard to debug bugs (likely complete system lockups).
> >
> > Thanks.
> > Yea, this is how I understand things presently. Relying on device tree
> > puts the things out of driver's control.
>
> But it also has the undesirable effect of having to update the driver
> code whenever we want to add support for a new SMMU implementation. If
> we do this all in the DT, as Thor is trying to do, then older kernels
> will work well with new hardware.

Fair enough, if you're okay with that. Obviously one would still have
to change the DT bindings to list the exact set of clocks for the new
hardware variant, unless the convention changed recently.

Best regards,
Tomasz
Vivek Gautam Nov. 26, 2018, 6:03 a.m. UTC | #10
On 11/24/2018 12:06 AM, Will Deacon wrote:
> On Thu, Nov 22, 2018 at 05:32:24PM +0530, Vivek Gautam wrote:
>> Hi Will,
>>
>> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
>>> On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote:
>>>> From: Sricharan R <sricharan@codeaurora.org>
>>>>
>>>> The smmu device probe/remove and add/remove master device callbacks
>>>> gets called when the smmu is not linked to its master, that is without
>>>> the context of the master device. So calling runtime apis in those places
>>>> separately.
>>>> Global locks are also initialized before enabling runtime pm as the
>>>> runtime_resume() calls device_reset() which does tlb_sync_global()
>>>> that ultimately requires locks to be initialized.
>>>>
>>>> Signed-off-by: Sricharan R <sricharan@codeaurora.org>
>>>> [vivek: Cleanup pm runtime calls]
>>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>>>> Reviewed-by: Tomasz Figa <tfiga@chromium.org>
>>>> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
>>>> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
>>>> ---
>>>>   drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++-----
>>>>   1 file changed, 91 insertions(+), 10 deletions(-)
>>> Given that you're doing the get/put in the TLBI ops unconditionally:
>>>
>>>>   static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
>>>>   {
>>>>        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>> +     struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>
>>>> -     if (smmu_domain->tlb_ops)
>>>> +     if (smmu_domain->tlb_ops) {
>>>> +             arm_smmu_rpm_get(smmu);
>>>>                smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
>>>> +             arm_smmu_rpm_put(smmu);
>>>> +     }
>>>>   }
>>>>
>>>>   static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
>>>>   {
>>>>        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>> +     struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>
>>>> -     if (smmu_domain->tlb_ops)
>>>> +     if (smmu_domain->tlb_ops) {
>>>> +             arm_smmu_rpm_get(smmu);
>>>>                smmu_domain->tlb_ops->tlb_sync(smmu_domain);
>>>> +             arm_smmu_rpm_put(smmu);
>>>> +     }
>>> Why do you need them around the map/unmap calls as well?
>> We still have .tlb_add_flush path?
> Ok, so we could add the ops around that as well. Right now, we've got
> the runtime pm hooks crossing two parts of the API.

Sure, will do that then, and remove the runtime pm hooks from map/unmap.

Thanks
Vivek
>
> Will
Vivek Gautam Nov. 26, 2018, 10:55 a.m. UTC | #11
On 11/24/2018 12:04 AM, Will Deacon wrote:
> On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote:
>> On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote:
>>> On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
>>> <vivek.gautam@codeaurora.org> wrote:
>>>> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
>>>>> On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
>>>>>> @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
>>>>>>   ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>>>>>>   ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
>>>>>>
>>>>>> +static const char * const qcom_smmuv2_clks[] = {
>>>>>> +     "bus", "iface",
>>>>>> +};
>>>>>> +
>>>>>> +static const struct arm_smmu_match_data qcom_smmuv2 = {
>>>>>> +     .version = ARM_SMMU_V2,
>>>>>> +     .model = QCOM_SMMUV2,
>>>>>> +     .clks = qcom_smmuv2_clks,
>>>>>> +     .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
>>>>>> +};
>>>>> These seems redundant if we go down the route proposed by Thor, where we
>>>>> just pull all of the clocks out of the device-tree. In which case, why
>>>>> do we need this match_data at all?
>>>> Which is better? Driver relying solely on the device tree to tell
>>>> which all clocks
>>>> are required to be enabled,
>>>> or, the driver deciding itself based on the platform's match data,
>>>> that it should
>>>> have X, Y, & Z clocks that should be supplied from the device tree.
>>> The former would simplify the driver, but would also make it
>>> impossible to spot mistakes in DT, which would ultimately surface out
>>> as very hard to debug bugs (likely complete system lockups).
>> Thanks.
>> Yea, this is how I understand things presently. Relying on device tree
>> puts the things out of driver's control.
> But it also has the undesirable effect of having to update the driver
> code whenever we want to add support for a new SMMU implementation. If
> we do this all in the DT, as Thor is trying to do, then older kernels
> will work well with new hardware.
>
>> Hi Will,
>> Am I unable to understand the intentions here for Thor's clock-fetch
>> design change?
> I'm having trouble parsing your question, sorry. Please work with Thor
> so that we have a single way to get the clock information. My preference
> is to take it from the firmware, for the reason I stated above.
Hi Will,

Sure, thanks. I will work with Thor to get this going.

Hi Thor,
Does it sound okay to you to squash your patch [1] into my patch [2] with
your 'Signed-off-by' tag?
I will update the commit log to include the information about getting
clock details from device tree.

[1] https://patchwork.kernel.org/patch/10628725/
[2] https://patchwork.kernel.org/patch/10686061/

Best regards
Vivek
>
> Will
Vivek Gautam Nov. 26, 2018, 11:26 a.m. UTC | #12
On 11/26/2018 11:33 AM, Vivek Gautam wrote:
>
>
> On 11/24/2018 12:06 AM, Will Deacon wrote:
>> On Thu, Nov 22, 2018 at 05:32:24PM +0530, Vivek Gautam wrote:
>>> Hi Will,
>>>
>>> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> wrote:
>>>> On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote:
>>>>> From: Sricharan R <sricharan@codeaurora.org>
>>>>>
>>>>> The smmu device probe/remove and add/remove master device callbacks
>>>>> gets called when the smmu is not linked to its master, that is without
>>>>> the context of the master device. So calling runtime apis in those places
>>>>> separately.
>>>>> Global locks are also initialized before enabling runtime pm as the
>>>>> runtime_resume() calls device_reset() which does tlb_sync_global()
>>>>> that ultimately requires locks to be initialized.
>>>>>
>>>>> Signed-off-by: Sricharan R <sricharan@codeaurora.org>
>>>>> [vivek: Cleanup pm runtime calls]
>>>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>>>>> Reviewed-by: Tomasz Figa <tfiga@chromium.org>
>>>>> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
>>>>> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
>>>>> ---
>>>>>   drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++-----
>>>>>   1 file changed, 91 insertions(+), 10 deletions(-)
>>>> Given that you're doing the get/put in the TLBI ops unconditionally:
>>>>
>>>>>   static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
>>>>>   {
>>>>>        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>> +     struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>>
>>>>> -     if (smmu_domain->tlb_ops)
>>>>> +     if (smmu_domain->tlb_ops) {
>>>>> +             arm_smmu_rpm_get(smmu);
>>>>>                smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
>>>>> +             arm_smmu_rpm_put(smmu);
>>>>> +     }
>>>>>   }
>>>>>
>>>>>   static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
>>>>>   {
>>>>>        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>>>>> +     struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>>
>>>>> -     if (smmu_domain->tlb_ops)
>>>>> +     if (smmu_domain->tlb_ops) {
>>>>> +             arm_smmu_rpm_get(smmu);
>>>>>                smmu_domain->tlb_ops->tlb_sync(smmu_domain);
>>>>> +             arm_smmu_rpm_put(smmu);
>>>>> +     }
>>>> Why do you need them around the map/unmap calls as well?
>>> We still have .tlb_add_flush path?
>> Ok, so we could add the ops around that as well. Right now, we've got
>> the runtime pm hooks crossing two parts of the API.
>
> Sure, will do that then, and remove the runtime pm hooks from map/unmap.

I missed this earlier -
We are adding the runtime pm hooks to the 'iommu_ops' callbacks, not to
'tlb_ops'. So how are the runtime pm hooks crossing paths? The
'.map/.unmap' iommu_ops don't call the '.flush_iotlb_all' or '.iotlb_sync'
iommu_ops anywhere.

E.g., the only callers of domain->ops->flush_iotlb_all() are
iommu_dma_flush_iotlb_all() and iommu_flush_tlb_all(), neither of which is
in the map/unmap paths.
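
To make the layering concrete, here is a minimal userspace sketch, not the driver code. All `mock_*` names are hypothetical stand-ins: the two "iommu_ops-level" entry points each take and drop their own runtime PM reference, while the "tlb_ops-level" helpers only assert that the device is already powered. Neither entry point calls the other, so the get/put pairs never nest across the two parts of the API.

```c
#include <assert.h>

/* Hypothetical stand-in for struct arm_smmu_device; it only tracks counters. */
struct mock_smmu {
	int usage_count;	/* runtime PM references currently held */
	int resume_count;	/* number of 0 -> 1 power-up transitions */
	int flush_count;	/* TLB operations that reached the "hardware" */
};

/* Take a reference; the device is powered up on the 0 -> 1 transition. */
static void mock_rpm_get(struct mock_smmu *smmu)
{
	if (smmu->usage_count++ == 0)
		smmu->resume_count++;
}

/* Drop a reference taken by mock_rpm_get(). */
static void mock_rpm_put(struct mock_smmu *smmu)
{
	assert(smmu->usage_count > 0);
	smmu->usage_count--;
}

/* "tlb_ops" level: touches registers, so the device must already be powered. */
static void mock_tlb_flush_all(struct mock_smmu *smmu)
{
	assert(smmu->usage_count > 0);
	smmu->flush_count++;
}

/* "tlb_ops" level: stands in for the tlb_add_flush path used by unmap. */
static void mock_tlb_add_flush(struct mock_smmu *smmu)
{
	assert(smmu->usage_count > 0);
	smmu->flush_count++;
}

/* "iommu_ops" level: flush_iotlb_all takes its own reference. */
static void mock_flush_iotlb_all(struct mock_smmu *smmu)
{
	mock_rpm_get(smmu);
	mock_tlb_flush_all(smmu);
	mock_rpm_put(smmu);
}

/* "iommu_ops" level: unmap takes its own reference and never calls the above. */
static void mock_unmap(struct mock_smmu *smmu)
{
	mock_rpm_get(smmu);
	mock_tlb_add_flush(smmu);
	mock_rpm_put(smmu);
}
```

Calling `mock_unmap()` and then `mock_flush_iotlb_all()` on a zero-initialised `struct mock_smmu` leaves the usage count balanced at zero after each call.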

Regards
Vivek

>
> Thanks
> Vivek
>>
>> Will
>
Thor Thayer Nov. 26, 2018, 2:41 p.m. UTC | #13
Hi Vivek,

On 11/26/18 4:55 AM, Vivek Gautam wrote:
> 
> On 11/24/2018 12:04 AM, Will Deacon wrote:
>> On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote:
>>> On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> wrote:
>>>> On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
>>>> <vivek.gautam@codeaurora.org> wrote:
>>>>> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com> 
>>>>> wrote:
>>>>>> On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
>>>>>>> @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
>>>>>>>   ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>>>>>>>   ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
>>>>>>>
>>>>>>> +static const char * const qcom_smmuv2_clks[] = {
>>>>>>> +     "bus", "iface",
>>>>>>> +};
>>>>>>> +
>>>>>>> +static const struct arm_smmu_match_data qcom_smmuv2 = {
>>>>>>> +     .version = ARM_SMMU_V2,
>>>>>>> +     .model = QCOM_SMMUV2,
>>>>>>> +     .clks = qcom_smmuv2_clks,
>>>>>>> +     .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
>>>>>>> +};
>>>>>> This seems redundant if we go down the route proposed by Thor, where we
>>>>>> just pull all of the clocks out of the device-tree. In which case, why
>>>>>> do we need this match_data at all?
>>>>> Which is better? The driver relying solely on the device tree to tell it
>>>>> which clocks need to be enabled, or the driver deciding for itself, based
>>>>> on the platform's match data, that clocks X, Y, and Z must be supplied
>>>>> from the device tree?
>>>> The former would simplify the driver, but would also make it
>>>> impossible to spot mistakes in DT, which would ultimately surface
>>>> as very hard-to-debug bugs (likely complete system lockups).
>>> Thanks.
>>> Yea, this is how I understand things presently. Relying on device tree
>>> puts the things out of driver's control.
>> But it also has the undesirable effect of having to update the driver
>> code whenever we want to add support for a new SMMU implementation. If
>> we do this all in the DT, as Thor is trying to do, then older kernels
>> will work well with new hardware.
>>
>>> Hi Will,
>>> Am I unable to understand the intentions here for Thor's clock-fetch
>>> design change?
>> I'm having trouble parsing your question, sorry. Please work with Thor
>> so that we have a single way to get the clock information. My preference
>> is to take it from the firmware, for the reason I stated above.
> Hi Will,
> 
> Sure, thanks. I will work with Thor to get this going.
> 
> Hi Thor,
> Does it sound okay to you to squash your patch [1] into my patch [2] with
> your 'Signed-off-by' tag?
> I will update the commit log to include the information about getting
> clock details from device tree.
> 
> [1] https://patchwork.kernel.org/patch/10628725/
> [2] https://patchwork.kernel.org/patch/10686061/
> 

Yes, that would be great and easier to understand than my patch on top 
of yours.

Additionally, can you remove the "Error:" as Will requested as part of 
the squash?

Thank you!

Thor

> Best regards
> Vivek
>>
>> Will
> 
>
Vivek Gautam Nov. 26, 2018, 5:55 p.m. UTC | #14
Hi Thor,


On 11/26/2018 8:11 PM, Thor Thayer wrote:
> Hi Vivek,
>
> On 11/26/18 4:55 AM, Vivek Gautam wrote:
>>
>> On 11/24/2018 12:04 AM, Will Deacon wrote:
>>> On Fri, Nov 23, 2018 at 03:06:29PM +0530, Vivek Gautam wrote:
>>>> On Fri, Nov 23, 2018 at 2:52 PM Tomasz Figa <tfiga@chromium.org> 
>>>> wrote:
>>>>> On Fri, Nov 23, 2018 at 6:13 PM Vivek Gautam
>>>>> <vivek.gautam@codeaurora.org> wrote:
>>>>>> On Wed, Nov 21, 2018 at 11:09 PM Will Deacon 
>>>>>> <will.deacon@arm.com> wrote:
>>>>>>> On Fri, Nov 16, 2018 at 04:54:30PM +0530, Vivek Gautam wrote:
>>>>>>>> @@ -2026,6 +2027,17 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
>>>>>>>>   ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
>>>>>>>>   ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
>>>>>>>>
>>>>>>>> +static const char * const qcom_smmuv2_clks[] = {
>>>>>>>> +     "bus", "iface",
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +static const struct arm_smmu_match_data qcom_smmuv2 = {
>>>>>>>> +     .version = ARM_SMMU_V2,
>>>>>>>> +     .model = QCOM_SMMUV2,
>>>>>>>> +     .clks = qcom_smmuv2_clks,
>>>>>>>> +     .num_clks = ARRAY_SIZE(qcom_smmuv2_clks),
>>>>>>>> +};
>>>>>>> This seems redundant if we go down the route proposed by Thor, where we
>>>>>>> just pull all of the clocks out of the device-tree. In which case, why
>>>>>>> do we need this match_data at all?
>>>>>> Which is better? The driver relying solely on the device tree to tell it
>>>>>> which clocks need to be enabled, or the driver deciding for itself, based
>>>>>> on the platform's match data, that clocks X, Y, and Z must be supplied
>>>>>> from the device tree?
>>>>> The former would simplify the driver, but would also make it
>>>>> impossible to spot mistakes in DT, which would ultimately surface
>>>>> as very hard-to-debug bugs (likely complete system lockups).
>>>> Thanks.
>>>> Yea, this is how I understand things presently. Relying on device tree
>>>> puts the things out of driver's control.
>>> But it also has the undesirable effect of having to update the driver
>>> code whenever we want to add support for a new SMMU implementation. If
>>> we do this all in the DT, as Thor is trying to do, then older kernels
>>> will work well with new hardware.
>>>
>>>> Hi Will,
>>>> Am I unable to understand the intentions here for Thor's clock-fetch
>>>> design change?
>>> I'm having trouble parsing your question, sorry. Please work with Thor
>>> so that we have a single way to get the clock information. My 
>>> preference
>>> is to take it from the firmware, for the reason I stated above.
>> Hi Will,
>>
>> Sure, thanks. I will work with Thor to get this going.
>>
>> Hi Thor,
>> Does it sound okay to you to squash your patch [1] into my patch [2] 
>> with
>> your 'Signed-off-by' tag?
>> I will update the commit log to include the information about getting
>> clock details from device tree.
>>
>> [1] https://patchwork.kernel.org/patch/10628725/
>> [2] https://patchwork.kernel.org/patch/10686061/
>>
>
> Yes, that would be great and easier to understand than my patch on top 
> of yours.
>
> Additionally, can you remove the "Error:" as Will requested as part of 
> the squash?

Thanks for your consent. I have reworked the patch today and addressed
Will's comment. I will try it on the board and post it by tomorrow.
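
For readers following the clock discussion, the trade-off between the two approaches can be sketched in plain userspace C. This is an illustration only, and every `mock_*` name is hypothetical. With per-platform match data, the driver can report which required clock the device tree failed to list. With the device-tree-only approach, the driver simply takes whatever is listed, which needs no driver change for new hardware but cannot detect an omission. In the kernel itself, `devm_clk_bulk_get_all()` is one helper that fetches every clock listed for a device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: return 1 if `name` is among the clock names the DT node lists. */
static int mock_dt_has_clk(const char *const *dt_clks, size_t num_dt,
			   const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < num_dt; i++)
		if (strcmp(dt_clks[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Match-data approach: the driver knows which clocks the platform needs and
 * returns the first one the DT node failed to list, or NULL if all are there.
 */
static const char *mock_first_missing_clk(const char *const *required,
					  size_t num_required,
					  const char *const *dt_clks,
					  size_t num_dt)
{
	for (size_t i = 0; i < num_required; i++)
		if (!mock_dt_has_clk(dt_clks, num_dt, required[i]))
			return required[i];
	return NULL;
}

/*
 * DT-only approach: the driver accepts every clock the DT node lists and has
 * no way to tell that one is missing. Returns the number of clocks taken.
 */
static size_t mock_take_all_dt_clks(const char *const *dt_clks, size_t num_dt)
{
	(void)dt_clks;
	return num_dt;
}
```

Given a required list of "bus" and "iface" and a device tree that lists only "bus", `mock_first_missing_clk()` names "iface" as missing, whereas `mock_take_all_dt_clks()` accepts the single clock without complaint.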

Best regards
Vivek

>
> Thank you!
>
> Thor
>
>> Best regards
>> Vivek
>>>
>>> Will
>>
>>
>
Will Deacon Nov. 26, 2018, 7:31 p.m. UTC | #15
On Mon, Nov 26, 2018 at 04:56:42PM +0530, Vivek Gautam wrote:
> On 11/26/2018 11:33 AM, Vivek Gautam wrote:
> >On 11/24/2018 12:06 AM, Will Deacon wrote:
> >>On Thu, Nov 22, 2018 at 05:32:24PM +0530, Vivek Gautam wrote:
> >>>On Wed, Nov 21, 2018 at 11:09 PM Will Deacon <will.deacon@arm.com>
> >>>wrote:
> >>>>On Fri, Nov 16, 2018 at 04:54:27PM +0530, Vivek Gautam wrote:
> >>>>>From: Sricharan R <sricharan@codeaurora.org>
> >>>>>
> >>>>>The smmu device probe/remove and add/remove master device callbacks
> >>>>>gets called when the smmu is not linked to its master, that is
> >>>>>without
> >>>>>the context of the master device. So calling runtime apis in those
> >>>>>places
> >>>>>separately.
> >>>>>Global locks are also initialized before enabling runtime pm as the
> >>>>>runtime_resume() calls device_reset() which does tlb_sync_global()
> >>>>>that ultimately requires locks to be initialized.
> >>>>>
> >>>>>Signed-off-by: Sricharan R <sricharan@codeaurora.org>
> >>>>>[vivek: Cleanup pm runtime calls]
> >>>>>Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> >>>>>Reviewed-by: Tomasz Figa <tfiga@chromium.org>
> >>>>>Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> >>>>>Reviewed-by: Robin Murphy <robin.murphy@arm.com>
> >>>>>---
> >>>>>  drivers/iommu/arm-smmu.c | 101 ++++++++++++++++++++++++++++++++++++++++++-----
> >>>>>  1 file changed, 91 insertions(+), 10 deletions(-)
> >>>>Given that you're doing the get/put in the TLBI ops unconditionally:
> >>>>
> >>>>>  static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain)
> >>>>>  {
> >>>>>       struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >>>>>+     struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>>>>
> >>>>>-     if (smmu_domain->tlb_ops)
> >>>>>+     if (smmu_domain->tlb_ops) {
> >>>>>+             arm_smmu_rpm_get(smmu);
> >>>>>               smmu_domain->tlb_ops->tlb_flush_all(smmu_domain);
> >>>>>+             arm_smmu_rpm_put(smmu);
> >>>>>+     }
> >>>>>  }
> >>>>>
> >>>>>  static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
> >>>>>  {
> >>>>>       struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >>>>>+     struct arm_smmu_device *smmu = smmu_domain->smmu;
> >>>>>
> >>>>>-     if (smmu_domain->tlb_ops)
> >>>>>+     if (smmu_domain->tlb_ops) {
> >>>>>+             arm_smmu_rpm_get(smmu);
> >>>>>               smmu_domain->tlb_ops->tlb_sync(smmu_domain);
> >>>>>+             arm_smmu_rpm_put(smmu);
> >>>>>+     }
> >>>>Why do you need them around the map/unmap calls as well?
> >>>We still have .tlb_add_flush path?
> >>Ok, so we could add the ops around that as well. Right now, we've got
> >>the runtime pm hooks crossing two parts of the API.
> >
> >Sure, will do that then, and remove the runtime pm hooks from map/unmap.
> 
> I missed this earlier -
> We are adding the runtime pm hooks to the 'iommu_ops' callbacks, not to
> 'tlb_ops'. So how are the runtime pm hooks crossing paths? The
> '.map/.unmap' iommu_ops don't call the '.flush_iotlb_all' or '.iotlb_sync'
> iommu_ops anywhere.
> 
> E.g., the only callers of domain->ops->flush_iotlb_all() are
> iommu_dma_flush_iotlb_all() and iommu_flush_tlb_all(), neither of which is
> in the map/unmap paths.

Yes, sorry, I got confused here and completely misled you. In which case,
your original patch is ok because it intercepts the core IOMMU API via
iommu_ops. Apologies.

At that level, should we also annotate arm_smmu_iova_to_phys_hard()
for the iova_to_phys() implementation?
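
As a rough illustration of what such an annotation might look like, here is a userspace sketch under stated assumptions, not the driver code. All `fake_*` names and the fixed address offset are hypothetical. The hardware-assisted translation touches registers, so the sketch takes a runtime PM reference first, bails out with 0 if the device cannot be resumed, and drops the reference once the lookup is done.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offset the fake "hardware" adds when translating an address. */
#define FAKE_PHYS_OFFSET UINT64_C(0x80000000)

/* Hypothetical stand-in for struct arm_smmu_device. */
struct fake_smmu {
	int usage_count;	/* runtime PM references currently held */
	int resume_fails;	/* set non-zero to simulate a failed resume */
};

/* Take a reference; returns a negative value if the device cannot resume. */
static int fake_rpm_get(struct fake_smmu *smmu)
{
	if (smmu->resume_fails)
		return -1;
	smmu->usage_count++;
	return 0;
}

/* Drop a reference taken by fake_rpm_get(). */
static void fake_rpm_put(struct fake_smmu *smmu)
{
	assert(smmu->usage_count > 0);
	smmu->usage_count--;
}

/* Stands in for the register-based lookup; requires the device to be powered. */
static uint64_t fake_hw_translate(struct fake_smmu *smmu, uint64_t iova)
{
	assert(smmu->usage_count > 0);
	return iova + FAKE_PHYS_OFFSET;
}

/* The annotated lookup: get, translate, put; 0 signals failure. */
static uint64_t fake_iova_to_phys_hard(struct fake_smmu *smmu, uint64_t iova)
{
	uint64_t phys;

	if (fake_rpm_get(smmu) < 0)
		return 0;

	phys = fake_hw_translate(smmu, iova);
	fake_rpm_put(smmu);

	return phys;
}
```

In this sketch the reference count is back to zero after every call, whether or not the resume succeeded.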

With that detail and clock bits sorted out, we should be able to get this
queued at last.

Will