diff mbox series

OPP: Fix support for required OPPs for multiple PM domains

Message ID 20240618155013.323322-1-ulf.hansson@linaro.org
State Handled Elsewhere
Headers show
Series OPP: Fix support for required OPPs for multiple PM domains | expand

Commit Message

Ulf Hansson June 18, 2024, 3:50 p.m. UTC
In _set_opp() we are normally bailing out when trying to set an OPP that is
the current one. This make perfect sense, but becomes a problem when
_set_required_opps() calls it recursively.

More precisely, when a required OPP is being shared by multiple PM domains,
we end up skipping to request the corresponding performance-state for all
of the PM domains, but the first one. Let's fix the problem, by calling
_set_opp_level() from _set_required_opps() instead.

Fixes: e37440e7e2c2 ("OPP: Call dev_pm_opp_set_opp() for required OPPs")
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/opp/core.c | 47 +++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

Comments

Viresh Kumar June 25, 2024, 10:54 a.m. UTC | #1
On 18-06-24, 17:50, Ulf Hansson wrote:
> In _set_opp() we are normally bailing out when trying to set an OPP that is
> the current one. This make perfect sense, but becomes a problem when
> _set_required_opps() calls it recursively.
> 
> More precisely, when a required OPP is being shared by multiple PM domains,
> we end up skipping to request the corresponding performance-state for all
> of the PM domains, but the first one. Let's fix the problem, by calling
> _set_opp_level() from _set_required_opps() instead.
> 
> Fixes: e37440e7e2c2 ("OPP: Call dev_pm_opp_set_opp() for required OPPs")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>  drivers/opp/core.c | 47 +++++++++++++++++++++++-----------------------
>  1 file changed, 24 insertions(+), 23 deletions(-)
 
>  /* This is only called for PM domain for now */
>  static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
>  			      struct dev_pm_opp *opp, bool up)
> @@ -1091,7 +1113,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
>  		if (devs[index]) {
>  			required_opp = opp ? opp->required_opps[index] : NULL;
>  
> -			ret = dev_pm_opp_set_opp(devs[index], required_opp);
> +			ret = _set_opp_level(devs[index], opp_table,
> +					     required_opp);

No, we won't be doing this I guess. Its like going back instead of
moving forward :)

The required OPPs is not just a performance domain thing, but
specially with devs[] here, it can be used to set OPP for any device
type and so dev_pm_opp_set_opp() is the right call here.

Coming back to the problem you are pointing to, I am not very clear of
the whole picture here. Can you please help me get some details on
that ?

From what I understand, you have a device which has multiple power
domains. Now all these power domains share the same OPP table in the
device tree (i.e. to avoid duplication of tables only). Is that right
?

Even if in DT we have the same OPP table for all the domains, the OPP
core will have separate OPP tables structures (as the domains aren't
connected). And these OPP tables will have their own `current_opp`
fields and so we shouldn't really bail out earlier.

Maybe there is a bug somewhere that is causing it. Maybe I can look at
the DT to find the issue ? (Hint: The OPP table shouldn't have the
`shared` flag set).

Maybe I completely misunderstood the whole thing :)
Ulf Hansson June 29, 2024, 9:09 a.m. UTC | #2
On Tue, 25 Jun 2024 at 12:54, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 18-06-24, 17:50, Ulf Hansson wrote:
> > In _set_opp() we are normally bailing out when trying to set an OPP that is
> > the current one. This make perfect sense, but becomes a problem when
> > _set_required_opps() calls it recursively.
> >
> > More precisely, when a required OPP is being shared by multiple PM domains,
> > we end up skipping to request the corresponding performance-state for all
> > of the PM domains, but the first one. Let's fix the problem, by calling
> > _set_opp_level() from _set_required_opps() instead.
> >
> > Fixes: e37440e7e2c2 ("OPP: Call dev_pm_opp_set_opp() for required OPPs")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >  drivers/opp/core.c | 47 +++++++++++++++++++++++-----------------------
> >  1 file changed, 24 insertions(+), 23 deletions(-)
>
> >  /* This is only called for PM domain for now */
> >  static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
> >                             struct dev_pm_opp *opp, bool up)
> > @@ -1091,7 +1113,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
> >               if (devs[index]) {
> >                       required_opp = opp ? opp->required_opps[index] : NULL;
> >
> > -                     ret = dev_pm_opp_set_opp(devs[index], required_opp);
> > +                     ret = _set_opp_level(devs[index], opp_table,
> > +                                          required_opp);
>
> No, we won't be doing this I guess. Its like going back instead of
> moving forward :)
>
> The required OPPs is not just a performance domain thing, but
> specially with devs[] here, it can be used to set OPP for any device
> type and so dev_pm_opp_set_opp() is the right call here.
>
> Coming back to the problem you are pointing to, I am not very clear of
> the whole picture here. Can you please help me get some details on
> that ?

I get your point, but I am not sure I agree with it.

For the required-opps, the only existing use case is power/perf
domains with performance-states, so why make the code more complicated
than it needs to be?

>
> From what I understand, you have a device which has multiple power
> domains. Now all these power domains share the same OPP table in the
> device tree (i.e. to avoid duplication of tables only). Is that right
> ?

No, that's not correct. Let me try to elaborate on my setup, which is
very similar to a use case on a Tegra platform.

...

pd_perf0: pd-perf0 {
    #power-domain-cells = <0>;
    operating-points-v2 = <&opp_table_pd_perf0>;
};

//Note: no opp-table
pd_power4: pd-power4 {
    #power-domain-cells = <0>;
     power-domains = <&pd_perf0>;
};

//Note: no opp-table
pd_power5: pd-power5 {
     #power-domain-cells = <0>;
     power-domains = <&pd_perf0>;
};

//Note: The opp_table_pm_test10 are having required-opps pointing to
pd_perf0's opp-table.
pm_test10 {
    ...
    power-domains = <&pd_power4>, <&pd_power5>;
    power-domain-names = "perf4", "perf5";
    operating-points-v2 = <&opp_table_pm_test10>;
};

...

>
> Even if in DT we have the same OPP table for all the domains, the OPP
> core will have separate OPP tables structures (as the domains aren't
> connected). And these OPP tables will have their own `current_opp`
> fields and so we shouldn't really bail out earlier.

In the use case above, we end up never voting on pd_power5.

>
> Maybe there is a bug somewhere that is causing it. Maybe I can look at
> the DT to find the issue ? (Hint: The OPP table shouldn't have the
> `shared` flag set).
>
> Maybe I completely misunderstood the whole thing :)

The DT parsing of the required-opps is already complicated and there
seems to be endless new corner-cases showing up. Maybe we can fix this
too, but perhaps we should simply take a step back and go for
simplifications instead?

Kind regards
Uffe
Viresh Kumar July 1, 2024, 11:47 a.m. UTC | #3
On 29-06-24, 11:09, Ulf Hansson wrote:
> I get your point, but I am not sure I agree with it.
> 
> For the required-opps, the only existing use case is power/perf
> domains with performance-states, so why make the code more complicated
> than it needs to be?

That is a fair argument generally, i.e. keep things as simple as we
can, but this is a bit different. We are talking about setting the
(required) OPP for a device (parent genpd) here and it should follow
the full path.

Even in case of genpds we may want to configure more properties and
not just vote, like bandwidth, regulator, clk, etc. And so I would
really like to set the OPP in a standard way, no matter what.

> No, that's not correct. Let me try to elaborate on my setup, which is
> very similar to a use case on a Tegra platform.

Thanks, I wasn't thinking about this setup earlier.

> pd_perf0: pd-perf0 {
>     #power-domain-cells = <0>;
>     operating-points-v2 = <&opp_table_pd_perf0>;
> };
> 
> //Note: no opp-table
> pd_power4: pd-power4 {
>     #power-domain-cells = <0>;
>      power-domains = <&pd_perf0>;
> };
> 
> //Note: no opp-table
> pd_power5: pd-power5 {
>      #power-domain-cells = <0>;
>      power-domains = <&pd_perf0>;
> };
> 
> //Note: The opp_table_pm_test10 are having required-opps pointing to
> pd_perf0's opp-table.
> pm_test10 {
>     ...
>     power-domains = <&pd_power4>, <&pd_power5>;
>     power-domain-names = "perf4", "perf5";
>     operating-points-v2 = <&opp_table_pm_test10>;
> };


> In the use case above, we end up never voting on pd_power5.
 
> The DT parsing of the required-opps is already complicated and there
> seems to be endless new corner-cases showing up. Maybe we can fix this
> too, but perhaps we should simply take a step back and go for
> simplifications instead?

I truly believe that keeping a standard way of updating OPPs is the
right way to go and that will only prevent complicated corner cases
coming later on.

What about this patch instead ?

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 5f4598246a87..2086292f8355 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1091,7 +1091,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
 		if (devs[index]) {
 			required_opp = opp ? opp->required_opps[index] : NULL;
 
-			ret = dev_pm_opp_set_opp(devs[index], required_opp);
+			/* Set required OPPs forcefully */
+			ret = dev_pm_opp_set_opp_forced(devs[index], required_opp, true);
 			if (ret)
 				return ret;
 		}
@@ -1365,17 +1366,8 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_set_rate);
 
-/**
- * dev_pm_opp_set_opp() - Configure device for OPP
- * @dev: device for which we do this operation
- * @opp: OPP to set to
- *
- * This configures the device based on the properties of the OPP passed to this
- * routine.
- *
- * Return: 0 on success, a negative error number otherwise.
- */
-int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp)
+static int dev_pm_opp_set_opp_forced(struct device *dev, struct dev_pm_opp *opp,
+				     bool forced)
 {
 	struct opp_table *opp_table;
 	int ret;
@@ -1386,11 +1378,25 @@ int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp)
 		return PTR_ERR(opp_table);
 	}
 
-	ret = _set_opp(dev, opp_table, opp, NULL, false);
+	ret = _set_opp(dev, opp_table, opp, NULL, forced);
 	dev_pm_opp_put_opp_table(opp_table);
 
 	return ret;
 }
+/**
+ * dev_pm_opp_set_opp() - Configure device for OPP
+ * @dev: device for which we do this operation
+ * @opp: OPP to set to
+ *
+ * This configures the device based on the properties of the OPP passed to this
+ * routine.
+ *
+ * Return: 0 on success, a negative error number otherwise.
+ */
+int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp)
+{
+	return dev_pm_opp_set_opp_forced(dev, opp, false);
+}
 EXPORT_SYMBOL_GPL(dev_pm_opp_set_opp);
 
 /* OPP-dev Helpers */
Viresh Kumar July 2, 2024, 5:15 a.m. UTC | #4
On 01-07-24, 17:17, Viresh Kumar wrote:
> What about this patch instead ?
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 5f4598246a87..2086292f8355 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1091,7 +1091,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
>  		if (devs[index]) {
>  			required_opp = opp ? opp->required_opps[index] : NULL;
>  
> -			ret = dev_pm_opp_set_opp(devs[index], required_opp);
> +			/* Set required OPPs forcefully */
> +			ret = dev_pm_opp_set_opp_forced(devs[index], required_opp, true);

Maybe better to do just this instead:

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 5f4598246a87..9484acbeaa66 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1386,7 +1386,12 @@ int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp)
                return PTR_ERR(opp_table);
        }

-       ret = _set_opp(dev, opp_table, opp, NULL, false);
+       /*
+        * For a genpd's OPP table, we always want to set the OPP (and
+        * performance level) and let the genpd core take care of aggregating
+        * the votes. Set `forced` to true for a genpd here.
+        */
+       ret = _set_opp(dev, opp_table, opp, NULL, opp_table->is_genpd);
        dev_pm_opp_put_opp_table(opp_table);

        return ret;
Ulf Hansson July 10, 2024, 1:51 p.m. UTC | #5
On Tue, 2 Jul 2024 at 07:15, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 01-07-24, 17:17, Viresh Kumar wrote:
> > What about this patch instead ?
> >
> > diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> > index 5f4598246a87..2086292f8355 100644
> > --- a/drivers/opp/core.c
> > +++ b/drivers/opp/core.c
> > @@ -1091,7 +1091,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
> >               if (devs[index]) {
> >                       required_opp = opp ? opp->required_opps[index] : NULL;
> >
> > -                     ret = dev_pm_opp_set_opp(devs[index], required_opp);
> > +                     /* Set required OPPs forcefully */
> > +                     ret = dev_pm_opp_set_opp_forced(devs[index], required_opp, true);
>
> Maybe better to do just this instead:
>
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 5f4598246a87..9484acbeaa66 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1386,7 +1386,12 @@ int dev_pm_opp_set_opp(struct device *dev, struct dev_pm_opp *opp)
>                 return PTR_ERR(opp_table);
>         }
>
> -       ret = _set_opp(dev, opp_table, opp, NULL, false);
> +       /*
> +        * For a genpd's OPP table, we always want to set the OPP (and
> +        * performance level) and let the genpd core take care of aggregating
> +        * the votes. Set `forced` to true for a genpd here.
> +        */
> +       ret = _set_opp(dev, opp_table, opp, NULL, opp_table->is_genpd);
>         dev_pm_opp_put_opp_table(opp_table);

I think this should work, but in this case we seem to need a similar
thing for dev_pm_opp_set_rate().

Another option is to let _set_opp() check "opp_table->is_genpd" and
enforce the opp to be set in that case. Whatever you prefer, I can
re-spin the patch.

Kind regards
Uffe
Viresh Kumar July 11, 2024, 3:13 a.m. UTC | #6
On 10-07-24, 15:51, Ulf Hansson wrote:
> I think this should work, but in this case we seem to need a similar
> thing for dev_pm_opp_set_rate().

We don't go to that path for genpd's I recall. Do we ? For genpd's,
since there is no freq, we always call _set_opp().
Ulf Hansson July 11, 2024, 10:31 a.m. UTC | #7
On Thu, 11 Jul 2024 at 05:13, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 10-07-24, 15:51, Ulf Hansson wrote:
> > I think this should work, but in this case we seem to need a similar
> > thing for dev_pm_opp_set_rate().
>
> We don't go to that path for genpd's I recall. Do we ? For genpd's,
> since there is no freq, we always call _set_opp().

You are right! Although, maybe it's still preferred to do it in
_set_opp() as it looks like the code would be more consistent? No?

Kind regards
Uffe
Ulf Hansson July 11, 2024, 11:05 a.m. UTC | #8
On Thu, 11 Jul 2024 at 12:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Thu, 11 Jul 2024 at 05:13, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 10-07-24, 15:51, Ulf Hansson wrote:
> > > I think this should work, but in this case we seem to need a similar
> > > thing for dev_pm_opp_set_rate().
> >
> > We don't go to that path for genpd's I recall. Do we ? For genpd's,
> > since there is no freq, we always call _set_opp().
>
> You are right! Although, maybe it's still preferred to do it in
> _set_opp() as it looks like the code would be more consistent? No?

No matter how we do this, we end up enforcing OPPs for genpds.

It means that we may be requesting the same performance-level that we
have already requested for the device. Of course genpd manages this,
but it just seems a bit in-efficient to mee. Or maybe this isn't a big
deal as consumer drivers should end up doing this anyway?

Kind regards
Uffe
Viresh Kumar July 11, 2024, 1:16 p.m. UTC | #9
On 11-07-24, 13:05, Ulf Hansson wrote:
> On Thu, 11 Jul 2024 at 12:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Thu, 11 Jul 2024 at 05:13, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 10-07-24, 15:51, Ulf Hansson wrote:
> > > > I think this should work, but in this case we seem to need a similar
> > > > thing for dev_pm_opp_set_rate().
> > >
> > > We don't go to that path for genpd's I recall. Do we ? For genpd's,
> > > since there is no freq, we always call _set_opp().
> >
> > You are right! Although, maybe it's still preferred to do it in
> > _set_opp() as it looks like the code would be more consistent? No?

Since the function already accepted a flag, it was very easier to just reuse it
without.

> No matter how we do this, we end up enforcing OPPs for genpds.
> 
> It means that we may be requesting the same performance-level that we
> have already requested for the device. Of course genpd manages this,
> but it just seems a bit in-efficient to mee. Or maybe this isn't a big
> deal as consumer drivers should end up doing this anyway?

Normally I won't expect a consumer driver to do this check and so was the
opp core handling that. But for genpd's we need to make this inefficient to not
miss a vote.

The problem is at another level though. Normally for any other device, like CPU,
there is one vote for the entire range of devices supported by the OPP table.
For example all CPUs of a cluster will share an OPP table (and they do dvfs
together), and you call set_opp() for any of the CPU, we will go and change the
OPP. There is no per-device vote.

This whole design is broken in case of genpd, since you are expecting a separate
vote per device. Ideally, each device should have had its own copy of the OPP
table, but it is messy in case of genpd and to make it all work nicely, we may
have to choose this inefficient way of doing it :(
Ulf Hansson July 11, 2024, 3:25 p.m. UTC | #10
On Thu, 11 Jul 2024 at 15:16, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 11-07-24, 13:05, Ulf Hansson wrote:
> > On Thu, 11 Jul 2024 at 12:31, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Thu, 11 Jul 2024 at 05:13, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > >
> > > > On 10-07-24, 15:51, Ulf Hansson wrote:
> > > > > I think this should work, but in this case we seem to need a similar
> > > > > thing for dev_pm_opp_set_rate().
> > > >
> > > > We don't go to that path for genpd's I recall. Do we ? For genpd's,
> > > > since there is no freq, we always call _set_opp().
> > >
> > > You are right! Although, maybe it's still preferred to do it in
> > > _set_opp() as it looks like the code would be more consistent? No?
>
> Since the function already accepted a flag, it was very easier to just reuse it
> without.
>
> > No matter how we do this, we end up enforcing OPPs for genpds.
> >
> > It means that we may be requesting the same performance-level that we
> > have already requested for the device. Of course genpd manages this,
> > but it just seems a bit in-efficient to mee. Or maybe this isn't a big
> > deal as consumer drivers should end up doing this anyway?
>
> Normally I won't expect a consumer driver to do this check and so was the
> opp core handling that. But for genpd's we need to make this inefficient to not
> miss a vote.
>
> The problem is at another level though. Normally for any other device, like CPU,
> there is one vote for the entire range of devices supported by the OPP table.
> For example all CPUs of a cluster will share an OPP table (and they do dvfs
> together), and you call set_opp() for any of the CPU, we will go and change the
> OPP. There is no per-device vote.
>
> This whole design is broken in case of genpd, since you are expecting a separate
> vote per device. Ideally, each device should have had its own copy of the OPP
> table, but it is messy in case of genpd and to make it all work nicely, we may
> have to choose this inefficient way of doing it :(

Right, I get your point.

Although, it seems to me that just limiting required-opps to
performance-levels, could avoid us from having to enforce the OPPs for
genpd. In other words, doing something along the lines of $subject
patch should work fine.

In fact, it looks to me that the required-opps handling for the
*single* PM domain case, is already limited to solely
performance-levels (opp-level), as there are no required_devs being
assigned for it. Or did I get that wrong?

Kind regards
Uffe
Viresh Kumar July 18, 2024, 3:05 a.m. UTC | #11
On 11-07-24, 17:25, Ulf Hansson wrote:
> Right, I get your point.
> 
> Although, it seems to me that just limiting required-opps to
> performance-levels, could avoid us from having to enforce the OPPs for
> genpd. In other words, doing something along the lines of $subject
> patch should work fine.

I really don't want to design the code that way. Required OPPs don't
have anything to do with a genpd. Genpd is just one of the possible
use cases and I would like the code to reflect it, even if we don't
have any other users for this kind of stuff for now, but we surely
can. Just that those problems are solved differently for now. For
example, cache DVFS along with CPUs, etc.

And as I said earlier, it is entirely possible that the genpd OPP
table wants to configure few more things apart from just level, and
hence a full fledged set-opp is a better design choice.

> In fact, it looks to me that the required-opps handling for the
> *single* PM domain case, is already limited to solely
> performance-levels (opp-level), as there are no required_devs being
> assigned for it. Or did I get that wrong?

That's why the API for setting required-opps was introduced, to make
it a central point of entry for all use cases where we want to attach
a device.
Ulf Hansson July 18, 2024, 10:38 a.m. UTC | #12
On Thu, 18 Jul 2024 at 05:06, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 11-07-24, 17:25, Ulf Hansson wrote:
> > Right, I get your point.
> >
> > Although, it seems to me that just limiting required-opps to
> > performance-levels, could avoid us from having to enforce the OPPs for
> > genpd. In other words, doing something along the lines of $subject
> > patch should work fine.
>
> I really don't want to design the code that way. Required OPPs don't
> have anything to do with a genpd. Genpd is just one of the possible
> use cases and I would like the code to reflect it, even if we don't
> have any other users for this kind of stuff for now, but we surely
> can. Just that those problems are solved differently for now. For
> example, cache DVFS along with CPUs, etc.
>
> And as I said earlier, it is entirely possible that the genpd OPP
> table wants to configure few more things apart from just level, and
> hence a full fledged set-opp is a better design choice.

I understand your point, but we don't need to call
dev_pm_opp_set_opp() from _set_required_opps() to accomplish this. In
fact, I have realized that calling dev_pm_opp_set_opp() from there
doesn't work the way we expected.

More precisely, at each recursive call to dev_pm_opp_set_opp() we are
changing the OPP for a genpd's OPP table for a device that has been
attached to it. The problem with this, is that we may have several
devices being attached to the same genpd, thus sharing the same
OPP-table that is being used for their required OPPs. So, we may have
several active requests for different OPPs for a genpd's OPP table
simultaneously. It seems wrong from the OPP library point of view. To
me, it's would be better to leave any kind of aggregation to be
managed by genpd itself.

For the reason explained above, it still looks correct to call
_set_opp_level() from _set_required_opps(), as $subject patch proposes
(but clarifications of why is certainly needed in the commit message).
Moreover, when/if we see a need for additonal resourses but the level,
to be managed through a genpd's OPP table, we can extend
_set_required_opps() to cover that too.

>
> > In fact, it looks to me that the required-opps handling for the
> > *single* PM domain case, is already limited to solely
> > performance-levels (opp-level), as there are no required_devs being
> > assigned for it. Or did I get that wrong?
>
> That's why the API for setting required-opps was introduced, to make
> it a central point of entry for all use cases where we want to attach
> a device.

The API as such isn't the problem, but rather that we are recursivly
calling dev_pm_opp_set_opp() for the required-devs.

In the single PM domain case, this would simply not work, as there is
not a separate virtual device we can assign to the required-dev to.

Kind regards
Uffe
Viresh Kumar July 25, 2024, 6:02 a.m. UTC | #13
On 18-07-24, 12:38, Ulf Hansson wrote:
> I understand your point, but we don't need to call
> dev_pm_opp_set_opp() from _set_required_opps() to accomplish this.

I _strongly_ feel that the OPP core should be doing what other frameworks, like
clk, regulator, genpd, are doing in this case. Call recursively.

> In fact, I have realized that calling dev_pm_opp_set_opp() from there
> doesn't work the way we expected.
>
> More precisely, at each recursive call to dev_pm_opp_set_opp() we are
> changing the OPP for a genpd's OPP table for a device that has been
> attached to it. The problem with this, is that we may have several
> devices being attached to the same genpd, thus sharing the same
> OPP-table that is being used for their required OPPs. So, we may have
> several active requests for different OPPs for a genpd's OPP table
> simultaneously. It seems wrong from the OPP library point of view. To
> me, it's would be better to leave any kind of aggregation to be
> managed by genpd itself.

Right. I see this problem too and that's why I said earlier that OPP core was
designed for a different use case and genpd doesn't fit perfectly. Though I
don't see how several calls the dev_pm_opp_set_opp() simultaneously is a
problem. This can happen without recursive calling too, where simultaneous calls
for genpds occur.

I think the main problem here, on how genpd doesn't fit with OPP core, is that
the OPP core is trying to do some sort of aggregation generally at its level,
like avoiding a change of OPP if the OPP is same. I think the right way to fix
this is by not doing any aggregation at OPP core level and genpd handle it all.
Which you are also aligned with I guess. This would also mean that OPP core
shouldn't try configuring, clk, regulator, bandwidth, etc for a genpd. The Genpd
core should handle that, else we may end up incorrectly configuring things.

I guess this is what you were trying to say as well, when you were trying to
replace the recursive call with set-level only.

I think, we don't need that change but rather avoid all these extra settings
from dev_pm_opp_set_opp() itself.

Also consider that genpd configuration doesn't only happen with recursive call,
but can happen with a call to dev_pm_opp_set_opp() directly too for the genpd.

> The API as such isn't the problem, but rather that we are recursivly
> calling dev_pm_opp_set_opp() for the required-devs.

I think that design is rather correct, just like other frameworks. Just that we
need to do only set-level for genpds and nothing else. That will have exactly
the same behavior that you want.

> In the single PM domain case, this would simply not work, as there is
> not a separate virtual device we can assign to the required-dev to.

We can assign the real device in that case, why is that a problem ?
Ulf Hansson July 25, 2024, 9:21 a.m. UTC | #14
On Thu, 25 Jul 2024 at 08:02, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 18-07-24, 12:38, Ulf Hansson wrote:
> > I understand your point, but we don't need to call
> > dev_pm_opp_set_opp() from _set_required_opps() to accomplish this.
>
> I _strongly_ feel that the OPP core should be doing what other frameworks, like
> clk, regulator, genpd, are doing in this case. Call recursively.
>
> > In fact, I have realized that calling dev_pm_opp_set_opp() from there
> > doesn't work the way we expected.
> >
> > More precisely, at each recursive call to dev_pm_opp_set_opp() we are
> > changing the OPP for a genpd's OPP table for a device that has been
> > attached to it. The problem with this, is that we may have several
> > devices being attached to the same genpd, thus sharing the same
> > OPP-table that is being used for their required OPPs. So, we may have
> > several active requests for different OPPs for a genpd's OPP table
> > simultaneously. It seems wrong from the OPP library point of view. To
> > me, it's would be better to leave any kind of aggregation to be
> > managed by genpd itself.
>
> Right. I see this problem too and that's why I said earlier that OPP core was
> designed for a different use case and genpd doesn't fit perfectly. Though I
> don't see how several calls the dev_pm_opp_set_opp() simultaneously is a
> problem. This can happen without recursive calling too, where simultaneous calls
> for genpds occur.

Right.

The main issue in regards to the above, is that we may end up trying
to vote for different devices, which votes correspond to the same
OPP/OPP-table. The one that comes first will request the OPP, the
other ones will be ignored as the OPP core thinks there is no reason
to already set the current OPP.

>
> I think the main problem here, on how genpd doesn't fit with OPP core, is that
> the OPP core is trying to do some sort of aggregation generally at its level,
> like avoiding a change of OPP if the OPP is same. I think the right way to fix
> this is by not doing any aggregation at OPP core level and genpd handle it all.
> Which you are also aligned with I guess. This would also mean that OPP core
> shouldn't try configuring, clk, regulator, bandwidth, etc for a genpd. The Genpd
> core should handle that, else we may end up incorrectly configuring things.
>
> I guess this is what you were trying to say as well, when you were trying to
> replace the recursive call with set-level only.

Right, I think we are in agreement. Aggregation of the
*performance-state* (opp-level) needs to be managed by genpd, solely.

>
> I think, we don't need that change but rather avoid all these extra settings
> from dev_pm_opp_set_opp() itself.
>
> Also consider that genpd configuration doesn't only happen with recursive call,
> but can happen with a call to dev_pm_opp_set_opp() directly too for the genpd.

Right.

>
> > The API as such isn't the problem, but rather that we are recursivly
> > calling dev_pm_opp_set_opp() for the required-devs.
>
> I think that design is rather correct, just like other frameworks. Just that we
> need to do only set-level for genpds and nothing else. That will have exactly
> the same behavior that you want.

I don't quite understand what you are proposing. Do you want to add a
separate path for opp-levels?

The problem with that would be that platforms (Tegra at least) are
already using a combination of opp-level and clocks.

>
> > In the single PM domain case, this would simply not work, as there is
> > not a separate virtual device we can assign to the required-dev to.
>
> We can assign the real device in that case, why is that a problem ?

To be able to call dev_pm_opp_set_opp() on the required-dev (which
would be the real device in this case), we need to add it to genpd's
OPP table by calling _add_opp_dev() on it. See _opp_attach_genpd().

The problem with this, is that the real device already has its own OPP
table (with the required-OPPs pointing to genpd's OPP table), which
means that we would end up adding the device to two different OPP
tables.

Kind regards
Uffe
Viresh Kumar July 25, 2024, 11:25 a.m. UTC | #15
On 25-07-24, 11:21, Ulf Hansson wrote:
> Right.
> 
> The main issue in regards to the above, is that we may end up trying
> to vote for different devices, which votes correspond to the same
> OPP/OPP-table. The one that comes first will request the OPP, the
> other ones will be ignored as the OPP core thinks there is no reason
> to already set the current OPP.

Right, but that won't happen with the diff I shared earlier where we set
"forced" to true. Isn't it ?

> > I think that design is rather correct, just like other frameworks. Just that we
> > need to do only set-level for genpds and nothing else. That will have exactly
> > the same behavior that you want.
> 
> I don't quite understand what you are proposing. Do you want to add a
> separate path for opp-levels?

Not separate paths, but ignore clk/regulator changes if the table belongs to a
genpd.

> The problem with that would be that platforms (Tegra at least) are
> already using a combination of opp-level and clocks.

If they are using both for a genpd's OPP table (and changes are made for both
opp-level and clock by the OPP core), then it should already be wrong, isn't it?
Two simultaneous calls to dev_pm_opp_set_opp() would set the level correctly (as
aggregation happens in the genpd core), but clock setting would always reflect
the second caller. This should be fixed too, isn't it ?

> To be able to call dev_pm_opp_set_opp() on the required-dev (which
> would be the real device in this case), we need to add it to genpd's
> OPP table by calling _add_opp_dev() on it. See _opp_attach_genpd().
> 
> The problem with this, is that the real device already has its own OPP
> table (with the required-OPPs pointing to genpd's OPP table), which
> means that we would end up adding the device to two different OPP
> tables.

I was terrified for a minute after reading this and the current code, as I also
thought there is an issue there. But I was confident that we used to take care
of this case separately earlier. A short dive into git logs got me to this:

commit 6d366d0e5446 ("OPP: Use _set_opp_level() for single genpd case")

This should be working just fine I guess.
Ulf Hansson July 28, 2024, 8:05 p.m. UTC | #16
On Thu, 25 Jul 2024 at 13:25, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 25-07-24, 11:21, Ulf Hansson wrote:
> > Right.
> >
> > The main issue in regards to the above, is that we may end up trying
> > to vote for different devices, which votes correspond to the same
> > OPP/OPP-table. The one that comes first will request the OPP, the
> > other ones will be ignored as the OPP core thinks there is no reason
> > to already set the current OPP.
>
> Right, but that won't happen with the diff I shared earlier where we set
> "forced" to true. Isn't it ?

Correct.

>
> > > I think that design is rather correct, just like other frameworks. Just that we
> > > need to do only set-level for genpds and nothing else. That will have exactly
> > > the same behavior that you want.
> >
> > I don't quite understand what you are proposing. Do you want to add a
> > separate path for opp-levels?
>
> Not separate paths, but ignore clk/regulator changes if the table belongs to a
> genpd.
>
> > The problem with that would be that platforms (Tegra at least) are
> > already using a combination of opp-level and clocks.
>
> If they are using both for a genpd's OPP table (and changes are made for both
> opp-level and clock by the OPP core), then it should already be wrong, isn't it?

They are changing the clock through the device's OPP table and the
level (performance-state) via genpd's table (using required OPPs).
This works fine as of today.

> Two simultaneous calls to dev_pm_opp_set_opp() would set the level correctly (as
> aggregation happens in the genpd core), but clock setting would always reflect
> the second caller. This should be fixed too, isn't it ?

As I said before, I don't see a need for this. The recursive call to
dev_pm_opp_set_opp() is today superfluous.

>
> > To be able to call dev_pm_opp_set_opp() on the required-dev (which
> > would be the real device in this case), we need to add it to genpd's
> > OPP table by calling _add_opp_dev() on it. See _opp_attach_genpd().
> >
> > The problem with this, is that the real device already has its own OPP
> > table (with the required-OPPs pointing to genpd's OPP table), which
> > means that we would end up adding the device to two different OPP
> > tables.
>
> I was terrified for a minute after reading this and the current code, as I also
> thought there is an issue there. But I was confident that we used to take care
> of this case separately earlier. A short dive into git logs got me to this:
>
> commit 6d366d0e5446 ("OPP: Use _set_opp_level() for single genpd case")
>
> This should be working just fine I guess.

It's working today for *opp-level* only, because of the commit above.
That's correct.

My point is that calling dev_pm_opp_set_opp() recursively from
_set_required_opps() doesn't make sense for the single PM domain case,
as we can't assign a required-dev for it. This leads to an
inconsistent behaviour when managing the required-OPPs.

To make the behavior consistent (and to fix the bug), I still think it
would be better to do something along what $subject patch proposes.

Kind regards
Uffe
Viresh Kumar July 29, 2024, 6:05 a.m. UTC | #17
On 28-07-24, 22:05, Ulf Hansson wrote:
> > > > I think that design is rather correct, just like other frameworks. Just that we
> > > > need to do only set-level for genpds and nothing else. That will have exactly
> > > > the same behavior that you want.
> > >
> > > I don't quite understand what you are proposing. Do you want to add a
> > > separate path for opp-levels?
> >
> > Not separate paths, but ignore clk/regulator changes if the table belongs to a
> > genpd.
> >
> > > The problem with that would be that platforms (Tegra at least) are
> > > already using a combination of opp-level and clocks.
> >
> > If they are using both for a genpd's OPP table (and changes are made for both
> > opp-level and clock by the OPP core), then it should already be wrong, isn't it?
> 
> They are changing the clock through the device's OPP table and the
> level (performance-state) via genpd's table (using required OPPs).
> This works fine as of today.

There is a problem here I guess then. Lets say there are two devices A and B,
that depend on a genpd.

A requests required OPP 5 (level 5, clk 1.4 GHz), followed by 
B requests required OPP 3 (level 3, clk 1 GHz).

After this level will be configured to 5 and clk to 1 GHz I think.

> It's working today for *opp-level* only, because of the commit above.
> That's correct.

Good.

> My point is that calling dev_pm_opp_set_opp() recursively from
> _set_required_opps() doesn't make sense for the single PM domain case,
> as we can't assign a required-dev for it. This leads to an
> inconsistent behaviour when managing the required-OPPs.

We won't be calling that because of the above patch. In case of a single dev,
the required device isn't set and so we will never end up calling
dev_pm_opp_set_opp() for a single genpd case.
Ulf Hansson July 29, 2024, 8:30 p.m. UTC | #18
On Mon, 29 Jul 2024 at 08:05, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 28-07-24, 22:05, Ulf Hansson wrote:
> > > > > I think that design is rather correct, just like other frameworks. Just that we
> > > > > need to do only set-level for genpds and nothing else. That will have exactly
> > > > > the same behavior that you want.
> > > >
> > > > I don't quite understand what you are proposing. Do you want to add a
> > > > separate path for opp-levels?
> > >
> > > Not separate paths, but ignore clk/regulator changes if the table belongs to a
> > > genpd.
> > >
> > > > The problem with that would be that platforms (Tegra at least) are
> > > > already using a combination of opp-level and clocks.
> > >
> > > If they are using both for a genpd's OPP table (and changes are made for both
> > > opp-level and clock by the OPP core), then it should already be wrong, isn't it?
> >
> > They are changing the clock through the device's OPP table and the
> > level (performance-state) via genpd's table (using required OPPs).
> > This works fine as of today.
>
> There is a problem here I guess then. Lets say there are two devices A and B,
> that depend on a genpd.
>
> A requests required OPP 5 (level 5, clk 1.4 GHz), followed by
> B requests required OPP 3 (level 3, clk 1 GHz).
>
> After this level will be configured to 5 and clk to 1 GHz I think.

The level would be 5, as the aggregated votes in genpd would be
correct in this case.

In regards to the clocks, I assume this is handled correctly too, as
the clocks are per device clocks that don't belong to the genpd.

>
> > It's working today for *opp-level* only, because of the commit above.
> > That's correct.
>
> Good.
>
> > My point is that calling dev_pm_opp_set_opp() recursively from
> > _set_required_opps() doesn't make sense for the single PM domain case,
> > as we can't assign a required-dev for it. This leads to an
> > inconsistent behaviour when managing the required-OPPs.
>
> We won't be calling that because of the above patch. In case of a single dev,
> the required device isn't set and so we will never end up calling
> dev_pm_opp_set_opp() for a single genpd case.

That's right, but why do we want to call dev_pm_opp_set_opp() for the
multiple PM domain case then? It makes the behaviour inconsistent.

Kind regards
Uffe
Viresh Kumar July 30, 2024, 3:32 a.m. UTC | #19
On 29-07-24, 22:30, Ulf Hansson wrote:
> In regards to the clocks, I assume this is handled correctly too, as
> the clocks are per device clocks that don't belong to the genpd.

It would be if the clk node is present in the device's node. I was talking about
a clock node in the genpd's table earlier. If that is ever the case, we will end
up programming the wrong clk here.

> That's right, but why do we want to call dev_pm_opp_set_opp() for the
> multiple PM domain case then? It makes the behaviour inconsistent.

To have a common path for all required OPP device types, irrespective of the
fact that the required OPP device is a genpd or not. And we are talking about a
required OPP of a separate device here, it must be set via this call only,
technically speaking.

Genpd makes it a little complex, and I agree to that. But I strongly feel this
code needs to be generic and not genpd specific. The OPP core should have as
less genpd specific code as possible. It must handle all device types with a
single code path.
Ulf Hansson July 31, 2024, 10:35 a.m. UTC | #20
[...]

>
> > That's right, but why do we want to call dev_pm_opp_set_opp() for the
> > multiple PM domain case then? It makes the behaviour inconsistent.
>
> To have a common path for all required OPP device types, irrespective of the
> fact that the required OPP device is a genpd or not. And we are talking about a
> required OPP of a separate device here, it must be set via this call only,
> technically speaking.
>
> Genpd makes it a little complex, and I agree to that. But I strongly feel this
> code needs to be generic and not genpd specific. The OPP core should have as
> less genpd specific code as possible. It must handle all device types with a
> single code path.

I agree that we really should avoid genpd specific code and that's
exactly what I am working towards too.

However, calling dev_pm_opp_set_opp() from _set_required_opps() looks
to me like it has the exact opposite effect:
*) To solve the bug according to the change you proposed, means more
genpd hacks.
**) To make the code for the required OPPs consistent between the
single/multiple PM domain case, we need additional genpd hacks.
***) We can't remove some of the existing genpd hacks [1], as those
would then still be needed.

Finally, while I understand that you prefer a single code path, we can
still keep _set_required_opps() common and generic. Today, it's used
only for performance-states of PM domains (the involved code isn't
even genpd specific as it calls
dev_pm_domain_set_performance_state()). If tomorrow we see a need to
extend it to additional resources, it's easy also without calling
dev_pm_opp_set_opp() from it.

Kind regards
Uffe

 [1]
https://lore.kernel.org/all/20240718234319.356451-7-ulf.hansson@linaro.org/
diff mbox series

Patch

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index cb4611fe1b5b..45eca65f27f9 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1061,6 +1061,28 @@  static int _set_opp_bw(const struct opp_table *opp_table,
 	return 0;
 }
 
+static int _set_opp_level(struct device *dev, struct opp_table *opp_table,
+			  struct dev_pm_opp *opp)
+{
+	unsigned int level = 0;
+	int ret = 0;
+
+	if (opp) {
+		if (opp->level == OPP_LEVEL_UNSET)
+			return 0;
+
+		level = opp->level;
+	}
+
+	/* Request a new performance state through the device's PM domain. */
+	ret = dev_pm_domain_set_performance_state(dev, level);
+	if (ret)
+		dev_err(dev, "Failed to set performance state %u (%d)\n", level,
+			ret);
+
+	return ret;
+}
+
 /* This is only called for PM domain for now */
 static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
 			      struct dev_pm_opp *opp, bool up)
@@ -1091,7 +1113,8 @@  static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
 		if (devs[index]) {
 			required_opp = opp ? opp->required_opps[index] : NULL;
 
-			ret = dev_pm_opp_set_opp(devs[index], required_opp);
+			ret = _set_opp_level(devs[index], opp_table,
+					     required_opp);
 			if (ret)
 				return ret;
 		}
@@ -1102,28 +1125,6 @@  static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
 	return 0;
 }
 
-static int _set_opp_level(struct device *dev, struct opp_table *opp_table,
-			  struct dev_pm_opp *opp)
-{
-	unsigned int level = 0;
-	int ret = 0;
-
-	if (opp) {
-		if (opp->level == OPP_LEVEL_UNSET)
-			return 0;
-
-		level = opp->level;
-	}
-
-	/* Request a new performance state through the device's PM domain. */
-	ret = dev_pm_domain_set_performance_state(dev, level);
-	if (ret)
-		dev_err(dev, "Failed to set performance state %u (%d)\n", level,
-			ret);
-
-	return ret;
-}
-
 static void _find_current_opp(struct device *dev, struct opp_table *opp_table)
 {
 	struct dev_pm_opp *opp = ERR_PTR(-ENODEV);