mbox series

[v7,0/7] Introduce OPP bandwidth bindings

Message ID 20200424155404.10746-1-georgi.djakov@linaro.org
Headers show
Series Introduce OPP bandwidth bindings | expand

Message

Georgi Djakov April 24, 2020, 3:53 p.m. UTC
Here is a proposal to extend the OPP bindings with bandwidth based on
a few previous discussions [1] and patchsets from me [2][3] and Saravana
[4][5][6][7][8][9].

Changes in v7:
* This version is combination of both patchsets by Saravana and me, based
on [3] and [9].
* The latest version of DT bindings from Saravana is used here, with a
minor change of using arrays instead of single integers for opp-peak-kBps
and opp-avg-kBps. This is needed to support multiple interconnect paths.
* The concept of having multiple OPP tables per device has been dropped,
as it was nacked by Viresh.
* Various reviews comments have been addressed and some patches are
split, and there are also some new patches. Thanks to Viresh, Sibi and
others for providing feedback!

With this version of the patchset, the CPU/GPU to DDR bandwidth scaling
will look like this in DT:

One interconnect path (no change from Saravana's v6 patches):

cpu@0 {
	operating-points-v2 = <&cpu_opp_table>;
	interconnects = <&noc1 MASTER1 &noc2 SLAVE1>,
};

cpu_opp_table: cpu_opp_table {
	compatible = "operating-points-v2";

	opp-800000000 {
		opp-hz = /bits/ 64 <800000000>;
		opp-peak-kBps = <1525000>;
		opp-avg-kBps = <457000>;
	};

	opp-998400000 {
		opp-hz = /bits/ 64 <998400000>;
		opp-peak-kBps = <7614000>;
		opp-avg-kBps = <2284000>;
	};
};

Two interconnect paths:

cpu@0 {
	operating-points-v2 = <&cpu_opp_table>;
	interconnects = <&noc1 MASTER1 &noc2 SLAVE1>,
			<&noc3 MASTER2 &noc4 SLAVE2>;
};

cpu_opp_table: cpu_opp_table {
	compatible = "operating-points-v2";

	opp-800000000 {
		opp-hz = /bits/ 64 <800000000>;
		opp-peak-kBps = <1525000 2000>;
		opp-avg-kBps = <457000 1000>;
	};

	opp-998400000 {
		opp-hz = /bits/ 64 <998400000>;
		opp-peak-kBps = <7614000 4000>;
		opp-avg-kBps = <2284000 2000>;
	};
};

------

Every functional block on a SoC can contribute to the system power
efficiency by expressing its own bandwidth needs (to memory or other SoC
modules). This will allow the system to save power when high throughput
is not required (and also provide maximum throughput when needed).

There are at least three ways for a device to determine its bandwidth
needs:
	1. The device can dynamically calculate the needed bandwidth
based on some known variable. For example: UART (baud rate), I2C (fast
mode, high-speed mode, etc), USB (specification version, data transfer
type), SDHC (SD standard, clock rate, bus-width), Video Encoder/Decoder
(video format, resolution, frame-rate)

	2. There is a hardware specific value. For example: hardware
specific constant value (e.g. for PRNG) or use-case specific value that
is hard-coded.

	3. Predefined SoC/board specific bandwidth values. For example:
CPU or GPU bandwidth is related to the current core frequency and both
bandwidth and frequency are scaled together.

This patchset is trying to address point 3 above by extending the OPP
bindings to support predefined SoC/board bandwidth values and adds
support in cpufreq-dt to scale the interconnect between the CPU and the
DDR together with frequency and voltage.

[1] https://patchwork.kernel.org/patch/10577315/
[2] https://lore.kernel.org/r/20190313090010.20534-1-georgi.djakov@linaro.org/
[3] https://lore.kernel.org/r/20190423132823.7915-1-georgi.djakov@linaro.org/
[4] https://lore.kernel.org/r/20190608044339.115026-1-saravanak@google.com
[5] https://lore.kernel.org/r/20190614041733.120807-1-saravanak@google.com
[6] https://lore.kernel.org/r/20190703011020.151615-1-saravanak@google.com
[7] https://lore.kernel.org/r/20190726231558.175130-1-saravanak@google.com
[8] https://lore.kernel.org/r/20190807223111.230846-1-saravanak@google.com
[9] https://lore.kernel.org/r/20191207002424.201796-1-saravanak@google.com

Georgi Djakov (5):
  interconnect: Add of_icc_get_by_index() helper function
  OPP: Add support for parsing interconnect bandwidth
  OPP: Add sanity checks in _read_opp_key()
  OPP: Update the bandwidth on OPP frequency changes
  cpufreq: dt: Add support for interconnect bandwidth scaling

Saravana Kannan (2):
  dt-bindings: opp: Introduce opp-peak-kBps and opp-avg-kBps bindings
  OPP: Add helpers for reading the binding properties

 Documentation/devicetree/bindings/opp/opp.txt |  20 ++-
 .../devicetree/bindings/property-units.txt    |   4 +
 drivers/cpufreq/Kconfig                       |   1 +
 drivers/cpufreq/cpufreq-dt.c                  |  15 ++
 drivers/interconnect/core.c                   |  68 +++++--
 drivers/opp/Kconfig                           |   1 +
 drivers/opp/core.c                            |  44 ++++-
 drivers/opp/of.c                              | 170 ++++++++++++++++--
 drivers/opp/opp.h                             |  10 ++
 include/linux/interconnect.h                  |   6 +
 include/linux/pm_opp.h                        |  12 ++
 11 files changed, 311 insertions(+), 40 deletions(-)

Comments

Matthias Kaehlcke April 24, 2020, 5:30 p.m. UTC | #1
Hi Georgi,

On Fri, Apr 24, 2020 at 06:53:59PM +0300, Georgi Djakov wrote:
> From: Saravana Kannan <saravanak@google.com>
> 
> The opp-hz DT property is not mandatory and we may use another property
> as a key in the OPP table. Add helper functions to simplify the reading
> and comparing the keys.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Extracted just the helpers from patch v6, as Viresh advised to split it.
> 
> v6: https://lore.kernel.org/r/20191207002424.201796-3-saravanak@google.com
> 
>  drivers/opp/core.c | 15 +++++++++++++--
>  drivers/opp/of.c   | 42 ++++++++++++++++++++++++++----------------
>  drivers/opp/opp.h  |  1 +
>  3 files changed, 40 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index ba43e6a3dc0a..c9c1bbe6ae27 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1272,11 +1272,21 @@ static bool _opp_supported_by_regulators(struct dev_pm_opp *opp,
>  	return true;
>  }
>  
> +int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2)
> +{
> +	if (opp1->rate != opp2->rate)
> +		return opp1->rate < opp2->rate ? -1 : 1;
> +	if (opp1->level != opp2->level)
> +		return opp1->level < opp2->level ? -1 : 1;
> +	return 0;
> +}
> +
>  static int _opp_is_duplicate(struct device *dev, struct dev_pm_opp *new_opp,
>  			     struct opp_table *opp_table,
>  			     struct list_head **head)
>  {
>  	struct dev_pm_opp *opp;
> +	int opp_cmp;
>  
>  	/*
>  	 * Insert new OPP in order of increasing frequency and discard if
> @@ -1287,12 +1297,13 @@ static int _opp_is_duplicate(struct device *dev, struct dev_pm_opp *new_opp,
>  	 * loop.
>  	 */
>  	list_for_each_entry(opp, &opp_table->opp_list, node) {
> -		if (new_opp->rate > opp->rate) {
> +		opp_cmp = _opp_compare_key(new_opp, opp);
> +		if (opp_cmp > 0) {
>  			*head = &opp->node;
>  			continue;
>  		}
>  
> -		if (new_opp->rate < opp->rate)
> +		if (opp_cmp < 0)
>  			return 0;
>  
>  		/* Duplicate OPPs */
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index 9cd8f0adacae..e33169c7e045 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -521,6 +521,28 @@ void dev_pm_opp_of_remove_table(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
>  
> +static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np,
> +			 bool *rate_not_available)
> +{
> +	u64 rate;
> +	int ret;
> +
> +	ret = of_property_read_u64(np, "opp-hz", &rate);
> +	if (!ret) {
> +		/*
> +		 * Rate is defined as an unsigned long in clk API, and so
> +		 * casting explicitly to its type. Must be fixed once rate is 64
> +		 * bit guaranteed in clk API.
> +		 */
> +		new_opp->rate = (unsigned long)rate;
> +	}

nit: curly braces are not needed

> +	*rate_not_available = !!ret;
> +
> +	of_property_read_u32(np, "opp-level", &new_opp->level);
> +
> +	return ret;
> +}
> +
>  /**
>   * _opp_add_static_v2() - Allocate static OPPs (As per 'v2' DT bindings)
>   * @opp_table:	OPP table
> @@ -558,26 +580,14 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
>  	if (!new_opp)
>  		return ERR_PTR(-ENOMEM);
>  
> -	ret = of_property_read_u64(np, "opp-hz", &rate);
> +	ret = _read_opp_key(new_opp, np, &rate_not_available);
>  	if (ret < 0) {
> -		/* "opp-hz" is optional for devices like power domains. */
> -		if (!opp_table->is_genpd) {
> -			dev_err(dev, "%s: opp-hz not found\n", __func__);
> -			goto free_opp;
> -		}
> +		if (!opp_table->is_genpd)
> +			dev_err(dev, "%s: opp key field not found\n", __func__);
>  
> -		rate_not_available = true;
> -	} else {
> -		/*
> -		 * Rate is defined as an unsigned long in clk API, and so
> -		 * casting explicitly to its type. Must be fixed once rate is 64
> -		 * bit guaranteed in clk API.
> -		 */
> -		new_opp->rate = (unsigned long)rate;
> +		goto free_opp;
>  	}
>  
> -	of_property_read_u32(np, "opp-level", &new_opp->level);
> -
>  	/* Check if the OPP supports hardware's hierarchy of versions or not */
>  	if (!_opp_is_supported(dev, opp_table, np)) {
>  		dev_dbg(dev, "OPP not supported by hardware: %llu\n", rate);
> diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
> index d14e27102730..bcadb1e328a4 100644
> --- a/drivers/opp/opp.h
> +++ b/drivers/opp/opp.h
> @@ -211,6 +211,7 @@ struct opp_device *_add_opp_dev(const struct device *dev, struct opp_table *opp_
>  void _dev_pm_opp_find_and_remove_table(struct device *dev);
>  struct dev_pm_opp *_opp_allocate(struct opp_table *opp_table);
>  void _opp_free(struct dev_pm_opp *opp);
> +int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2);
>  int _opp_add(struct device *dev, struct dev_pm_opp *new_opp, struct opp_table *opp_table, bool rate_not_available);
>  int _opp_add_v1(struct opp_table *opp_table, struct device *dev, unsigned long freq, long u_volt, bool dynamic);
>  void _dev_pm_opp_cpumask_remove_table(const struct cpumask *cpumask, int last_cpu);

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Matthias Kaehlcke April 24, 2020, 6:02 p.m. UTC | #2
Hi,

On Fri, Apr 24, 2020 at 06:54:00PM +0300, Georgi Djakov wrote:
> This is the same as the traditional of_icc_get() function, but the
> difference is that it takes index as an argument, instead of name.
> 
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Addressed review comments from Sibi.
> * Re-based patch.
> 
> v2: https://lore.kernel.org/r/20190423132823.7915-3-georgi.djakov@linaro.org
> 
>  drivers/interconnect/core.c  | 68 +++++++++++++++++++++++++++---------
>  include/linux/interconnect.h |  6 ++++
>  2 files changed, 58 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 2c6515e3ecf1..648237f4de49 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -351,9 +351,9 @@ static struct icc_node *of_icc_get_from_provider(struct of_phandle_args *spec)
>  }
>  
>  /**
> - * of_icc_get() - get a path handle from a DT node based on name
> + * of_icc_get_by_index() - get a path handle from a DT node based on index
>   * @dev: device pointer for the consumer device
> - * @name: interconnect path name
> + * @idx: interconnect path index
>   *
>   * This function will search for a path between two endpoints and return an
>   * icc_path handle on success. Use icc_put() to release constraints when they
> @@ -365,13 +365,12 @@ static struct icc_node *of_icc_get_from_provider(struct of_phandle_args *spec)
>   * Return: icc_path pointer on success or ERR_PTR() on error. NULL is returned
>   * when the API is disabled or the "interconnects" DT property is missing.
>   */
> -struct icc_path *of_icc_get(struct device *dev, const char *name)
> +struct icc_path *of_icc_get_by_index(struct device *dev, int idx)
>  {
>  	struct icc_path *path = ERR_PTR(-EPROBE_DEFER);

nit: initialization is not needed. According to the diff this is existing
code, but since we are adding a new function we can as well 'fix' it :)

>  	struct icc_node *src_node, *dst_node;
>  	struct device_node *np = NULL;

ditto

>  	struct of_phandle_args src_args, dst_args;
> -	int idx = 0;
>  	int ret;
>  
>  	if (!dev || !dev->of_node)
> @@ -391,12 +390,6 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
>  	 * lets support only global ids and extend this in the future if needed
>  	 * without breaking DT compatibility.
>  	 */
> -	if (name) {
> -		idx = of_property_match_string(np, "interconnect-names", name);
> -		if (idx < 0)
> -			return ERR_PTR(idx);
> -	}
> -
>  	ret = of_parse_phandle_with_args(np, "interconnects",
>  					 "#interconnect-cells", idx * 2,
>  					 &src_args);
> @@ -439,12 +432,8 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
>  		return path;
>  	}
>  
> -	if (name)
> -		path->name = kstrdup_const(name, GFP_KERNEL);
> -	else
> -		path->name = kasprintf(GFP_KERNEL, "%s-%s",
> -				       src_node->name, dst_node->name);
> -
> +	path->name = kasprintf(GFP_KERNEL, "%s-%s",
> +			       src_node->name, dst_node->name);
>  	if (!path->name) {
>  		kfree(path);
>  		return ERR_PTR(-ENOMEM);
> @@ -452,6 +441,53 @@ struct icc_path *of_icc_get(struct device *dev, const char *name)
>  
>  	return path;
>  }
> +EXPORT_SYMBOL_GPL(of_icc_get_by_index);
> +
> +/**
> + * of_icc_get() - get a path handle from a DT node based on name
> + * @dev: device pointer for the consumer device
> + * @name: interconnect path name
> + *
> + * This function will search for a path between two endpoints and return an
> + * icc_path handle on success. Use icc_put() to release constraints when they
> + * are not needed anymore.
> + * If the interconnect API is disabled, NULL is returned and the consumer
> + * drivers will still build. Drivers are free to handle this specifically,
> + * but they don't have to.
> + *
> + * Return: icc_path pointer on success or ERR_PTR() on error. NULL is returned
> + * when the API is disabled or the "interconnects" DT property is missing.
> + */
> +struct icc_path *of_icc_get(struct device *dev, const char *name)
> +{
> +	struct device_node *np = NULL;

nit: initialization is not needed

> +	int idx = 0;
> +
> +	if (!dev || !dev->of_node)
> +		return ERR_PTR(-ENODEV);
> +
> +	np = dev->of_node;
> +
> +	/*
> +	 * When the consumer DT node do not have "interconnects" property
> +	 * return a NULL path to skip setting constraints.
> +	 */
> +	if (!of_find_property(np, "interconnects", NULL))
> +		return NULL;
> +
> +	/*
> +	 * We use a combination of phandle and specifier for endpoint. For now
> +	 * lets support only global ids and extend this in the future if needed
> +	 * without breaking DT compatibility.
> +	 */
> +	if (name) {
> +		idx = of_property_match_string(np, "interconnect-names", name);
> +		if (idx < 0)
> +			return ERR_PTR(idx);
> +	}
> +
> +	return of_icc_get_by_index(dev, idx);
> +}
>  EXPORT_SYMBOL_GPL(of_icc_get);
>  
>  /**
> diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> index d70a914cba11..34e97231a6ab 100644
> --- a/include/linux/interconnect.h
> +++ b/include/linux/interconnect.h
> @@ -28,6 +28,7 @@ struct device;
>  struct icc_path *icc_get(struct device *dev, const int src_id,
>  			 const int dst_id);
>  struct icc_path *of_icc_get(struct device *dev, const char *name);
> +struct icc_path *of_icc_get_by_index(struct device *dev, int idx);
>  void icc_put(struct icc_path *path);
>  int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw);
>  void icc_set_tag(struct icc_path *path, u32 tag);
> @@ -46,6 +47,11 @@ static inline struct icc_path *of_icc_get(struct device *dev,
>  	return NULL;
>  }
>  
> +static inline struct icc_path *of_icc_get_by_index(struct device *dev, int idx)
> +{
> +	return NULL;
> +}
> +
>  static inline void icc_put(struct icc_path *path)
>  {
>  }

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Matthias Kaehlcke April 24, 2020, 7:20 p.m. UTC | #3
Hi,

On Fri, Apr 24, 2020 at 06:54:01PM +0300, Georgi Djakov wrote:
> The OPP bindings now support bandwidth values, so add support to parse it
> from device tree and store it into the new dev_pm_opp_icc_bw struct, which
> is part of the dev_pm_opp.
> 
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Addressed some review comments from Viresh and Sibi.
> * Various other changes.
> 
> v2: https://lore.kernel.org/linux-arm-msm/20190423132823.7915-4-georgi.djakov@linaro.org/
> 
>  drivers/opp/Kconfig    |   1 +
>  drivers/opp/core.c     |  16 +++++-
>  drivers/opp/of.c       | 119 ++++++++++++++++++++++++++++++++++++++++-
>  drivers/opp/opp.h      |   9 ++++
>  include/linux/pm_opp.h |  12 +++++
>  5 files changed, 153 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/opp/Kconfig b/drivers/opp/Kconfig
> index 35dfc7e80f92..230d2b84436c 100644
> --- a/drivers/opp/Kconfig
> +++ b/drivers/opp/Kconfig
> @@ -2,6 +2,7 @@
>  config PM_OPP
>  	bool
>  	select SRCU
> +	depends on INTERCONNECT || !INTERCONNECT

huh?

>  	---help---
>  	  SOCs have a standard set of tuples consisting of frequency and
>  	  voltage pairs that the device will support per voltage domain. This
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index c9c1bbe6ae27..8e86811eb7b2 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -985,6 +985,12 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
>  				ret);
>  	}
>  
> +	/* Find interconnect path(s) for the device */
> +	ret = _of_find_paths(opp_table, dev);
> +	if (ret)
> +		dev_dbg(dev, "%s: Error finding interconnect paths: %d\n",
> +			__func__, ret);

why dev_dbg and not dev_warn?

> +
>  	BLOCKING_INIT_NOTIFIER_HEAD(&opp_table->head);
>  	INIT_LIST_HEAD(&opp_table->opp_list);
>  	kref_init(&opp_table->kref);
> @@ -1229,19 +1235,22 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_remove_all_dynamic);
>  struct dev_pm_opp *_opp_allocate(struct opp_table *table)
>  {
>  	struct dev_pm_opp *opp;
> -	int count, supply_size;
> +	int count, supply_size, icc_size;
>  
>  	/* Allocate space for at least one supply */
>  	count = table->regulator_count > 0 ? table->regulator_count : 1;
>  	supply_size = sizeof(*opp->supplies) * count;
> +	icc_size = sizeof(*opp->bandwidth) * table->path_count;
>  
>  	/* allocate new OPP node and supplies structures */
> -	opp = kzalloc(sizeof(*opp) + supply_size, GFP_KERNEL);
> +	opp = kzalloc(sizeof(*opp) + supply_size + icc_size, GFP_KERNEL);
> +
>  	if (!opp)
>  		return NULL;
>  
>  	/* Put the supplies at the end of the OPP structure as an empty array */
>  	opp->supplies = (struct dev_pm_opp_supply *)(opp + 1);
> +	opp->bandwidth = (struct dev_pm_opp_icc_bw *)(opp->supplies + 1);

IIUC this needs to be:

	opp->bandwidth = (struct dev_pm_opp_icc_bw *)(opp->supplies + count);

maybe s/count/supply_count/

>  	INIT_LIST_HEAD(&opp->node);
>  
>  	return opp;
> @@ -1276,6 +1285,9 @@ int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2)
>  {
>  	if (opp1->rate != opp2->rate)
>  		return opp1->rate < opp2->rate ? -1 : 1;
> +	if (opp1->bandwidth && opp2->bandwidth &&
> +	    opp1->bandwidth[0].peak != opp2->bandwidth[0].peak)
> +		return opp1->bandwidth[0].peak < opp2->bandwidth[0].peak ? -1 : 1;
>  	if (opp1->level != opp2->level)
>  		return opp1->level < opp2->level ? -1 : 1;
>  	return 0;
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index e33169c7e045..978e445b0cdb 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -332,6 +332,59 @@ static int _of_opp_alloc_required_opps(struct opp_table *opp_table,
>  	return ret;
>  }
>  
> +int _of_find_paths(struct opp_table *opp_table, struct device *dev)

nit: _of_find_icc_paths() to be more concise?

> +{
> +	struct device_node *np;
> +	int ret, i, count, num_paths;
> +
> +	np = of_node_get(dev->of_node);
> +	if (!np)
> +		return 0;
> +
> +	count = of_count_phandle_with_args(np, "interconnects",
> +					   "#interconnect-cells");
> +	of_node_put(np);
> +	if (count < 0)
> +		return 0;
> +
> +	/* two phandles when #interconnect-cells = <1> */
> +	if (count % 2) {
> +		dev_err(dev, "%s: Invalid interconnects values\n",
> +			__func__);

nit: no need for separate line

> +		return -EINVAL;
> +	}
> +
> +	num_paths = count / 2;
> +	opp_table->paths = kcalloc(num_paths, sizeof(*opp_table->paths),
> +				   GFP_KERNEL);

Add kfree(opp_table->paths) to _opp_table_kref_release() ?

> +	if (!opp_table->paths)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < num_paths; i++) {
> +		opp_table->paths[i] = of_icc_get_by_index(dev, i);
> +		if (IS_ERR(opp_table->paths[i])) {
> +			ret = PTR_ERR(opp_table->paths[i]);
> +			if (ret != -EPROBE_DEFER) {
> +				dev_err(dev, "%s: Unable to get path%d: %d\n",
> +					__func__, i, ret);
> +			}

nit: curly braces not needed

> +			goto err;
> +		}
> +	}
> +	opp_table->path_count = num_paths;
> +
> +	return 0;
> +
> +err:
> +	while (i--)
> +		icc_put(opp_table->paths[i]);
> +
> +	kfree(opp_table->paths);
> +	opp_table->paths = NULL;
> +
> +	return ret;
> +}
> +
>  static bool _opp_is_supported(struct device *dev, struct opp_table *opp_table,
>  			      struct device_node *np)
>  {
> @@ -524,8 +577,11 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
>  static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np,
>  			 bool *rate_not_available)
>  {
> +	struct property *peak, *avg;
> +	u32 *peak_bw, *avg_bw;
>  	u64 rate;
> -	int ret;
> +	int ret, i, count;
> +	bool found = false;
>  
>  	ret = of_property_read_u64(np, "opp-hz", &rate);
>  	if (!ret) {
> @@ -535,10 +591,69 @@ static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np,
>  		 * bit guaranteed in clk API.
>  		 */
>  		new_opp->rate = (unsigned long)rate;
> +		found = true;
>  	}
>  	*rate_not_available = !!ret;
>  
> -	of_property_read_u32(np, "opp-level", &new_opp->level);
> +	peak = of_find_property(np, "opp-peak-kBps", NULL);
> +	if (peak) {
> +		/*
> +		 * Bandwidth consists of peak and average (optional) values:
> +		 * opp-peak-kBps = <path1_value path2_value>;
> +		 * opp-avg-kBps = <path1_value path2_value>;
> +		 */
> +		count = peak->length / sizeof(u32);
> +		peak_bw = kmalloc_array(count, sizeof(*peak_bw), GFP_KERNEL);
> +		if (!peak_bw)
> +			return -ENOMEM;
> +
> +		ret = of_property_read_u32_array(np, "opp-peak-kBps", peak_bw,
> +						 count);
> +		if (ret) {
> +			pr_err("%s: Error parsing opp-peak-kBps: %d\n",
> +			       __func__, ret);
> +			goto free_peak_bw;
> +		}
> +
> +		for (i = 0; i < count; i++)
> +			new_opp->bandwidth[i].peak = kBps_to_icc(peak_bw[i]);
> +
> +		found = true;

		kfree(peak_bw);

or re-arrange the kfree()'s below to be in the common code path

> +	}
> +
> +	avg = of_find_property(np, "opp-avg-kBps", NULL);
> +	if (peak && avg) {
> +		count = avg->length / sizeof(u32);
> +		avg_bw = kmalloc_array(count, sizeof(*avg_bw), GFP_KERNEL);
> +		if (!avg_bw) {
> +			ret = -ENOMEM;
> +			goto free_peak_bw;
> +		}
> +
> +		ret = of_property_read_u32_array(np, "opp-avg-kBps", avg_bw,
> +						 count);
> +		if (ret) {
> +			pr_err("%s: Error parsing opp-avg-kBps: %d\n",
> +			       __func__, ret);
> +			goto free_avg_bw;
> +		}
> +
> +		for (i = 0; i < count; i++)
> +			new_opp->bandwidth[i].avg = kBps_to_icc(avg_bw[i]);

		kfree(avg_bw);

> +	}

nit: the two code blocks for peak and average bandwidth are mostly redundant.
If it weren't for the assignment of 'new_opp->bandwidth[i].avg' vs
'new_opp->bandwidth[i].peak' the above could easily be outsourced into a
helper function. With some pointer hacks you could still do this, but not
sure if it's worth the effort.

> +
> +	if (of_property_read_u32(np, "opp-level", &new_opp->level))
> +		found = true;
> +
> +	if (found)
> +		return 0;
> +
> +	return ret;
> +
> +free_avg_bw:
> +	kfree(avg_bw);
> +free_peak_bw:
> +	kfree(peak_bw);
>  
>  	return ret;
>  }
Matthias Kaehlcke April 24, 2020, 7:26 p.m. UTC | #4
On Fri, Apr 24, 2020 at 06:54:02PM +0300, Georgi Djakov wrote:
> When we read the OPP keys, it would be nice to do some sanity checks
> of the values we get from DT and see if they match with the information
> that is populated in the OPP table. Let's pass a pointer of the table,
> so that we can do some validation.
> 
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> New patch.
> 
>  drivers/opp/of.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index 978e445b0cdb..2b590fe2e69a 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -574,8 +574,8 @@ void dev_pm_opp_of_remove_table(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
>  
> -static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np,
> -			 bool *rate_not_available)
> +static int _read_opp_key(struct dev_pm_opp *new_opp, struct opp_table *table,
> +			 struct device_node *np, bool *rate_not_available)
>  {
>  	struct property *peak, *avg;
>  	u32 *peak_bw, *avg_bw;
> @@ -603,6 +603,12 @@ static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np,
>  		 * opp-avg-kBps = <path1_value path2_value>;
>  		 */
>  		count = peak->length / sizeof(u32);
> +		if (table->path_count != count) {
> +			pr_err("%s: Mismatch between opp-peak-kBps and paths (%d %d)\n",
> +			       __func__, count, table->path_count);
> +			return -EINVAL;
> +		}
> +
>  		peak_bw = kmalloc_array(count, sizeof(*peak_bw), GFP_KERNEL);
>  		if (!peak_bw)
>  			return -ENOMEM;
> @@ -624,6 +630,13 @@ static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np,
>  	avg = of_find_property(np, "opp-avg-kBps", NULL);
>  	if (peak && avg) {
>  		count = avg->length / sizeof(u32);
> +		if (table->path_count != count) {
> +			pr_err("%s: Mismatch between opp-avg-kBps and paths (%d %d)\n",
> +			       __func__, count, table->path_count);
> +			ret = -EINVAL;
> +			goto free_peak_bw;
> +		}
> +
>  		avg_bw = kmalloc_array(count, sizeof(*avg_bw), GFP_KERNEL);
>  		if (!avg_bw) {
>  			ret = -ENOMEM;
> @@ -695,7 +708,7 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
>  	if (!new_opp)
>  		return ERR_PTR(-ENOMEM);
>  
> -	ret = _read_opp_key(new_opp, np, &rate_not_available);
> +	ret = _read_opp_key(new_opp, opp_table, np, &rate_not_available);
>  	if (ret < 0) {
>  		if (!opp_table->is_genpd)
>  			dev_err(dev, "%s: opp key field not found\n", __func__);

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Matthias Kaehlcke April 24, 2020, 7:36 p.m. UTC | #5
On Fri, Apr 24, 2020 at 06:54:03PM +0300, Georgi Djakov wrote:
> If the OPP bandwidth values are populated, we want to switch also the
> interconnect bandwidth in addition to frequency and voltage.
> 
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Addressed review comments from Viresh.
> 
> v2: https://lore.kernel.org/r/20190423132823.7915-5-georgi.djakov@linaro.org
> 
>  drivers/opp/core.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 8e86811eb7b2..66a8ea10f3de 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -808,7 +808,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
>  	unsigned long freq, old_freq, temp_freq;
>  	struct dev_pm_opp *old_opp, *opp;
>  	struct clk *clk;
> -	int ret;
> +	int ret, i;
>  
>  	opp_table = _find_opp_table(dev);
>  	if (IS_ERR(opp_table)) {
> @@ -895,6 +895,17 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
>  			dev_err(dev, "Failed to set required opps: %d\n", ret);
>  	}
>  
> +	if (!ret && opp_table->paths) {
> +		for (i = 0; i < opp_table->path_count; i++) {
> +			ret = icc_set_bw(opp_table->paths[i],
> +					 opp->bandwidth[i].avg,
> +					 opp->bandwidth[i].peak);
> +			if (ret)
> +				dev_err(dev, "Failed to set bandwidth[%d]: %d\n",
> +					i, ret);
> +		}
> +	}
> +
>  put_opp:
>  	dev_pm_opp_put(opp);
>  put_old_opp:

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Matthias Kaehlcke April 24, 2020, 7:41 p.m. UTC | #6
On Fri, Apr 24, 2020 at 06:54:04PM +0300, Georgi Djakov wrote:
> In addition to clocks and regulators, some devices can scale the bandwidth
> of their on-chip interconnect - for example between CPU and DDR memory. Add
> support for that, so that platforms which support it can make use of it.
> 
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Drop using dev_pm_opp_set_paths(), as it has been removed.
> * Add Kconfig dependency on INTERCONNECT, as it can be module.
> 
> 
> v2: https://lore.kernel.org/r/20190423132823.7915-6-georgi.djakov@linaro.org
> 
>  drivers/cpufreq/Kconfig      |  1 +
>  drivers/cpufreq/cpufreq-dt.c | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
> index c3e6bd59e920..db2ad54ee67f 100644
> --- a/drivers/cpufreq/Kconfig
> +++ b/drivers/cpufreq/Kconfig
> @@ -217,6 +217,7 @@ config CPUFREQ_DT
>  
>  config CPUFREQ_DT_PLATDEV
>  	bool
> +	depends on INTERCONNECT || !INTERCONNECT
>  	help
>  	  This adds a generic DT based cpufreq platdev driver for frequency
>  	  management.  This creates a 'cpufreq-dt' platform device, on the
> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> index 26fe8dfb9ce6..4ecef3257532 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -13,6 +13,7 @@
>  #include <linux/cpufreq.h>
>  #include <linux/cpumask.h>
>  #include <linux/err.h>
> +#include <linux/interconnect.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
>  #include <linux/pm_opp.h>
> @@ -95,6 +96,7 @@ static int resources_available(void)
>  	struct device *cpu_dev;
>  	struct regulator *cpu_reg;
>  	struct clk *cpu_clk;
> +	struct icc_path *cpu_path;
>  	int ret = 0;
>  	const char *name;
>  
> @@ -121,6 +123,19 @@ static int resources_available(void)
>  
>  	clk_put(cpu_clk);
>  
> +	cpu_path = of_icc_get(cpu_dev, NULL);
> +	ret = PTR_ERR_OR_ZERO(cpu_path);
> +	if (ret) {
> +		if (ret == -EPROBE_DEFER)
> +			dev_dbg(cpu_dev, "defer icc path: %d\n", ret);
> +		else
> +			dev_err(cpu_dev, "failed to get icc path: %d\n", ret);
> +
> +		return ret;
> +	}
> +
> +	icc_put(cpu_path);
> +
>  	name = find_supply_name(cpu_dev);
>  	/* Platform doesn't require regulator */
>  	if (!name)

Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Saravana Kannan April 24, 2020, 9:18 p.m. UTC | #7
On Fri, Apr 24, 2020 at 8:54 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>
> If the OPP bandwidth values are populated, we want to switch also the
> interconnect bandwidth in addition to frequency and voltage.
>
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Addressed review comments from Viresh.
>
> v2: https://lore.kernel.org/r/20190423132823.7915-5-georgi.djakov@linaro.org
>
>  drivers/opp/core.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 8e86811eb7b2..66a8ea10f3de 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -808,7 +808,7 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
>         unsigned long freq, old_freq, temp_freq;
>         struct dev_pm_opp *old_opp, *opp;
>         struct clk *clk;
> -       int ret;
> +       int ret, i;
>
>         opp_table = _find_opp_table(dev);
>         if (IS_ERR(opp_table)) {
> @@ -895,6 +895,17 @@ int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq)
>                         dev_err(dev, "Failed to set required opps: %d\n", ret);
>         }
>
> +       if (!ret && opp_table->paths) {
> +               for (i = 0; i < opp_table->path_count; i++) {
> +                       ret = icc_set_bw(opp_table->paths[i],
> +                                        opp->bandwidth[i].avg,
> +                                        opp->bandwidth[i].peak);
> +                       if (ret)
> +                               dev_err(dev, "Failed to set bandwidth[%d]: %d\n",
> +                                       i, ret);
> +               }
> +       }
> +

Hey Georgi,

Thanks for getting this series going again and converging on the DT
bindings! Will be nice to see this land finally.

I skimmed through all the patches in the series and they mostly look
good (if you address some of Matthias's comments).

My only comment is -- can we drop this patch please? I'd like to use
devfreq governors for voting on bandwidth and this will effectively
override whatever bandwidth decisions are made by the devfreq
governor.

If you really want to keep this, then maybe don't "get" the icc path
by default in patch 4/7 and then let the device driver set the icc
path if it wants the opp framework to manage the bandwidth too?

-Saravana
Georgi Djakov April 28, 2020, 4:21 p.m. UTC | #8
Hi Matthias,

On 4/24/20 22:20, Matthias Kaehlcke wrote:
> Hi,
> 
> On Fri, Apr 24, 2020 at 06:54:01PM +0300, Georgi Djakov wrote:
>> The OPP bindings now support bandwidth values, so add support to parse it
>> from device tree and store it into the new dev_pm_opp_icc_bw struct, which
>> is part of the dev_pm_opp.
>>
>> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
>> ---
>> v7:
>> * Addressed some review comments from Viresh and Sibi.
>> * Various other changes.
>>
>> v2: https://lore.kernel.org/linux-arm-msm/20190423132823.7915-4-georgi.djakov@linaro.org/
>>
>>  drivers/opp/Kconfig    |   1 +
>>  drivers/opp/core.c     |  16 +++++-
>>  drivers/opp/of.c       | 119 ++++++++++++++++++++++++++++++++++++++++-
>>  drivers/opp/opp.h      |   9 ++++
>>  include/linux/pm_opp.h |  12 +++++
>>  5 files changed, 153 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/opp/Kconfig b/drivers/opp/Kconfig
>> index 35dfc7e80f92..230d2b84436c 100644
>> --- a/drivers/opp/Kconfig
>> +++ b/drivers/opp/Kconfig
>> @@ -2,6 +2,7 @@
>>  config PM_OPP
>>  	bool
>>  	select SRCU
>> +	depends on INTERCONNECT || !INTERCONNECT
> 
> huh?

Yeah, PM_OPP can be built-in only, but interconnect can be a module and in this
case i expect the linker to complain.

> 
>>  	---help---
>>  	  SOCs have a standard set of tuples consisting of frequency and
>>  	  voltage pairs that the device will support per voltage domain. This
>> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
>> index c9c1bbe6ae27..8e86811eb7b2 100644
>> --- a/drivers/opp/core.c
>> +++ b/drivers/opp/core.c
>> @@ -985,6 +985,12 @@ static struct opp_table *_allocate_opp_table(struct device *dev, int index)
>>  				ret);
>>  	}
>>  
>> +	/* Find interconnect path(s) for the device */
>> +	ret = _of_find_paths(opp_table, dev);
>> +	if (ret)
>> +		dev_dbg(dev, "%s: Error finding interconnect paths: %d\n",
>> +			__func__, ret);
> 
> why dev_dbg and not dev_warn?

Will make it dev_warn. Thanks!

>> +
>>  	BLOCKING_INIT_NOTIFIER_HEAD(&opp_table->head);
>>  	INIT_LIST_HEAD(&opp_table->opp_list);
>>  	kref_init(&opp_table->kref);
>> @@ -1229,19 +1235,22 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_remove_all_dynamic);
>>  struct dev_pm_opp *_opp_allocate(struct opp_table *table)
>>  {
>>  	struct dev_pm_opp *opp;
>> -	int count, supply_size;
>> +	int count, supply_size, icc_size;
>>  
>>  	/* Allocate space for at least one supply */
>>  	count = table->regulator_count > 0 ? table->regulator_count : 1;
>>  	supply_size = sizeof(*opp->supplies) * count;
>> +	icc_size = sizeof(*opp->bandwidth) * table->path_count;
>>  
>>  	/* allocate new OPP node and supplies structures */
>> -	opp = kzalloc(sizeof(*opp) + supply_size, GFP_KERNEL);
>> +	opp = kzalloc(sizeof(*opp) + supply_size + icc_size, GFP_KERNEL);
>> +
>>  	if (!opp)
>>  		return NULL;
>>  
>>  	/* Put the supplies at the end of the OPP structure as an empty array */
>>  	opp->supplies = (struct dev_pm_opp_supply *)(opp + 1);
>> +	opp->bandwidth = (struct dev_pm_opp_icc_bw *)(opp->supplies + 1);
> 
> IIUC this needs to be:
> 
> 	opp->bandwidth = (struct dev_pm_opp_icc_bw *)(opp->supplies + count);
> 
> maybe s/count/supply_count/

Right, thank you!

> 
>>  	INIT_LIST_HEAD(&opp->node);
>>  
>>  	return opp;
>> @@ -1276,6 +1285,9 @@ int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2)
>>  {
>>  	if (opp1->rate != opp2->rate)
>>  		return opp1->rate < opp2->rate ? -1 : 1;
>> +	if (opp1->bandwidth && opp2->bandwidth &&
>> +	    opp1->bandwidth[0].peak != opp2->bandwidth[0].peak)
>> +		return opp1->bandwidth[0].peak < opp2->bandwidth[0].peak ? -1 : 1;
>>  	if (opp1->level != opp2->level)
>>  		return opp1->level < opp2->level ? -1 : 1;
>>  	return 0;
>> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
>> index e33169c7e045..978e445b0cdb 100644
>> --- a/drivers/opp/of.c
>> +++ b/drivers/opp/of.c
>> @@ -332,6 +332,59 @@ static int _of_opp_alloc_required_opps(struct opp_table *opp_table,
>>  	return ret;
>>  }
>>  
>> +int _of_find_paths(struct opp_table *opp_table, struct device *dev)
> 
> nit: _of_find_icc_paths() to be more concise?

Ok!

> 
>> +{
>> +	struct device_node *np;
>> +	int ret, i, count, num_paths;
>> +
>> +	np = of_node_get(dev->of_node);
>> +	if (!np)
>> +		return 0;
>> +
>> +	count = of_count_phandle_with_args(np, "interconnects",
>> +					   "#interconnect-cells");
>> +	of_node_put(np);
>> +	if (count < 0)
>> +		return 0;
>> +
>> +	/* two phandles when #interconnect-cells = <1> */
>> +	if (count % 2) {
>> +		dev_err(dev, "%s: Invalid interconnects values\n",
>> +			__func__);
> 
> nit: no need for separate line

Ok!

> 
>> +		return -EINVAL;
>> +	}
>> +
>> +	num_paths = count / 2;
>> +	opp_table->paths = kcalloc(num_paths, sizeof(*opp_table->paths),
>> +				   GFP_KERNEL);
> 
> Add kfree(opp_table->paths) to _opp_table_kref_release() ?

Yes, sure.

>> +	if (!opp_table->paths)
>> +		return -ENOMEM;
>> +
>> +	for (i = 0; i < num_paths; i++) {
>> +		opp_table->paths[i] = of_icc_get_by_index(dev, i);
>> +		if (IS_ERR(opp_table->paths[i])) {
>> +			ret = PTR_ERR(opp_table->paths[i]);
>> +			if (ret != -EPROBE_DEFER) {
>> +				dev_err(dev, "%s: Unable to get path%d: %d\n",
>> +					__func__, i, ret);
>> +			}
> 
> nit: curly braces not needed

Ok!

[..]
>> +		for (i = 0; i < count; i++)
>> +			new_opp->bandwidth[i].peak = kBps_to_icc(peak_bw[i]);
>> +
>> +		found = true;
> 
> 		kfree(peak_bw);
> 
> or re-arrange the kfree()'s below to be in the common code path
> 
>> +	}
>> +
>> +	avg = of_find_property(np, "opp-avg-kBps", NULL);
>> +	if (peak && avg) {
>> +		count = avg->length / sizeof(u32);
>> +		avg_bw = kmalloc_array(count, sizeof(*avg_bw), GFP_KERNEL);
>> +		if (!avg_bw) {
>> +			ret = -ENOMEM;
>> +			goto free_peak_bw;
>> +		}
>> +
>> +		ret = of_property_read_u32_array(np, "opp-avg-kBps", avg_bw,
>> +						 count);
>> +		if (ret) {
>> +			pr_err("%s: Error parsing opp-avg-kBps: %d\n",
>> +			       __func__, ret);
>> +			goto free_avg_bw;
>> +		}
>> +
>> +		for (i = 0; i < count; i++)
>> +			new_opp->bandwidth[i].avg = kBps_to_icc(avg_bw[i]);
> 
> 		kfree(avg_bw);
> 
>> +	}
> 
> nit: the two code blocks for peak and average bandwidth are mostly redundant.
> If it weren't for the assignment of 'new_opp->bandwidth[i].avg' vs
> 'new_opp->bandwidth[i].peak' the above could easily be outsourced into a
> helper function. With some pointer hacks you could still do this, but not
> sure if it's worth the effort.

Yeah, i didn't really like this part. I'll see if i can improve it a bit.

Thanks a lot for reviewing!

BR,
Georgi
Viresh Kumar April 30, 2020, 5:21 a.m. UTC | #9
On 24-04-20, 10:30, Matthias Kaehlcke wrote:
> Hi Georgi,
> 
> On Fri, Apr 24, 2020 at 06:53:59PM +0300, Georgi Djakov wrote:
> > From: Saravana Kannan <saravanak@google.com>
> > 
> > The opp-hz DT property is not mandatory and we may use another property
> > as a key in the OPP table. Add helper functions to simplify the reading
> > and comparing the keys.
> > 
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> > ---
> > v7:
> > * Extracted just the helpers from patch v6, as Viresh advised to split it.
> > 
> > v6: https://lore.kernel.org/r/20191207002424.201796-3-saravanak@google.com
> > 
> >  drivers/opp/core.c | 15 +++++++++++++--
> >  drivers/opp/of.c   | 42 ++++++++++++++++++++++++++----------------
> >  drivers/opp/opp.h  |  1 +
> >  3 files changed, 40 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> > index ba43e6a3dc0a..c9c1bbe6ae27 100644
> > --- a/drivers/opp/core.c
> > +++ b/drivers/opp/core.c
> > @@ -1272,11 +1272,21 @@ static bool _opp_supported_by_regulators(struct dev_pm_opp *opp,
> >  	return true;
> >  }
> >  
> > +int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2)
> > +{
> > +	if (opp1->rate != opp2->rate)
> > +		return opp1->rate < opp2->rate ? -1 : 1;
> > +	if (opp1->level != opp2->level)
> > +		return opp1->level < opp2->level ? -1 : 1;
> > +	return 0;
> > +}
> > +
> >  static int _opp_is_duplicate(struct device *dev, struct dev_pm_opp *new_opp,
> >  			     struct opp_table *opp_table,
> >  			     struct list_head **head)
> >  {
> >  	struct dev_pm_opp *opp;
> > +	int opp_cmp;
> >  
> >  	/*
> >  	 * Insert new OPP in order of increasing frequency and discard if
> > @@ -1287,12 +1297,13 @@ static int _opp_is_duplicate(struct device *dev, struct dev_pm_opp *new_opp,
> >  	 * loop.
> >  	 */
> >  	list_for_each_entry(opp, &opp_table->opp_list, node) {
> > -		if (new_opp->rate > opp->rate) {
> > +		opp_cmp = _opp_compare_key(new_opp, opp);
> > +		if (opp_cmp > 0) {
> >  			*head = &opp->node;
> >  			continue;
> >  		}
> >  
> > -		if (new_opp->rate < opp->rate)
> > +		if (opp_cmp < 0)
> >  			return 0;
> >  
> >  		/* Duplicate OPPs */
> > diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> > index 9cd8f0adacae..e33169c7e045 100644
> > --- a/drivers/opp/of.c
> > +++ b/drivers/opp/of.c
> > @@ -521,6 +521,28 @@ void dev_pm_opp_of_remove_table(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
> >  
> > +static int _read_opp_key(struct dev_pm_opp *new_opp, struct device_node *np,
> > +			 bool *rate_not_available)
> > +{
> > +	u64 rate;
> > +	int ret;
> > +
> > +	ret = of_property_read_u64(np, "opp-hz", &rate);
> > +	if (!ret) {
> > +		/*
> > +		 * Rate is defined as an unsigned long in clk API, and so
> > +		 * casting explicitly to its type. Must be fixed once rate is 64
> > +		 * bit guaranteed in clk API.
> > +		 */
> > +		new_opp->rate = (unsigned long)rate;
> > +	}
> 
> nit: curly braces are not needed

In fact they are as the comment is present within the if block (which
is the right thing to do). Yes the code will compile fine without
braces, but coding guideline suggests it around multi-line-statements.

> > +	*rate_not_available = !!ret;
> > +
> > +	of_property_read_u32(np, "opp-level", &new_opp->level);
> > +
> > +	return ret;
> > +}
> > +
> >  /**
> >   * _opp_add_static_v2() - Allocate static OPPs (As per 'v2' DT bindings)
> >   * @opp_table:	OPP table
> > @@ -558,26 +580,14 @@ static struct dev_pm_opp *_opp_add_static_v2(struct opp_table *opp_table,
> >  	if (!new_opp)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > -	ret = of_property_read_u64(np, "opp-hz", &rate);
> > +	ret = _read_opp_key(new_opp, np, &rate_not_available);
> >  	if (ret < 0) {
> > -		/* "opp-hz" is optional for devices like power domains. */
> > -		if (!opp_table->is_genpd) {
> > -			dev_err(dev, "%s: opp-hz not found\n", __func__);
> > -			goto free_opp;
> > -		}
> > +		if (!opp_table->is_genpd)
> > +			dev_err(dev, "%s: opp key field not found\n", __func__);

Looks like the logic got changed here ? We used to goto free_opp only
if !is_genpd earlier..

> >  
> > -		rate_not_available = true;
> > -	} else {
> > -		/*
> > -		 * Rate is defined as an unsigned long in clk API, and so
> > -		 * casting explicitly to its type. Must be fixed once rate is 64
> > -		 * bit guaranteed in clk API.
> > -		 */
> > -		new_opp->rate = (unsigned long)rate;
> > +		goto free_opp;
> >  	}
> >  
> > -	of_property_read_u32(np, "opp-level", &new_opp->level);
> > -
> >  	/* Check if the OPP supports hardware's hierarchy of versions or not */
> >  	if (!_opp_is_supported(dev, opp_table, np)) {
> >  		dev_dbg(dev, "OPP not supported by hardware: %llu\n", rate);
> > diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
> > index d14e27102730..bcadb1e328a4 100644
> > --- a/drivers/opp/opp.h
> > +++ b/drivers/opp/opp.h
> > @@ -211,6 +211,7 @@ struct opp_device *_add_opp_dev(const struct device *dev, struct opp_table *opp_
> >  void _dev_pm_opp_find_and_remove_table(struct device *dev);
> >  struct dev_pm_opp *_opp_allocate(struct opp_table *opp_table);
> >  void _opp_free(struct dev_pm_opp *opp);
> > +int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2);
> >  int _opp_add(struct device *dev, struct dev_pm_opp *new_opp, struct opp_table *opp_table, bool rate_not_available);
> >  int _opp_add_v1(struct opp_table *opp_table, struct device *dev, unsigned long freq, long u_volt, bool dynamic);
> >  void _dev_pm_opp_cpumask_remove_table(const struct cpumask *cpumask, int last_cpu);
> 
> Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Viresh Kumar April 30, 2020, 5:28 a.m. UTC | #10
On 24-04-20, 12:20, Matthias Kaehlcke wrote:
> On Fri, Apr 24, 2020 at 06:54:01PM +0300, Georgi Djakov wrote:
> > +	for (i = 0; i < num_paths; i++) {
> > +		opp_table->paths[i] = of_icc_get_by_index(dev, i);
> > +		if (IS_ERR(opp_table->paths[i])) {
> > +			ret = PTR_ERR(opp_table->paths[i]);
> > +			if (ret != -EPROBE_DEFER) {
> > +				dev_err(dev, "%s: Unable to get path%d: %d\n",
> > +					__func__, i, ret);
> > +			}
> 
> nit: curly braces not needed

Again, braces are preferred across multi-line blocks. Please keep it.
Viresh Kumar April 30, 2020, 6:09 a.m. UTC | #11
On 24-04-20, 14:18, Saravana Kannan wrote:
> My only comment is -- can we drop this patch please? I'd like to use
> devfreq governors for voting on bandwidth and this will effectively
> override whatever bandwidth decisions are made by the devfreq
> governor.

And why would that be better ? FWIW, that will have the same problem
which cpufreq governors had since ages, i.e. they were not proactive
and were always too late.

The bw should get updated right with frequency, why shouldn't it ?
Saravana Kannan April 30, 2020, 7:35 a.m. UTC | #12
On Wed, Apr 29, 2020 at 11:09 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 24-04-20, 14:18, Saravana Kannan wrote:
> > My only comment is -- can we drop this patch please? I'd like to use
> > devfreq governors for voting on bandwidth and this will effectively
> > override whatever bandwidth decisions are made by the devfreq
> > governor.
>
> And why would that be better ? FWIW, that will have the same problem
> which cpufreq governors had since ages, i.e. they were not proactive
> and were always too late.
>
> The bw should get updated right with frequency, why shouldn't it ?

I didn't say the bw would be voted based on just CPUfreq. It can also
be based on CPU busy time and other stats. Having said that, this is
not just about CPUfreq. Having the bw be force changed every time a
device has it's OPP is changed is very inflexible. Please don't do it.

-Saravana
Viresh Kumar April 30, 2020, 7:53 a.m. UTC | #13
On 30-04-20, 00:35, Saravana Kannan wrote:
> On Wed, Apr 29, 2020 at 11:09 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 24-04-20, 14:18, Saravana Kannan wrote:
> > > My only comment is -- can we drop this patch please? I'd like to use
> > > devfreq governors for voting on bandwidth and this will effectively
> > > override whatever bandwidth decisions are made by the devfreq
> > > governor.
> >
> > And why would that be better ? FWIW, that will have the same problem
> > which cpufreq governors had since ages, i.e. they were not proactive
> > and were always too late.
> >
> > The bw should get updated right with frequency, why shouldn't it ?
> 
> I didn't say the bw would be voted based on just CPUfreq. It can also
> be based on CPU busy time and other stats. Having said that, this is
> not just about CPUfreq. Having the bw be force changed every time a
> device has it's OPP is changed is very inflexible. Please don't do it.

So, the vote based on the requirements of cpufreq driver should come
directly from the cpufreq side itself, but no one stops the others
layers to aggregate the requests and then act on them. This is how it
is done for other frameworks like clk, regulator, genpd, etc.

You guys need to figure out who aggregates the requests from all users
or input providers for a certain path. This was pushed into the genpd
core in case of performance state for example.
Saravana Kannan April 30, 2020, 4:32 p.m. UTC | #14
On Thu, Apr 30, 2020 at 12:54 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 30-04-20, 00:35, Saravana Kannan wrote:
> > On Wed, Apr 29, 2020 at 11:09 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 24-04-20, 14:18, Saravana Kannan wrote:
> > > > My only comment is -- can we drop this patch please? I'd like to use
> > > > devfreq governors for voting on bandwidth and this will effectively
> > > > override whatever bandwidth decisions are made by the devfreq
> > > > governor.
> > >
> > > And why would that be better ? FWIW, that will have the same problem
> > > which cpufreq governors had since ages, i.e. they were not proactive
> > > and were always too late.
> > >
> > > The bw should get updated right with frequency, why shouldn't it ?
> >
> > I didn't say the bw would be voted based on just CPUfreq. It can also
> > be based on CPU busy time and other stats. Having said that, this is
> > not just about CPUfreq. Having the bw be force changed every time a
> > device has it's OPP is changed is very inflexible. Please don't do it.
>
> So, the vote based on the requirements of cpufreq driver should come
> directly from the cpufreq side itself, but no one stops the others
> layers to aggregate the requests and then act on them. This is how it
> is done for other frameworks like clk, regulator, genpd, etc.

You are missing the point. This is not about aggregation. This is
about OPP voting for bandwidth on a path when the vote can/should be
0.

I'll give another example. Say one of the interconnect paths needs to
be voted only when a particular use case is running. Say, the GPU
needs to vote for bandwidth to L3 only when it's running in cache
coherent mode. But it always needs to vote for bandwidth to DDR. With
the way it's written now, OPP is going to force vote a non-zero
bandwidth to L3 even when it can be zero. Wasting power for no good
reason.

Just let the drivers/device get the bandwidth values from OPP without
forcing them to vote for the bandwidth when they don't need to. Just
because they decide to use OPP to set their clock doesn't mean they
should lose to ability to control their bandwidth in a more
intelligent fashion.

-Saravana
Viresh Kumar May 4, 2020, 5 a.m. UTC | #15
On 30-04-20, 09:32, Saravana Kannan wrote:
> You are missing the point. This is not about aggregation. This is
> about OPP voting for bandwidth on a path when the vote can/should be
> 0.
> 
> I'll give another example. Say one of the interconnect paths needs to
> be voted only when a particular use case is running. Say, the GPU
> needs to vote for bandwidth to L3 only when it's running in cache
> coherent mode. But it always needs to vote for bandwidth to DDR. With
> the way it's written now, OPP is going to force vote a non-zero
> bandwidth to L3 even when it can be zero. Wasting power for no good
> reason.
> 
> Just let the drivers/device get the bandwidth values from OPP without
> forcing them to vote for the bandwidth when they don't need to. Just
> because they decide to use OPP to set their clock doesn't mean they
> should lose to ability to control their bandwidth in a more
> intelligent fashion.

They shouldn't use opp_set_rate() in such a scenario. Why should they?

opp_set_rate() was introduced to take care of only the simple cases
and the complex ones are left for the drivers to handle. For example,
they take care of programming multiple regulators (in case of TI), as
OPP core can't know the order in which regulators need to be
programmed. But for the simple cases, opp core can program everything
the way it is presented in DT.
Sibi Sankar May 4, 2020, 8:40 p.m. UTC | #16
On 2020-04-24 21:23, Georgi Djakov wrote:
> From: Saravana Kannan <saravanak@google.com>
> 
> The opp-hz DT property is not mandatory and we may use another property
> as a key in the OPP table. Add helper functions to simplify the reading
> and comparing the keys.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Extracted just the helpers from patch v6, as Viresh advised to split 
> it.
> 
> v6: 
> https://lore.kernel.org/r/20191207002424.201796-3-saravanak@google.com
> 
>  drivers/opp/core.c | 15 +++++++++++++--
>  drivers/opp/of.c   | 42 ++++++++++++++++++++++++++----------------
>  drivers/opp/opp.h  |  1 +
>  3 files changed, 40 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index ba43e6a3dc0a..c9c1bbe6ae27 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1272,11 +1272,21 @@ static bool
> _opp_supported_by_regulators(struct dev_pm_opp *opp,
>  	return true;
>  }
> 
> +int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2)
> +{
> +	if (opp1->rate != opp2->rate)
> +		return opp1->rate < opp2->rate ? -1 : 1;
> +	if (opp1->level != opp2->level)
> +		return opp1->level < opp2->level ? -1 : 1;
> +	return 0;
> +}
> +
>  static int _opp_is_duplicate(struct device *dev, struct dev_pm_opp 
> *new_opp,
>  			     struct opp_table *opp_table,
>  			     struct list_head **head)
>  {
>  	struct dev_pm_opp *opp;
> +	int opp_cmp;
> 
>  	/*
>  	 * Insert new OPP in order of increasing frequency and discard if
> @@ -1287,12 +1297,13 @@ static int _opp_is_duplicate(struct device
> *dev, struct dev_pm_opp *new_opp,
>  	 * loop.
>  	 */
>  	list_for_each_entry(opp, &opp_table->opp_list, node) {
> -		if (new_opp->rate > opp->rate) {
> +		opp_cmp = _opp_compare_key(new_opp, opp);
> +		if (opp_cmp > 0) {
>  			*head = &opp->node;
>  			continue;
>  		}
> 
> -		if (new_opp->rate < opp->rate)
> +		if (opp_cmp < 0)
>  			return 0;
> 
>  		/* Duplicate OPPs */
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index 9cd8f0adacae..e33169c7e045 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -521,6 +521,28 @@ void dev_pm_opp_of_remove_table(struct device 
> *dev)
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
> 
> +static int _read_opp_key(struct dev_pm_opp *new_opp, struct 
> device_node *np,
> +			 bool *rate_not_available)
> +{
> +	u64 rate;
> +	int ret;
> +
> +	ret = of_property_read_u64(np, "opp-hz", &rate);
> +	if (!ret) {
> +		/*
> +		 * Rate is defined as an unsigned long in clk API, and so
> +		 * casting explicitly to its type. Must be fixed once rate is 64
> +		 * bit guaranteed in clk API.
> +		 */
> +		new_opp->rate = (unsigned long)rate;
> +	}
> +	*rate_not_available = !!ret;
> +
> +	of_property_read_u32(np, "opp-level", &new_opp->level);
> +
> +	return ret;
> +}
> +
>  /**
>   * _opp_add_static_v2() - Allocate static OPPs (As per 'v2' DT 
> bindings)
>   * @opp_table:	OPP table
> @@ -558,26 +580,14 @@ static struct dev_pm_opp
> *_opp_add_static_v2(struct opp_table *opp_table,
>  	if (!new_opp)
>  		return ERR_PTR(-ENOMEM);
> 
> -	ret = of_property_read_u64(np, "opp-hz", &rate);
> +	ret = _read_opp_key(new_opp, np, &rate_not_available);
>  	if (ret < 0) {
> -		/* "opp-hz" is optional for devices like power domains. */
> -		if (!opp_table->is_genpd) {
> -			dev_err(dev, "%s: opp-hz not found\n", __func__);
> -			goto free_opp;
> -		}
> +		if (!opp_table->is_genpd)
> +			dev_err(dev, "%s: opp key field not found\n", __func__);

With ^^ regression fixed

Reviewed-by: Sibi Sankar <sibis@codeaurora.org>

> 
> -		rate_not_available = true;
> -	} else {
> -		/*
> -		 * Rate is defined as an unsigned long in clk API, and so
> -		 * casting explicitly to its type. Must be fixed once rate is 64
> -		 * bit guaranteed in clk API.
> -		 */
> -		new_opp->rate = (unsigned long)rate;
> +		goto free_opp;
>  	}
> 
> -	of_property_read_u32(np, "opp-level", &new_opp->level);
> -
>  	/* Check if the OPP supports hardware's hierarchy of versions or not 
> */
>  	if (!_opp_is_supported(dev, opp_table, np)) {
>  		dev_dbg(dev, "OPP not supported by hardware: %llu\n", rate);
> diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
> index d14e27102730..bcadb1e328a4 100644
> --- a/drivers/opp/opp.h
> +++ b/drivers/opp/opp.h
> @@ -211,6 +211,7 @@ struct opp_device *_add_opp_dev(const struct
> device *dev, struct opp_table *opp_
>  void _dev_pm_opp_find_and_remove_table(struct device *dev);
>  struct dev_pm_opp *_opp_allocate(struct opp_table *opp_table);
>  void _opp_free(struct dev_pm_opp *opp);
> +int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp 
> *opp2);
>  int _opp_add(struct device *dev, struct dev_pm_opp *new_opp, struct
> opp_table *opp_table, bool rate_not_available);
>  int _opp_add_v1(struct opp_table *opp_table, struct device *dev,
> unsigned long freq, long u_volt, bool dynamic);
>  void _dev_pm_opp_cpumask_remove_table(const struct cpumask *cpumask,
> int last_cpu);
Sibi Sankar May 4, 2020, 8:47 p.m. UTC | #17
On 2020-04-24 21:24, Georgi Djakov wrote:
> When we read the OPP keys, it would be nice to do some sanity checks
> of the values we get from DT and see if they match with the information
> that is populated in the OPP table. Let's pass a pointer of the table,
> so that we can do some validation.
> 

Reviewed-by: Sibi Sankar <sibis@codeaurora.org>

> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> New patch.
> 
...
Sibi Sankar May 4, 2020, 8:50 p.m. UTC | #18
Hey Georgi,

On 2020-04-24 21:24, Georgi Djakov wrote:
> In addition to clocks and regulators, some devices can scale the 
> bandwidth
> of their on-chip interconnect - for example between CPU and DDR memory. 
> Add
> support for that, so that platforms which support it can make use of 
> it.
> 
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Drop using dev_pm_opp_set_paths(), as it has been removed.
> * Add Kconfig dependency on INTERCONNECT, as it can be module.
> 
> 
> v2: 
> https://lore.kernel.org/r/20190423132823.7915-6-georgi.djakov@linaro.org
> 
>  drivers/cpufreq/Kconfig      |  1 +
>  drivers/cpufreq/cpufreq-dt.c | 15 +++++++++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
> index c3e6bd59e920..db2ad54ee67f 100644
> --- a/drivers/cpufreq/Kconfig
> +++ b/drivers/cpufreq/Kconfig
> @@ -217,6 +217,7 @@ config CPUFREQ_DT
> 
>  config CPUFREQ_DT_PLATDEV
>  	bool
> +	depends on INTERCONNECT || !INTERCONNECT
>  	help
>  	  This adds a generic DT based cpufreq platdev driver for frequency
>  	  management.  This creates a 'cpufreq-dt' platform device, on the
> diff --git a/drivers/cpufreq/cpufreq-dt.c 
> b/drivers/cpufreq/cpufreq-dt.c
> index 26fe8dfb9ce6..4ecef3257532 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -13,6 +13,7 @@
>  #include <linux/cpufreq.h>
>  #include <linux/cpumask.h>
>  #include <linux/err.h>
> +#include <linux/interconnect.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
>  #include <linux/pm_opp.h>
> @@ -95,6 +96,7 @@ static int resources_available(void)
>  	struct device *cpu_dev;
>  	struct regulator *cpu_reg;
>  	struct clk *cpu_clk;
> +	struct icc_path *cpu_path;
>  	int ret = 0;
>  	const char *name;
> 
> @@ -121,6 +123,19 @@ static int resources_available(void)
> 
>  	clk_put(cpu_clk);
> 
> +	cpu_path = of_icc_get(cpu_dev, NULL);
> +	ret = PTR_ERR_OR_ZERO(cpu_path);

Wouldn't we want to verify all
available paths instead of just
the first path?

> +	if (ret) {
> +		if (ret == -EPROBE_DEFER)
> +			dev_dbg(cpu_dev, "defer icc path: %d\n", ret);
> +		else
> +			dev_err(cpu_dev, "failed to get icc path: %d\n", ret);
> +
> +		return ret;
> +	}
> +
> +	icc_put(cpu_path);
> +
>  	name = find_supply_name(cpu_dev);
>  	/* Platform doesn't require regulator */
>  	if (!name)
Sibi Sankar May 4, 2020, 8:54 p.m. UTC | #19
On 2020-04-24 21:24, Georgi Djakov wrote:
> If the OPP bandwidth values are populated, we want to switch also the
> interconnect bandwidth in addition to frequency and voltage.
> 

https://patchwork.kernel.org/patch/11527571/

Scaling from set_rate or using ^^
to set bw levels, I'm fine with
both.

Reviewed-by: Sibi Sankar <sibis@codeaurora.org>

> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Addressed review comments from Viresh.
> 
> v2: 
> https://lore.kernel.org/r/20190423132823.7915-5-georgi.djakov@linaro.org
> 
>  drivers/opp/core.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 8e86811eb7b2..66a8ea10f3de 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -808,7 +808,7 @@ int dev_pm_opp_set_rate(struct device *dev,
...
Sibi Sankar May 4, 2020, 8:58 p.m. UTC | #20
On 2020-04-24 21:24, Georgi Djakov wrote:
> This is the same as the traditional of_icc_get() function, but the
> difference is that it takes index as an argument, instead of name.
> 

Reviewed-by: Sibi Sankar <sibis@codeaurora.org>

> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Addressed review comments from Sibi.
> * Re-based patch.
> 
> v2: 
> https://lore.kernel.org/r/20190423132823.7915-3-georgi.djakov@linaro.org
> 
>  drivers/interconnect/core.c  | 68 +++++++++++++++++++++++++++---------
>  include/linux/interconnect.h |  6 ++++
>  2 files changed, 58 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> index 2c6515e3ecf1..648237f4de49 100644
> --- a/drivers/interconnect/core.c
> +++ b/drivers/interconnect/core.c
> @@ -351,9 +351,9 @@ static struct icc_node
...
Saravana Kannan May 4, 2020, 9:01 p.m. UTC | #21
On Sun, May 3, 2020 at 10:00 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 30-04-20, 09:32, Saravana Kannan wrote:
> > You are missing the point. This is not about aggregation. This is
> > about OPP voting for bandwidth on a path when the vote can/should be
> > 0.
> >
> > I'll give another example. Say one of the interconnect paths needs to
> > be voted only when a particular use case is running. Say, the GPU
> > needs to vote for bandwidth to L3 only when it's running in cache
> > coherent mode. But it always needs to vote for bandwidth to DDR. With
> > the way it's written now, OPP is going to force vote a non-zero
> > bandwidth to L3 even when it can be zero. Wasting power for no good
> > reason.
> >
> > Just let the drivers/device get the bandwidth values from OPP without
> > forcing them to vote for the bandwidth when they don't need to. Just
> > because they decide to use OPP to set their clock doesn't mean they
> > should lose to ability to control their bandwidth in a more
> > intelligent fashion.
>
> They shouldn't use opp_set_rate() in such a scenario. Why should they?
>
> opp_set_rate() was introduced to take care of only the simple cases
> and the complex ones are left for the drivers to handle. For example,
> they take care of programming multiple regulators (in case of TI), as
> OPP core can't know the order in which regulators need to be
> programmed. But for the simple cases, opp core can program everything
> the way it is presented in DT.

Fair enough. But don't "voltage corner" based devices NEED to use OPP
framework to set their frequencies?

Because, if voltage corners are only handled through OPP framework,
then any device that uses voltage corners doesn't get to pick and
choose when to vote for what path. Also, maybe a one liner helper
function to enable BW voting using OPP framework by default might be
another option. Something like:
dev_pm_opp_enable_bw_voting(struct device *dev)?

If devices with voltage corners can still do their own
frequency/voltage corner control without having to use OPP framework,
then I agree with your point above.

-Saravana
Sibi Sankar May 4, 2020, 9:03 p.m. UTC | #22
Hey Georgi,

Apart from the Matthias's comments
ran into a few issues during testing.

On 2020-04-24 21:24, Georgi Djakov wrote:
> The OPP bindings now support bandwidth values, so add support to parse 
> it
> from device tree and store it into the new dev_pm_opp_icc_bw struct, 
> which
> is part of the dev_pm_opp.
> 
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
> v7:
> * Addressed some review comments from Viresh and Sibi.
> * Various other changes.
> 
> v2:
> https://lore.kernel.org/linux-arm-msm/20190423132823.7915-4-georgi.djakov@linaro.org/
> 
>  drivers/opp/Kconfig    |   1 +
>  drivers/opp/core.c     |  16 +++++-
>  drivers/opp/of.c       | 119 ++++++++++++++++++++++++++++++++++++++++-
>  drivers/opp/opp.h      |   9 ++++
>  include/linux/pm_opp.h |  12 +++++
>  5 files changed, 153 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/opp/Kconfig b/drivers/opp/Kconfig
> index 35dfc7e80f92..230d2b84436c 100644
> --- a/drivers/opp/Kconfig
> +++ b/drivers/opp/Kconfig
> @@ -2,6 +2,7 @@
>  config PM_OPP
>  	bool
>  	select SRCU
> +	depends on INTERCONNECT || !INTERCONNECT
>  	---help---
>  	  SOCs have a standard set of tuples consisting of frequency and
>  	  voltage pairs that the device will support per voltage domain. This
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index c9c1bbe6ae27..8e86811eb7b2 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -985,6 +985,12 @@ static struct opp_table
> *_allocate_opp_table(struct device *dev, int index)
>  				ret);
>  	}
> 
> +	/* Find interconnect path(s) for the device */
> +	ret = _of_find_paths(opp_table, dev);
> +	if (ret)
> +		dev_dbg(dev, "%s: Error finding interconnect paths: %d\n",
> +			__func__, ret);
> +
>  	BLOCKING_INIT_NOTIFIER_HEAD(&opp_table->head);
>  	INIT_LIST_HEAD(&opp_table->opp_list);
>  	kref_init(&opp_table->kref);
> @@ -1229,19 +1235,22 @@ 
> EXPORT_SYMBOL_GPL(dev_pm_opp_remove_all_dynamic);
>  struct dev_pm_opp *_opp_allocate(struct opp_table *table)
>  {
>  	struct dev_pm_opp *opp;
> -	int count, supply_size;
> +	int count, supply_size, icc_size;
> 
>  	/* Allocate space for at least one supply */
>  	count = table->regulator_count > 0 ? table->regulator_count : 1;
>  	supply_size = sizeof(*opp->supplies) * count;
> +	icc_size = sizeof(*opp->bandwidth) * table->path_count;
> 
>  	/* allocate new OPP node and supplies structures */
> -	opp = kzalloc(sizeof(*opp) + supply_size, GFP_KERNEL);
> +	opp = kzalloc(sizeof(*opp) + supply_size + icc_size, GFP_KERNEL);
> +
>  	if (!opp)
>  		return NULL;
> 
>  	/* Put the supplies at the end of the OPP structure as an empty array 
> */
>  	opp->supplies = (struct dev_pm_opp_supply *)(opp + 1);
> +	opp->bandwidth = (struct dev_pm_opp_icc_bw *)(opp->supplies + 1);
>  	INIT_LIST_HEAD(&opp->node);
> 
>  	return opp;
> @@ -1276,6 +1285,9 @@ int _opp_compare_key(struct dev_pm_opp *opp1,
> struct dev_pm_opp *opp2)
>  {
>  	if (opp1->rate != opp2->rate)
>  		return opp1->rate < opp2->rate ? -1 : 1;
> +	if (opp1->bandwidth && opp2->bandwidth &&
> +	    opp1->bandwidth[0].peak != opp2->bandwidth[0].peak)
> +		return opp1->bandwidth[0].peak < opp2->bandwidth[0].peak ? -1 : 1;
>  	if (opp1->level != opp2->level)
>  		return opp1->level < opp2->level ? -1 : 1;
>  	return 0;
> diff --git a/drivers/opp/of.c b/drivers/opp/of.c
> index e33169c7e045..978e445b0cdb 100644
> --- a/drivers/opp/of.c
> +++ b/drivers/opp/of.c
> @@ -332,6 +332,59 @@ static int _of_opp_alloc_required_opps(struct
> opp_table *opp_table,
>  	return ret;
>  }
> 
> +int _of_find_paths(struct opp_table *opp_table, struct device *dev)
> +{
> +	struct device_node *np;
> +	int ret, i, count, num_paths;
> +
> +	np = of_node_get(dev->of_node);
> +	if (!np)
> +		return 0;
> +
> +	count = of_count_phandle_with_args(np, "interconnects",
> +					   "#interconnect-cells");
> +	of_node_put(np);
> +	if (count < 0)
> +		return 0;
> +
> +	/* two phandles when #interconnect-cells = <1> */
> +	if (count % 2) {
> +		dev_err(dev, "%s: Invalid interconnects values\n",
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	num_paths = count / 2;
> +	opp_table->paths = kcalloc(num_paths, sizeof(*opp_table->paths),
> +				   GFP_KERNEL);
> +	if (!opp_table->paths)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < num_paths; i++) {
> +		opp_table->paths[i] = of_icc_get_by_index(dev, i);

now that icc_get is part of the
opp core framework shouldn't we
do a put on table remove as well?

> +		if (IS_ERR(opp_table->paths[i])) {
> +			ret = PTR_ERR(opp_table->paths[i]);
> +			if (ret != -EPROBE_DEFER) {
> +				dev_err(dev, "%s: Unable to get path%d: %d\n",
> +					__func__, i, ret);
> +			}
> +			goto err;
> +		}
> +	}
> +	opp_table->path_count = num_paths;
> +
> +	return 0;
> +
> +err:
> +	while (i--)
> +		icc_put(opp_table->paths[i]);
> +
> +	kfree(opp_table->paths);
> +	opp_table->paths = NULL;
> +
> +	return ret;
> +}
> +
>  static bool _opp_is_supported(struct device *dev, struct opp_table 
> *opp_table,
>  			      struct device_node *np)
>  {
> @@ -524,8 +577,11 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
>  static int _read_opp_key(struct dev_pm_opp *new_opp, struct 
> device_node *np,
>  			 bool *rate_not_available)
>  {
> +	struct property *peak, *avg;
> +	u32 *peak_bw, *avg_bw;
>  	u64 rate;
> -	int ret;
> +	int ret, i, count;
> +	bool found = false;
> 
>  	ret = of_property_read_u64(np, "opp-hz", &rate);
>  	if (!ret) {
> @@ -535,10 +591,69 @@ static int _read_opp_key(struct dev_pm_opp
> *new_opp, struct device_node *np,
>  		 * bit guaranteed in clk API.
>  		 */
>  		new_opp->rate = (unsigned long)rate;
> +		found = true;
>  	}
>  	*rate_not_available = !!ret;
> 
> -	of_property_read_u32(np, "opp-level", &new_opp->level);
> +	peak = of_find_property(np, "opp-peak-kBps", NULL);
> +	if (peak) {
> +		/*
> +		 * Bandwidth consists of peak and average (optional) values:
> +		 * opp-peak-kBps = <path1_value path2_value>;
> +		 * opp-avg-kBps = <path1_value path2_value>;
> +		 */
> +		count = peak->length / sizeof(u32);
> +		peak_bw = kmalloc_array(count, sizeof(*peak_bw), GFP_KERNEL);
> +		if (!peak_bw)
> +			return -ENOMEM;
> +
> +		ret = of_property_read_u32_array(np, "opp-peak-kBps", peak_bw,
> +						 count);
> +		if (ret) {
> +			pr_err("%s: Error parsing opp-peak-kBps: %d\n",
> +			       __func__, ret);
> +			goto free_peak_bw;
> +		}
> +
> +		for (i = 0; i < count; i++)
> +			new_opp->bandwidth[i].peak = kBps_to_icc(peak_bw[i]);
> +
> +		found = true;
> +	}
> +
> +	avg = of_find_property(np, "opp-avg-kBps", NULL);
> +	if (peak && avg) {
> +		count = avg->length / sizeof(u32);
> +		avg_bw = kmalloc_array(count, sizeof(*avg_bw), GFP_KERNEL);
> +		if (!avg_bw) {
> +			ret = -ENOMEM;
> +			goto free_peak_bw;
> +		}
> +
> +		ret = of_property_read_u32_array(np, "opp-avg-kBps", avg_bw,
> +						 count);
> +		if (ret) {
> +			pr_err("%s: Error parsing opp-avg-kBps: %d\n",
> +			       __func__, ret);
> +			goto free_avg_bw;
> +		}
> +
> +		for (i = 0; i < count; i++)
> +			new_opp->bandwidth[i].avg = kBps_to_icc(avg_bw[i]);
> +	}
> +
> +	if (of_property_read_u32(np, "opp-level", &new_opp->level))

of_property_read_u32 returns 0 on
success so should use logical not
instead ^^

> +		found = true;
> +
> +	if (found)
> +		return 0;
> +
> +	return ret;
> +
> +free_avg_bw:
> +	kfree(avg_bw);
> +free_peak_bw:
> +	kfree(peak_bw);
> 
>  	return ret;
>  }
> diff --git a/drivers/opp/opp.h b/drivers/opp/opp.h
> index bcadb1e328a4..2e0113360297 100644
> --- a/drivers/opp/opp.h
> +++ b/drivers/opp/opp.h
> @@ -12,6 +12,7 @@
>  #define __DRIVER_OPP_H__
> 
>  #include <linux/device.h>
> +#include <linux/interconnect.h>
>  #include <linux/kernel.h>
>  #include <linux/kref.h>
>  #include <linux/list.h>
> @@ -59,6 +60,7 @@ extern struct list_head opp_tables;
>   * @rate:	Frequency in hertz
>   * @level:	Performance level
>   * @supplies:	Power supplies voltage/current values
> + * @bandwidth:	Interconnect bandwidth values
>   * @clock_latency_ns: Latency (in nanoseconds) of switching to this 
> OPP's
>   *		frequency from any other OPP's frequency.
>   * @required_opps: List of OPPs that are required by this OPP.
> @@ -81,6 +83,7 @@ struct dev_pm_opp {
>  	unsigned int level;
> 
>  	struct dev_pm_opp_supply *supplies;
> +	struct dev_pm_opp_icc_bw *bandwidth;
> 
>  	unsigned long clock_latency_ns;
> 
> @@ -146,6 +149,8 @@ enum opp_table_access {
>   * @regulator_count: Number of power supply regulators. Its value can 
> be -1
>   * (uninitialized), 0 (no opp-microvolt property) or > 0 (has 
> opp-microvolt
>   * property).
> + * @paths: Interconnect path handles
> + * @path_count: Number of interconnect paths
>   * @genpd_performance_state: Device's power domain support performance 
> state.
>   * @is_genpd: Marks if the OPP table belongs to a genpd.
>   * @set_opp: Platform specific set_opp callback
> @@ -189,6 +194,8 @@ struct opp_table {
>  	struct clk *clk;
>  	struct regulator **regulators;
>  	int regulator_count;
> +	struct icc_path **paths;
> +	unsigned int path_count;
>  	bool genpd_performance_state;
>  	bool is_genpd;
> 
> @@ -224,12 +231,14 @@ void _of_clear_opp_table(struct opp_table 
> *opp_table);
>  struct opp_table *_managed_opp(struct device *dev, int index);
>  void _of_opp_free_required_opps(struct opp_table *opp_table,
>  				struct dev_pm_opp *opp);
> +int _of_find_paths(struct opp_table *opp_table, struct device *dev);
>  #else
>  static inline void _of_init_opp_table(struct opp_table *opp_table,
> struct device *dev, int index) {}
>  static inline void _of_clear_opp_table(struct opp_table *opp_table) {}
>  static inline struct opp_table *_managed_opp(struct device *dev, int
> index) { return NULL; }
>  static inline void _of_opp_free_required_opps(struct opp_table 
> *opp_table,
>  					      struct dev_pm_opp *opp) {}
> +static inline int _of_find_paths(struct opp_table *opp_table, struct
> device *dev) { return 0; }
>  #endif
> 
>  #ifdef CONFIG_DEBUG_FS
> diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
> index 747861816f4f..cfceb0290401 100644
> --- a/include/linux/pm_opp.h
> +++ b/include/linux/pm_opp.h
> @@ -41,6 +41,18 @@ struct dev_pm_opp_supply {
>  	unsigned long u_amp;
>  };
> 
> +/**
> + * struct dev_pm_opp_icc_bw - Interconnect bandwidth values
> + * @avg:	Average bandwidth corresponding to this OPP (in icc units)
> + * @peak:	Peak bandwidth corresponding to this OPP (in icc units)
> + *
> + * This structure stores the bandwidth values for a single 
> interconnect path.
> + */
> +struct dev_pm_opp_icc_bw {
> +	u32 avg;
> +	u32 peak;
> +};
> +
>  /**
>   * struct dev_pm_opp_info - OPP freq/voltage/current values
>   * @rate:	Target clk rate in hz
Viresh Kumar May 5, 2020, 3:38 a.m. UTC | #23
On 04-05-20, 14:01, Saravana Kannan wrote:
> Fair enough. But don't "voltage corner" based devices NEED to use OPP
> framework to set their frequencies?

No. Anyone can call dev_pm_genpd_set_performance_state().