[v8,00/34] NVIDIA Tegra power management patches for 5.16

Message ID	20210817012754.8710-1-digetx@gmail.com
Headers	show Return-Path: <devicetree-owner@vger.kernel.org> From: Dmitry Osipenko <digetx@gmail.com> To: Thierry Reding <thierry.reding@gmail.com>, Jonathan Hunter <jonathanh@nvidia.com>, Ulf Hansson <ulf.hansson@linaro.org>, Viresh Kumar <vireshk@kernel.org>, Stephen Boyd <sboyd@kernel.org>, Peter De Schrijver <pdeschrijver@nvidia.com>, Mikko Perttunen <mperttunen@nvidia.com>, Peter Chen <peter.chen@kernel.org>, Mark Brown <broonie@kernel.org>, Lee Jones <lee.jones@linaro.org>, =?utf-8?q?Uwe_Kleine-K=C3=B6nig?= <u.kleine-koenig@pengutronix.de>, Nishanth Menon <nm@ti.com>, Vignesh Raghavendra <vigneshr@ti.com>, Richard Weinberger <richard@nod.at>, Miquel Raynal <miquel.raynal@bootlin.com>, Lucas Stach <dev@lynxeye.de>, Stefan Agner <stefan@agner.ch>, Adrian Hunter <adrian.hunter@intel.com>, Mauro Carvalho Chehab <mchehab@kernel.org>, Rob Herring <robh+dt@kernel.org>, Michael Turquette <mturquette@baylibre.com> Cc: linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org, linux-pm@vger.kernel.org, linux-usb@vger.kernel.org, linux-staging@lists.linux.dev, linux-spi@vger.kernel.org, linux-pwm@vger.kernel.org, linux-mtd@lists.infradead.org, linux-mmc@vger.kernel.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, devicetree@vger.kernel.org, linux-clk@vger.kernel.org Subject: [PATCH v8 00/34] NVIDIA Tegra power management patches for 5.16 Date: Tue, 17 Aug 2021 04:27:20 +0300 Message-Id: <20210817012754.8710-1-digetx@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	NVIDIA Tegra power management patches for 5.16 \| expand [v8,00/34] NVIDIA Tegra power management patches for 5.16 [v8,06/34] dt-bindings: clock: tegra-car: Document new tegra-clocks sub-node [v8,08/34] dt-bindings: host1x: Document OPP and power domain properties [v8,09/34] dt-bindings: host1x: Document Memory Client resets of Host1x, GR2D and GR3D [v8,23/34] media: dt: bindings: tegra-vde: Convert to schema [v8,24/34] media: dt: bindings: tegra-vde: Document OPP and power domain

Dmitry Osipenko Aug. 17, 2021, 1:27 a.m. UTC

This series adds runtime PM support to Tegra drivers and enables core
voltage scaling for Tegra20/30 SoCs, resolving overheating troubles.

All patches should go via Tegra tree because they are interdependent,
please review and ack.

If you haven't seen this series before, that's because I wanted to
finalize the GENPD part at first and didn't bother you previously.

Changelog:

v8: - Added new generic dev_pm_opp_sync() helper that syncs OPP state with
      hardware. All drivers changed to use it. This replaces GENPD attach_dev
      callback hacks that were used in v7.

    - Added new patch patch "soc/tegra: regulators: Prepare for suspend"
      that fixes dying Tegra20 SoC after enabling VENC power domain during
      resume from suspend. It matches to what downstream kernel does on
      suspend/resume.

    - After a second thought, I dropped patches which added RPM to memory
      drivers since hardware is always-on and RPM not needed.

    - Replaced the "dummy host1x driver" patch with new "Disable unused
      host1x hardware" patch, since it's a cleaner solution.

Dmitry Osipenko (34):
  opp: Add dev_pm_opp_sync() helper
  soc/tegra: pmc: Disable PMC state syncing
  soc/tegra: Don't print error message when OPPs not available
  soc/tegra: Add devm_tegra_core_dev_init_opp_table_simple()
  soc/tegra: Use dev_pm_opp_sync()
  dt-bindings: clock: tegra-car: Document new tegra-clocks sub-node
  clk: tegra: Support runtime PM and power domain
  dt-bindings: host1x: Document OPP and power domain properties
  dt-bindings: host1x: Document Memory Client resets of Host1x, GR2D and
    GR3D
  gpu: host1x: Add host1x_channel_stop()
  gpu: host1x: Add runtime PM and OPP support
  drm/tegra: dc: Support OPP and SoC core voltage scaling
  drm/tegra: hdmi: Add OPP support
  drm/tegra: gr2d: Support power management
  drm/tegra: gr3d: Support power management
  drm/tegra: vic: Support system suspend
  usb: chipidea: tegra: Add runtime PM and OPP support
  bus: tegra-gmi: Add runtime PM and OPP support
  pwm: tegra: Add runtime PM and OPP support
  mmc: sdhci-tegra: Add runtime PM and OPP support
  mtd: rawnand: tegra: Add runtime PM and OPP support
  spi: tegra20-slink: Add OPP support
  media: dt: bindings: tegra-vde: Convert to schema
  media: dt: bindings: tegra-vde: Document OPP and power domain
  media: staging: tegra-vde: Support generic power domain and OPP
  soc/tegra: fuse: Add OPP support
  soc/tegra: fuse: Reset hardware
  soc/tegra: regulators: Prepare for suspend
  soc/tegra: pmc: Enable core domain support for Tegra20 and Tegra30
  ARM: tegra: Add OPP tables and power domains to Tegra20 device-trees
  ARM: tegra: Add OPP tables and power domains to Tegra30 device-trees
  ARM: tegra: Add Memory Client resets to Tegra20 GR2D, GR3D and Host1x
  ARM: tegra: Add Memory Client resets to Tegra30 GR2D, GR3D and Host1x
  ARM: tegra20/30: Disable unused host1x hardware

 .../bindings/clock/nvidia,tegra20-car.yaml    |   51 +
 .../display/tegra/nvidia,tegra20-host1x.txt   |   53 +
 .../bindings/media/nvidia,tegra-vde.txt       |   64 -
 .../bindings/media/nvidia,tegra-vde.yaml      |  119 ++
 .../boot/dts/tegra20-acer-a500-picasso.dts    |    1 +
 arch/arm/boot/dts/tegra20-colibri.dtsi        |    3 +-
 arch/arm/boot/dts/tegra20-harmony.dts         |    3 +-
 arch/arm/boot/dts/tegra20-paz00.dts           |    1 +
 .../arm/boot/dts/tegra20-peripherals-opp.dtsi |  941 +++++++++++
 arch/arm/boot/dts/tegra20-seaboard.dts        |    3 +-
 arch/arm/boot/dts/tegra20-tamonten.dtsi       |    3 +-
 arch/arm/boot/dts/tegra20-trimslice.dts       |    9 +
 arch/arm/boot/dts/tegra20-ventana.dts         |    1 +
 arch/arm/boot/dts/tegra20.dtsi                |  119 +-
 .../tegra30-asus-nexus7-grouper-common.dtsi   |    1 +
 arch/arm/boot/dts/tegra30-beaver.dts          |    1 +
 arch/arm/boot/dts/tegra30-cardhu.dtsi         |    1 +
 arch/arm/boot/dts/tegra30-colibri.dtsi        |   17 +-
 arch/arm/boot/dts/tegra30-ouya.dts            |    1 +
 .../arm/boot/dts/tegra30-peripherals-opp.dtsi | 1412 +++++++++++++++++
 arch/arm/boot/dts/tegra30.dtsi                |  181 ++-
 drivers/bus/tegra-gmi.c                       |   92 +-
 drivers/clk/tegra/Makefile                    |    1 +
 drivers/clk/tegra/clk-device.c                |  222 +++
 drivers/clk/tegra/clk-pll.c                   |    2 +-
 drivers/clk/tegra/clk-super.c                 |    2 +-
 drivers/clk/tegra/clk-tegra20.c               |   39 +-
 drivers/clk/tegra/clk-tegra30.c               |   70 +-
 drivers/clk/tegra/clk.c                       |   64 +
 drivers/clk/tegra/clk.h                       |    2 +
 drivers/gpu/drm/tegra/dc.c                    |   74 +
 drivers/gpu/drm/tegra/dc.h                    |    2 +
 drivers/gpu/drm/tegra/gr2d.c                  |  154 +-
 drivers/gpu/drm/tegra/gr3d.c                  |  393 ++++-
 drivers/gpu/drm/tegra/hdmi.c                  |   15 +-
 drivers/gpu/drm/tegra/vic.c                   |    4 +
 drivers/gpu/host1x/channel.c                  |    8 +
 drivers/gpu/host1x/debug.c                    |   15 +
 drivers/gpu/host1x/dev.c                      |  157 +-
 drivers/gpu/host1x/dev.h                      |    3 +-
 drivers/gpu/host1x/hw/channel_hw.c            |   44 +-
 drivers/gpu/host1x/intr.c                     |    3 -
 drivers/gpu/host1x/syncpt.c                   |    5 +-
 drivers/mmc/host/sdhci-tegra.c                |  146 +-
 drivers/mtd/nand/raw/tegra_nand.c             |   62 +-
 drivers/opp/core.c                            |   42 +-
 drivers/pwm/pwm-tegra.c                       |  104 +-
 drivers/soc/tegra/common.c                    |   34 +-
 drivers/soc/tegra/fuse/fuse-tegra.c           |   36 +
 drivers/soc/tegra/fuse/fuse.h                 |    1 +
 drivers/soc/tegra/pmc.c                       |   17 +
 drivers/soc/tegra/regulators-tegra20.c        |   99 ++
 drivers/soc/tegra/regulators-tegra30.c        |  122 ++
 drivers/spi/spi-tegra20-slink.c               |   15 +-
 drivers/staging/media/tegra-vde/vde.c         |   65 +-
 drivers/usb/chipidea/ci_hdrc_tegra.c          |   61 +-
 include/linux/host1x.h                        |    1 +
 include/linux/pm_opp.h                        |    6 +
 include/soc/tegra/common.h                    |   13 +
 59 files changed, 4796 insertions(+), 384 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/media/nvidia,tegra-vde.txt
 create mode 100644 Documentation/devicetree/bindings/media/nvidia,tegra-vde.yaml
 create mode 100644 drivers/clk/tegra/clk-device.c

Viresh Kumar Aug. 17, 2021, 7:55 a.m. UTC | #1

On 17-08-21, 04:27, Dmitry Osipenko wrote:
> Add dev_pm_opp_sync() helper which syncs OPP table with hardware state
> and vice versa.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/opp/core.c     | 42 +++++++++++++++++++++++++++++++++++++++---
>  include/linux/pm_opp.h |  6 ++++++
>  2 files changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 5543c54dacc5..18016e49605f 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -939,7 +939,8 @@ static int _set_required_opps(struct device *dev,
>  	return ret;
>  }
>  
> -static void _find_current_opp(struct device *dev, struct opp_table *opp_table)
> +static struct dev_pm_opp *
> +_find_current_opp(struct device *dev, struct opp_table *opp_table)
>  {
>  	struct dev_pm_opp *opp = ERR_PTR(-ENODEV);
>  	unsigned long freq;
> @@ -961,7 +962,7 @@ static void _find_current_opp(struct device *dev, struct opp_table *opp_table)
>  		mutex_unlock(&opp_table->lock);
>  	}
>  
> -	opp_table->current_opp = opp;
> +	return opp;
>  }
>  
>  static int _disable_opp_table(struct device *dev, struct opp_table *opp_table)
> @@ -1003,7 +1004,7 @@ static int _set_opp(struct device *dev, struct opp_table *opp_table,
>  
>  	/* Find the currently set OPP if we don't know already */
>  	if (unlikely(!opp_table->current_opp))
> -		_find_current_opp(dev, opp_table);
> +		opp_table->current_opp = _find_current_opp(dev, opp_table);
>  
>  	old_opp = opp_table->current_opp;
>  
> @@ -2931,3 +2932,38 @@ int dev_pm_opp_sync_regulators(struct device *dev)
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_opp_sync_regulators);
> +
> +/**
> + * dev_pm_opp_sync() - Sync OPP state
> + * @dev:	device for which we do this operation
> + *
> + * Initialize OPP table accordingly to current clock rate or
> + * first available OPP if clock not available for this device.
> + *
> + * Return: 0 on success or a negative error value.
> + */
> +int dev_pm_opp_sync(struct device *dev)
> +{
> +	struct opp_table *opp_table;
> +	struct dev_pm_opp *opp;
> +	int ret = 0;
> +
> +	/* Device may not have OPP table */
> +	opp_table = _find_opp_table(dev);
> +	if (IS_ERR(opp_table))
> +		return 0;
> +
> +	if (!_get_opp_count(opp_table))
> +		goto put_table;
> +
> +	opp = _find_current_opp(dev, opp_table);
> +	ret = _set_opp(dev, opp_table, opp, opp->rate);

And I am not sure how this will end up working, since new OPP will be
equal to old one. Since I see you call this from resume() at many
places.

what exactly are you trying to do here ? Those details would be good
to have in commit log as well, I haven't really followed V7 of your
series.

Miquel Raynal Aug. 17, 2021, 8:41 a.m. UTC | #2

Hi Dmitry,

Dmitry Osipenko <digetx@gmail.com> wrote on Tue, 17 Aug 2021 04:27:41
+0300:

> The NAND on Tegra belongs to the core power domain and we're going to
> enable GENPD support for the core domain. Now NAND must be resumed using
> runtime PM API in order to initialize the NAND power state. Add runtime PM
> and OPP support to the NAND driver.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/mtd/nand/raw/tegra_nand.c | 62 +++++++++++++++++++++++++++----
>  1 file changed, 54 insertions(+), 8 deletions(-)
> 

Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>

Thanks,
Miquèl

Ulf Hansson Aug. 17, 2021, 12:04 p.m. UTC | #3

On Tue, 17 Aug 2021 at 03:30, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> Add runtime PM and OPP support to the Host1x driver. It's required for
> enabling system-wide DVFS and supporting dynamic power management using
> a generic power domain. For the starter we will keep host1x always-on
> because dynamic power management require a major refactoring of the driver
> code since lot's of code paths will need the RPM handling and we're going
> to remove some of these paths in the future. Host1x doesn't consume much
> power so it is good enough, we at least need to resume Host1x in order
> to initialize the power state.
>
> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---

[...]

> +
>  static int host1x_probe(struct platform_device *pdev)
>  {
>         struct host1x *host;
> @@ -394,6 +423,10 @@ static int host1x_probe(struct platform_device *pdev)
>         /* set common host1x device data */
>         platform_set_drvdata(pdev, host);
>
> +       err = devm_tegra_core_dev_init_opp_table_simple(&pdev->dev);
> +       if (err)
> +               return err;
> +
>         host->regs = devm_ioremap_resource(&pdev->dev, regs);
>         if (IS_ERR(host->regs))
>                 return PTR_ERR(host->regs);
> @@ -423,12 +456,9 @@ static int host1x_probe(struct platform_device *pdev)
>                 return err;
>         }
>
> -       host->rst = devm_reset_control_get(&pdev->dev, "host1x");
> -       if (IS_ERR(host->rst)) {
> -               err = PTR_ERR(host->rst);
> -               dev_err(&pdev->dev, "failed to get reset: %d\n", err);
> +       err = host1x_get_resets(host);
> +       if (err)
>                 return err;
> -       }
>
>         err = host1x_iommu_init(host);
>         if (err < 0) {
> @@ -443,22 +473,10 @@ static int host1x_probe(struct platform_device *pdev)
>                 goto iommu_exit;
>         }
>
> -       err = clk_prepare_enable(host->clk);
> -       if (err < 0) {
> -               dev_err(&pdev->dev, "failed to enable clock\n");
> -               goto free_channels;
> -       }
> -
> -       err = reset_control_deassert(host->rst);
> -       if (err < 0) {
> -               dev_err(&pdev->dev, "failed to deassert reset: %d\n", err);
> -               goto unprepare_disable;
> -       }
> -

Removing the clk_prepare_enable() and reset_control_deassert() from
host1x_probe(), might not be a good idea. See more about why, below.

>         err = host1x_syncpt_init(host);
>         if (err) {
>                 dev_err(&pdev->dev, "failed to initialize syncpts\n");
> -               goto reset_assert;
> +               goto free_channels;
>         }
>
>         err = host1x_intr_init(host, syncpt_irq);
> @@ -467,10 +485,14 @@ static int host1x_probe(struct platform_device *pdev)
>                 goto deinit_syncpt;
>         }
>
> -       host1x_debug_init(host);
> +       pm_runtime_enable(&pdev->dev);
>
> -       if (host->info->has_hypervisor)
> -               host1x_setup_sid_table(host);
> +       /* the driver's code isn't ready yet for the dynamic RPM */
> +       err = pm_runtime_resume_and_get(&pdev->dev);

If the driver is being built with the CONFIG_PM Kconfig option being
unset, pm_runtime_resume_and_get() will return 0 to indicate success -
and without calling the ->runtime_resume() callback.
In other words, the clock will remain gated and the reset will not be
deasserted, likely causing the driver to be malfunctioning.

If the driver isn't ever being built with CONFIG_PM unset, feel free
to ignore my above comments.

Otherwise, if it needs to work both with and without CONFIG_PM being
set, you may use the following pattern in host1x_probe() to deploy
runtime PM support:

"Enable the needed resources to probe the device"
pm_runtime_get_noresume()
pm_runtime_set_active()
pm_runtime_enable()

"Before successfully completing probe"
pm_runtime_put()

> +       if (err)
> +               goto deinit_intr;
> +
> +       host1x_debug_init(host);
>
>         err = host1x_register(host);
>         if (err < 0)
> @@ -486,13 +508,13 @@ static int host1x_probe(struct platform_device *pdev)
>         host1x_unregister(host);
>  deinit_debugfs:
>         host1x_debug_deinit(host);
> +
> +       pm_runtime_put(&pdev->dev);
> +       pm_runtime_disable(&pdev->dev);
> +deinit_intr:
>         host1x_intr_deinit(host);
>  deinit_syncpt:
>         host1x_syncpt_deinit(host);
> -reset_assert:
> -       reset_control_assert(host->rst);
> -unprepare_disable:
> -       clk_disable_unprepare(host->clk);
>  free_channels:
>         host1x_channel_list_free(&host->channel_list);
>  iommu_exit:

[...]

Kind regards
Uffe

Mark Brown Aug. 17, 2021, 12:22 p.m. UTC | #4

On Tue, Aug 17, 2021 at 04:27:42AM +0300, Dmitry Osipenko wrote:
> The SPI on Tegra belongs to the core power domain and we're going to
> enable GENPD support for the core domain. Now SPI driver must use OPP
> API for driving the controller's clock rate because OPP API takes care
> of reconfiguring the domain's performance state in accordance to the
> rate. Add OPP support to the driver.

Acked-by: Mark Brown <broonie@kernel.org>

Is there a concrete dependency here or can I merge this separately?

Thierry Reding Aug. 17, 2021, 2:02 p.m. UTC | #5

On Tue, Aug 17, 2021 at 02:04:38PM +0200, Ulf Hansson wrote:
> On Tue, 17 Aug 2021 at 03:30, Dmitry Osipenko <digetx@gmail.com> wrote:
> >
> > Add runtime PM and OPP support to the Host1x driver. It's required for
> > enabling system-wide DVFS and supporting dynamic power management using
> > a generic power domain. For the starter we will keep host1x always-on
> > because dynamic power management require a major refactoring of the driver
> > code since lot's of code paths will need the RPM handling and we're going
> > to remove some of these paths in the future. Host1x doesn't consume much
> > power so it is good enough, we at least need to resume Host1x in order
> > to initialize the power state.
> >
> > Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> > Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> > Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> > Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
> 
> [...]
> 
> > +
> >  static int host1x_probe(struct platform_device *pdev)
> >  {
> >         struct host1x *host;
> > @@ -394,6 +423,10 @@ static int host1x_probe(struct platform_device *pdev)
> >         /* set common host1x device data */
> >         platform_set_drvdata(pdev, host);
> >
> > +       err = devm_tegra_core_dev_init_opp_table_simple(&pdev->dev);
> > +       if (err)
> > +               return err;
> > +
> >         host->regs = devm_ioremap_resource(&pdev->dev, regs);
> >         if (IS_ERR(host->regs))
> >                 return PTR_ERR(host->regs);
> > @@ -423,12 +456,9 @@ static int host1x_probe(struct platform_device *pdev)
> >                 return err;
> >         }
> >
> > -       host->rst = devm_reset_control_get(&pdev->dev, "host1x");
> > -       if (IS_ERR(host->rst)) {
> > -               err = PTR_ERR(host->rst);
> > -               dev_err(&pdev->dev, "failed to get reset: %d\n", err);
> > +       err = host1x_get_resets(host);
> > +       if (err)
> >                 return err;
> > -       }
> >
> >         err = host1x_iommu_init(host);
> >         if (err < 0) {
> > @@ -443,22 +473,10 @@ static int host1x_probe(struct platform_device *pdev)
> >                 goto iommu_exit;
> >         }
> >
> > -       err = clk_prepare_enable(host->clk);
> > -       if (err < 0) {
> > -               dev_err(&pdev->dev, "failed to enable clock\n");
> > -               goto free_channels;
> > -       }
> > -
> > -       err = reset_control_deassert(host->rst);
> > -       if (err < 0) {
> > -               dev_err(&pdev->dev, "failed to deassert reset: %d\n", err);
> > -               goto unprepare_disable;
> > -       }
> > -
> 
> Removing the clk_prepare_enable() and reset_control_deassert() from
> host1x_probe(), might not be a good idea. See more about why, below.
> 
> >         err = host1x_syncpt_init(host);
> >         if (err) {
> >                 dev_err(&pdev->dev, "failed to initialize syncpts\n");
> > -               goto reset_assert;
> > +               goto free_channels;
> >         }
> >
> >         err = host1x_intr_init(host, syncpt_irq);
> > @@ -467,10 +485,14 @@ static int host1x_probe(struct platform_device *pdev)
> >                 goto deinit_syncpt;
> >         }
> >
> > -       host1x_debug_init(host);
> > +       pm_runtime_enable(&pdev->dev);
> >
> > -       if (host->info->has_hypervisor)
> > -               host1x_setup_sid_table(host);
> > +       /* the driver's code isn't ready yet for the dynamic RPM */
> > +       err = pm_runtime_resume_and_get(&pdev->dev);
> 
> If the driver is being built with the CONFIG_PM Kconfig option being
> unset, pm_runtime_resume_and_get() will return 0 to indicate success -
> and without calling the ->runtime_resume() callback.
> In other words, the clock will remain gated and the reset will not be
> deasserted, likely causing the driver to be malfunctioning.
> 
> If the driver isn't ever being built with CONFIG_PM unset, feel free
> to ignore my above comments.
> 
> Otherwise, if it needs to work both with and without CONFIG_PM being
> set, you may use the following pattern in host1x_probe() to deploy
> runtime PM support:
> 
> "Enable the needed resources to probe the device"
> pm_runtime_get_noresume()
> pm_runtime_set_active()
> pm_runtime_enable()
> 
> "Before successfully completing probe"
> pm_runtime_put()

We made a conscious decision a few years ago to have ARCH_TEGRA select
PM on both 32-bit and 64-bit ARM, specifically to avoid the need to do
this dance (though there are still a few drivers left that do this, I
think).

So I think this should be unnecessary. Unless perhaps if the sysfs PM
controls have any influence on this. As far as I know, as long as the
PM kconfig option is enabled, the sysfs control only influence the
runtime behaviour (i.e. setting the sysfs PM control to "on" is going
to force runtime PM to be resumed) but there's no way to disable
runtime PM altogether via sysfs that would make the above necessary.

Thierry

Dmitry Osipenko Aug. 17, 2021, 3:49 p.m. UTC | #6

17.08.2021 10:55, Viresh Kumar пишет:
...
>> +int dev_pm_opp_sync(struct device *dev)
>> +{
>> +	struct opp_table *opp_table;
>> +	struct dev_pm_opp *opp;
>> +	int ret = 0;
>> +
>> +	/* Device may not have OPP table */
>> +	opp_table = _find_opp_table(dev);
>> +	if (IS_ERR(opp_table))
>> +		return 0;
>> +
>> +	if (!_get_opp_count(opp_table))
>> +		goto put_table;
>> +
>> +	opp = _find_current_opp(dev, opp_table);
>> +	ret = _set_opp(dev, opp_table, opp, opp->rate);
> 
> And I am not sure how this will end up working, since new OPP will be
> equal to old one. Since I see you call this from resume() at many
> places.

Initially OPP table is "uninitialized" and opp_table->enabled=false,
hence the first sync always works even if OPP is equal to old one. Once
OPP has been synced, all further syncs are NO-OPs, hence it doesn't
matter how many times syncing is called.

https://elixir.bootlin.com/linux/v5.14-rc6/source/drivers/opp/core.c#L1012

> what exactly are you trying to do here ? Those details would be good
> to have in commit log as well, I haven't really followed V7 of your
> series.

I'm initializing voltage/power state of OPP table in accordance to the
clock rate, bumping voltage before clock is enabled by device driver.
I'll improve the commit message.

An alternative to the explicit syncing could be something like a new
dev_pm_opp_resume/suspend helpers that will take care of
enabling/disabling the OPP table clock/etc and syncing the OPP state
with h/w.

Dmitry Osipenko Aug. 17, 2021, 3:53 p.m. UTC | #7

17.08.2021 15:22, Mark Brown пишет:
> On Tue, Aug 17, 2021 at 04:27:42AM +0300, Dmitry Osipenko wrote:
>> The SPI on Tegra belongs to the core power domain and we're going to
>> enable GENPD support for the core domain. Now SPI driver must use OPP
>> API for driving the controller's clock rate because OPP API takes care
>> of reconfiguring the domain's performance state in accordance to the
>> rate. Add OPP support to the driver.
> 
> Acked-by: Mark Brown <broonie@kernel.org>
> 
> Is there a concrete dependency here or can I merge this separately?

This patch depends on the new OPP helpers added earlier in this series.
In particular it depends on these patches:

opp: Add dev_pm_opp_sync() helper
soc/tegra: Add devm_tegra_core_dev_init_opp_table_simple()

Thank you for the ack!

Viresh Kumar Aug. 18, 2021, 3:55 a.m. UTC | #8

On 17-08-21, 18:49, Dmitry Osipenko wrote:
> 17.08.2021 10:55, Viresh Kumar пишет:
> ...
> >> +int dev_pm_opp_sync(struct device *dev)
> >> +{
> >> +	struct opp_table *opp_table;
> >> +	struct dev_pm_opp *opp;
> >> +	int ret = 0;
> >> +
> >> +	/* Device may not have OPP table */
> >> +	opp_table = _find_opp_table(dev);
> >> +	if (IS_ERR(opp_table))
> >> +		return 0;
> >> +
> >> +	if (!_get_opp_count(opp_table))
> >> +		goto put_table;
> >> +
> >> +	opp = _find_current_opp(dev, opp_table);
> >> +	ret = _set_opp(dev, opp_table, opp, opp->rate);
> > 
> > And I am not sure how this will end up working, since new OPP will be
> > equal to old one. Since I see you call this from resume() at many
> > places.
> 
> Initially OPP table is "uninitialized" and opp_table->enabled=false,
> hence the first sync always works even if OPP is equal to old one. Once
> OPP has been synced, all further syncs are NO-OPs, hence it doesn't
> matter how many times syncing is called.
> 
> https://elixir.bootlin.com/linux/v5.14-rc6/source/drivers/opp/core.c#L1012

Right, but how will this work from Resume ? Won't that be a no-op ?

Dmitry Osipenko Aug. 18, 2021, 4:12 a.m. UTC | #9

18.08.2021 06:55, Viresh Kumar пишет:
> On 17-08-21, 18:49, Dmitry Osipenko wrote:
>> 17.08.2021 10:55, Viresh Kumar пишет:
>> ...
>>>> +int dev_pm_opp_sync(struct device *dev)
>>>> +{
>>>> +	struct opp_table *opp_table;
>>>> +	struct dev_pm_opp *opp;
>>>> +	int ret = 0;
>>>> +
>>>> +	/* Device may not have OPP table */
>>>> +	opp_table = _find_opp_table(dev);
>>>> +	if (IS_ERR(opp_table))
>>>> +		return 0;
>>>> +
>>>> +	if (!_get_opp_count(opp_table))
>>>> +		goto put_table;
>>>> +
>>>> +	opp = _find_current_opp(dev, opp_table);
>>>> +	ret = _set_opp(dev, opp_table, opp, opp->rate);
>>>
>>> And I am not sure how this will end up working, since new OPP will be
>>> equal to old one. Since I see you call this from resume() at many
>>> places.
>>
>> Initially OPP table is "uninitialized" and opp_table->enabled=false,
>> hence the first sync always works even if OPP is equal to old one. Once
>> OPP has been synced, all further syncs are NO-OPs, hence it doesn't
>> matter how many times syncing is called.
>>
>> https://elixir.bootlin.com/linux/v5.14-rc6/source/drivers/opp/core.c#L1012
> 
> Right, but how will this work from Resume ? Won't that be a no-op ?

The first resume initializes the OPP state on sync, all further syncs on
resume are no-ops.

Dmitry Osipenko Aug. 18, 2021, 4:29 a.m. UTC | #10

18.08.2021 07:12, Dmitry Osipenko пишет:
> 18.08.2021 06:55, Viresh Kumar пишет:
>> On 17-08-21, 18:49, Dmitry Osipenko wrote:
>>> 17.08.2021 10:55, Viresh Kumar пишет:
>>> ...
>>>>> +int dev_pm_opp_sync(struct device *dev)
>>>>> +{
>>>>> +	struct opp_table *opp_table;
>>>>> +	struct dev_pm_opp *opp;
>>>>> +	int ret = 0;
>>>>> +
>>>>> +	/* Device may not have OPP table */
>>>>> +	opp_table = _find_opp_table(dev);
>>>>> +	if (IS_ERR(opp_table))
>>>>> +		return 0;
>>>>> +
>>>>> +	if (!_get_opp_count(opp_table))
>>>>> +		goto put_table;
>>>>> +
>>>>> +	opp = _find_current_opp(dev, opp_table);
>>>>> +	ret = _set_opp(dev, opp_table, opp, opp->rate);
>>>>
>>>> And I am not sure how this will end up working, since new OPP will be
>>>> equal to old one. Since I see you call this from resume() at many
>>>> places.
>>>
>>> Initially OPP table is "uninitialized" and opp_table->enabled=false,
>>> hence the first sync always works even if OPP is equal to old one. Once
>>> OPP has been synced, all further syncs are NO-OPs, hence it doesn't
>>> matter how many times syncing is called.
>>>
>>> https://elixir.bootlin.com/linux/v5.14-rc6/source/drivers/opp/core.c#L1012
>>
>> Right, but how will this work from Resume ? Won't that be a no-op ?
> 
> The first resume initializes the OPP state on sync, all further syncs on
> resume are no-ops.
> 

Notice that we use GENPD here. GENPD core takes care of storing PD's
performance state (voltage in case of Tegra) and dropping it to 0 after
rpm-suspend, GENPD core also restores the state before rpm-resume.

Dmitry Osipenko Aug. 18, 2021, 4:30 a.m. UTC | #11

18.08.2021 07:29, Dmitry Osipenko пишет:
> 18.08.2021 07:12, Dmitry Osipenko пишет:
>> 18.08.2021 06:55, Viresh Kumar пишет:
>>> On 17-08-21, 18:49, Dmitry Osipenko wrote:
>>>> 17.08.2021 10:55, Viresh Kumar пишет:
>>>> ...
>>>>>> +int dev_pm_opp_sync(struct device *dev)
>>>>>> +{
>>>>>> +	struct opp_table *opp_table;
>>>>>> +	struct dev_pm_opp *opp;
>>>>>> +	int ret = 0;
>>>>>> +
>>>>>> +	/* Device may not have OPP table */
>>>>>> +	opp_table = _find_opp_table(dev);
>>>>>> +	if (IS_ERR(opp_table))
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	if (!_get_opp_count(opp_table))
>>>>>> +		goto put_table;
>>>>>> +
>>>>>> +	opp = _find_current_opp(dev, opp_table);
>>>>>> +	ret = _set_opp(dev, opp_table, opp, opp->rate);
>>>>>
>>>>> And I am not sure how this will end up working, since new OPP will be
>>>>> equal to old one. Since I see you call this from resume() at many
>>>>> places.
>>>>
>>>> Initially OPP table is "uninitialized" and opp_table->enabled=false,
>>>> hence the first sync always works even if OPP is equal to old one. Once
>>>> OPP has been synced, all further syncs are NO-OPs, hence it doesn't
>>>> matter how many times syncing is called.
>>>>
>>>> https://elixir.bootlin.com/linux/v5.14-rc6/source/drivers/opp/core.c#L1012
>>>
>>> Right, but how will this work from Resume ? Won't that be a no-op ?
>>
>> The first resume initializes the OPP state on sync, all further syncs on
>> resume are no-ops.
>>
> 
> Notice that we use GENPD here. GENPD core takes care of storing PD's
> performance state (voltage in case of Tegra) and dropping it to 0 after
> rpm-suspend, GENPD core also restores the state before rpm-resume.

By 'here' I mean in this series.

Viresh Kumar Aug. 18, 2021, 4:31 a.m. UTC | #12

On 18-08-21, 07:12, Dmitry Osipenko wrote:
> 18.08.2021 06:55, Viresh Kumar пишет:
> > On 17-08-21, 18:49, Dmitry Osipenko wrote:
> >> 17.08.2021 10:55, Viresh Kumar пишет:
> >> ...
> >>>> +int dev_pm_opp_sync(struct device *dev)
> >>>> +{
> >>>> +	struct opp_table *opp_table;
> >>>> +	struct dev_pm_opp *opp;
> >>>> +	int ret = 0;
> >>>> +
> >>>> +	/* Device may not have OPP table */
> >>>> +	opp_table = _find_opp_table(dev);
> >>>> +	if (IS_ERR(opp_table))
> >>>> +		return 0;
> >>>> +
> >>>> +	if (!_get_opp_count(opp_table))
> >>>> +		goto put_table;
> >>>> +
> >>>> +	opp = _find_current_opp(dev, opp_table);
> >>>> +	ret = _set_opp(dev, opp_table, opp, opp->rate);
> >>>
> >>> And I am not sure how this will end up working, since new OPP will be
> >>> equal to old one. Since I see you call this from resume() at many
> >>> places.
> >>
> >> Initially OPP table is "uninitialized" and opp_table->enabled=false,
> >> hence the first sync always works even if OPP is equal to old one. Once
> >> OPP has been synced, all further syncs are NO-OPs, hence it doesn't
> >> matter how many times syncing is called.
> >>
> >> https://elixir.bootlin.com/linux/v5.14-rc6/source/drivers/opp/core.c#L1012
> > 
> > Right, but how will this work from Resume ? Won't that be a no-op ?
> 
> The first resume initializes the OPP state on sync, all further syncs on
> resume are no-ops.

But the OPPs should already be initialized as someone must have called
opp-set-rate earlier ? Why do this from resume and not probe ?

Viresh Kumar Aug. 18, 2021, 4:34 a.m. UTC | #13

On 18-08-21, 07:30, Dmitry Osipenko wrote:
> 18.08.2021 07:29, Dmitry Osipenko пишет:
> >> The first resume initializes the OPP state on sync, all further syncs on
> >> resume are no-ops.
> >>
> > 
> > Notice that we use GENPD here. GENPD core takes care of storing PD's
> > performance state (voltage in case of Tegra) and dropping it to 0 after
> > rpm-suspend, GENPD core also restores the state before rpm-resume.
> 
> By 'here' I mean in this series.

It is still not clear to me why you need to this on resume, and not
probe.

Dmitry Osipenko Aug. 18, 2021, 4:37 a.m. UTC | #14

18.08.2021 07:31, Viresh Kumar пишет:
> On 18-08-21, 07:12, Dmitry Osipenko wrote:
>> 18.08.2021 06:55, Viresh Kumar пишет:
>>> On 17-08-21, 18:49, Dmitry Osipenko wrote:
>>>> 17.08.2021 10:55, Viresh Kumar пишет:
>>>> ...
>>>>>> +int dev_pm_opp_sync(struct device *dev)
>>>>>> +{
>>>>>> +	struct opp_table *opp_table;
>>>>>> +	struct dev_pm_opp *opp;
>>>>>> +	int ret = 0;
>>>>>> +
>>>>>> +	/* Device may not have OPP table */
>>>>>> +	opp_table = _find_opp_table(dev);
>>>>>> +	if (IS_ERR(opp_table))
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	if (!_get_opp_count(opp_table))
>>>>>> +		goto put_table;
>>>>>> +
>>>>>> +	opp = _find_current_opp(dev, opp_table);
>>>>>> +	ret = _set_opp(dev, opp_table, opp, opp->rate);
>>>>>
>>>>> And I am not sure how this will end up working, since new OPP will be
>>>>> equal to old one. Since I see you call this from resume() at many
>>>>> places.
>>>>
>>>> Initially OPP table is "uninitialized" and opp_table->enabled=false,
>>>> hence the first sync always works even if OPP is equal to old one. Once
>>>> OPP has been synced, all further syncs are NO-OPs, hence it doesn't
>>>> matter how many times syncing is called.
>>>>
>>>> https://elixir.bootlin.com/linux/v5.14-rc6/source/drivers/opp/core.c#L1012
>>>
>>> Right, but how will this work from Resume ? Won't that be a no-op ?
>>
>> The first resume initializes the OPP state on sync, all further syncs on
>> resume are no-ops.
> 
> But the OPPs should already be initialized as someone must have called
> opp-set-rate earlier ? Why do this from resume and not probe ?

This will set voltage level without having an actively used hardware.
Take a 3d driver for example, if you set the rate on probe and
rpm-resume will never be called, then the voltage will be set high,
while hardware is kept suspended if userspace will never wake it up by
executing a 3d job.

Viresh Kumar Aug. 18, 2021, 4:53 a.m. UTC | #15

On 18-08-21, 07:37, Dmitry Osipenko wrote:
> This will set voltage level without having an actively used hardware.
> Take a 3d driver for example, if you set the rate on probe and
> rpm-resume will never be called, then the voltage will be set high,
> while hardware is kept suspended if userspace will never wake it up by
> executing a 3d job.

What exactly are we looking to achieve with this stuff ? Cache the
current performance state with genpd (based on the state bootloader's
has set) ?

Or anything else as well ?

Dmitry Osipenko Aug. 18, 2021, 5:21 a.m. UTC | #16

18.08.2021 07:53, Viresh Kumar пишет:
> On 18-08-21, 07:37, Dmitry Osipenko wrote:
>> This will set voltage level without having an actively used hardware.
>> Take a 3d driver for example, if you set the rate on probe and
>> rpm-resume will never be called, then the voltage will be set high,
>> while hardware is kept suspended if userspace will never wake it up by
>> executing a 3d job.
> 
> What exactly are we looking to achieve with this stuff ? Cache the
> current performance state with genpd (based on the state bootloader's
> has set) ?

Yes, GENPD will cache the perf state across suspend/resume and initially
cached value is out of sync with h/w.

> Or anything else as well ?

Nothing else. But let me clarify it all again.

Initially the performance state of all GENPDs is 0 for all devices.

The clock rate is preinitialized for all devices to a some default rate
by clk driver, or by bootloader or by assigned-clocks in DT.

When device is rpm-resumed, the resume callback of a device driver
enables the clock.

Before clock is enabled, the voltage needs to be configured in
accordance to the clk rate.

So now we have a GENPD with pstate=0 on a first rpm-resume, which
doesn't match the h/w configuration. Calling dev_pm_opp_sync() sets the
pstate in accordance to the h/w config.

In a previous v7 I proposed to preset the rpm_pstate of GENPD (perf
level that is restored before device is rpm-resumed) from PD's
attach_dev callback, but Ulf didn't like that because it requires to use
and modify GENPD 'private' variables from a PD driver. We decided that
will be better to make device drivers to explicitly sync the perf state,
which I implemented in this v8.

Viresh Kumar Aug. 18, 2021, 5:58 a.m. UTC | #17

On 18-08-21, 08:21, Dmitry Osipenko wrote:
> Yes, GENPD will cache the perf state across suspend/resume and initially
> cached value is out of sync with h/w.
> 
> Nothing else. But let me clarify it all again.

Thanks for your explanation.

> Initially the performance state of all GENPDs is 0 for all devices.
> 
> The clock rate is preinitialized for all devices to a some default rate
> by clk driver, or by bootloader or by assigned-clocks in DT.
> 
> When device is rpm-resumed, the resume callback of a device driver
> enables the clock.
> 
> Before clock is enabled, the voltage needs to be configured in
> accordance to the clk rate.
> 
> So now we have a GENPD with pstate=0 on a first rpm-resume, which
> doesn't match the h/w configuration. Calling dev_pm_opp_sync() sets the
> pstate in accordance to the h/w config.

What about calling dev_pm_opp_set_rate(dev, clk_get_rate(dev)) here
instead ? That will work, right ? The advantage is it works without
any special routine to do so.

I also wonder looking at your gr3d.c changes, you set a set-opp
helper, but the driver doesn't call set_opp_rate at all. Who calls it
?

And if it is all about just syncing the genpd core, then can the genpd
core do something like what clk framework does? i.e. allow a new
optional genpd callback, get_performance_state() (just like
set_performance_state()), which can be called initially by the core to
get the performance to something other than zero. opp-set-rate is
there to set the performance state and enable the stuff as well.
That's why it looks incorrect in your case, where the function was
only required to be called once, and you are ending up calling it on
each resume. Limiting that with another local variable is bad as well.

> In a previous v7 I proposed to preset the rpm_pstate of GENPD (perf
> level that is restored before device is rpm-resumed) from PD's
> attach_dev callback, but Ulf didn't like that because it requires to use
> and modify GENPD 'private' variables from a PD driver. We decided that
> will be better to make device drivers to explicitly sync the perf state,
> which I implemented in this v8.

Viresh Kumar Aug. 18, 2021, 6 a.m. UTC | #18

On 18-08-21, 11:28, Viresh Kumar wrote:
> On 18-08-21, 08:21, Dmitry Osipenko wrote:
> > Yes, GENPD will cache the perf state across suspend/resume and initially
> > cached value is out of sync with h/w.
> > 
> > Nothing else. But let me clarify it all again.
> 
> Thanks for your explanation.
> 
> > Initially the performance state of all GENPDs is 0 for all devices.
> > 
> > The clock rate is preinitialized for all devices to a some default rate
> > by clk driver, or by bootloader or by assigned-clocks in DT.
> > 
> > When device is rpm-resumed, the resume callback of a device driver
> > enables the clock.
> > 
> > Before clock is enabled, the voltage needs to be configured in
> > accordance to the clk rate.
> > 
> > So now we have a GENPD with pstate=0 on a first rpm-resume, which
> > doesn't match the h/w configuration. Calling dev_pm_opp_sync() sets the
> > pstate in accordance to the h/w config.
> 
> What about calling dev_pm_opp_set_rate(dev, clk_get_rate(dev)) here
> instead ? That will work, right ? The advantage is it works without
> any special routine to do so.
> 
> I also wonder looking at your gr3d.c changes, you set a set-opp
> helper, but the driver doesn't call set_opp_rate at all. Who calls it
> ?
> 
> And if it is all about just syncing the genpd core, then can the genpd
> core do something like what clk framework does? i.e. allow a new
> optional genpd callback, get_performance_state() (just like
> set_performance_state()), which can be called initially by the core to
> get the performance to something other than zero. opp-set-rate is
> there to set the performance state and enable the stuff as well.
> That's why it looks incorrect in your case, where the function was
> only required to be called once, and you are ending up calling it on
> each resume. Limiting that with another local variable is bad as well.

Ulf, this last part is for you :)

Dmitry Osipenko Aug. 18, 2021, 6:22 a.m. UTC | #19

18.08.2021 08:58, Viresh Kumar пишет:
> On 18-08-21, 08:21, Dmitry Osipenko wrote:
>> Yes, GENPD will cache the perf state across suspend/resume and initially
>> cached value is out of sync with h/w.
>>
>> Nothing else. But let me clarify it all again.
> 
> Thanks for your explanation.
> 
>> Initially the performance state of all GENPDs is 0 for all devices.
>>
>> The clock rate is preinitialized for all devices to a some default rate
>> by clk driver, or by bootloader or by assigned-clocks in DT.
>>
>> When device is rpm-resumed, the resume callback of a device driver
>> enables the clock.
>>
>> Before clock is enabled, the voltage needs to be configured in
>> accordance to the clk rate.
>>
>> So now we have a GENPD with pstate=0 on a first rpm-resume, which
>> doesn't match the h/w configuration. Calling dev_pm_opp_sync() sets the
>> pstate in accordance to the h/w config.
> 
> What about calling dev_pm_opp_set_rate(dev, clk_get_rate(dev)) here
> instead ? That will work, right ? The advantage is it works without
> any special routine to do so.

It will work, but a dedicated helper is nicer.

> I also wonder looking at your gr3d.c changes, you set a set-opp
> helper, but the driver doesn't call set_opp_rate at all. Who calls it
> ?

dev_pm_opp_sync() calls it from _set_opp().

> And if it is all about just syncing the genpd core, then can the genpd
> core do something like what clk framework does? i.e. allow a new
> optional genpd callback, get_performance_state() (just like
> set_performance_state()), which can be called initially by the core to
> get the performance to something other than zero. opp-set-rate is
> there to set the performance state and enable the stuff as well.
> That's why it looks incorrect in your case, where the function was
> only required to be called once, and you are ending up calling it on
> each resume. Limiting that with another local variable is bad as well.

We discussed variant with get_performance_state() previously and Ulf
didn't like it either since it still requires to touch 'internals' of GENPD.

Viresh Kumar Aug. 18, 2021, 6:27 a.m. UTC | #20

On 18-08-21, 09:22, Dmitry Osipenko wrote:
> 18.08.2021 08:58, Viresh Kumar пишет:
> > What about calling dev_pm_opp_set_rate(dev, clk_get_rate(dev)) here
> > instead ? That will work, right ? The advantage is it works without
> > any special routine to do so.
> 
> It will work, but a dedicated helper is nicer.
> 
> > I also wonder looking at your gr3d.c changes, you set a set-opp
> > helper, but the driver doesn't call set_opp_rate at all. Who calls it
> > ?
> 
> dev_pm_opp_sync() calls it from _set_opp().

Okay, please use dev_pm_opp_set_rate() instead then. New helper just
adds to the confusion and isn't doing anything special apart from
doing clk_get_rate() for you.

> > And if it is all about just syncing the genpd core, then can the genpd
> > core do something like what clk framework does? i.e. allow a new
> > optional genpd callback, get_performance_state() (just like
> > set_performance_state()), which can be called initially by the core to
> > get the performance to something other than zero. opp-set-rate is
> > there to set the performance state and enable the stuff as well.
> > That's why it looks incorrect in your case, where the function was
> > only required to be called once, and you are ending up calling it on
> > each resume. Limiting that with another local variable is bad as well.
> 
> We discussed variant with get_performance_state() previously and Ulf
> didn't like it either since it still requires to touch 'internals' of GENPD.

Hmm, I wonder if that would be a problem since only genpd core is
going to call that routine anyway.

Ulf Hansson Aug. 18, 2021, 8:29 a.m. UTC | #21

On Wed, 18 Aug 2021 at 08:27, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 18-08-21, 09:22, Dmitry Osipenko wrote:
> > 18.08.2021 08:58, Viresh Kumar пишет:
> > > What about calling dev_pm_opp_set_rate(dev, clk_get_rate(dev)) here
> > > instead ? That will work, right ? The advantage is it works without
> > > any special routine to do so.
> >
> > It will work, but a dedicated helper is nicer.
> >
> > > I also wonder looking at your gr3d.c changes, you set a set-opp
> > > helper, but the driver doesn't call set_opp_rate at all. Who calls it
> > > ?
> >
> > dev_pm_opp_sync() calls it from _set_opp().
>
> Okay, please use dev_pm_opp_set_rate() instead then. New helper just
> adds to the confusion and isn't doing anything special apart from
> doing clk_get_rate() for you.
>
> > > And if it is all about just syncing the genpd core, then can the genpd
> > > core do something like what clk framework does? i.e. allow a new
> > > optional genpd callback, get_performance_state() (just like
> > > set_performance_state()), which can be called initially by the core to
> > > get the performance to something other than zero. opp-set-rate is
> > > there to set the performance state and enable the stuff as well.
> > > That's why it looks incorrect in your case, where the function was
> > > only required to be called once, and you are ending up calling it on
> > > each resume. Limiting that with another local variable is bad as well.
> >
> > We discussed variant with get_performance_state() previously and Ulf
> > didn't like it either since it still requires to touch 'internals' of GENPD.
>
> Hmm, I wonder if that would be a problem since only genpd core is
> going to call that routine anyway.

Me and Dmitry discussed adding a new genpd callback for this. I agreed
that it seems like a reasonable thing to add, if he insists.

The intent was to invoke the new callback from __genpd_dev_pm_attach()
when the device has been attached to its genpd. This allows the
callback, to invoke clk_get_rate() and then dev_pm_opp_set_rate(), to
update the vote according to the current state of the HW.

I am not sure if/why that approach seemed insufficient?

Another option to solve the problem, I think, is simply to patch
drivers to let them call dev_pm_opp_set_rate() during ->probe(), this
should synchronize the HW state too.

Dmitry, can you please elaborate on this?

Kind regards
Uffe

Ulf Hansson Aug. 18, 2021, 8:35 a.m. UTC | #22

On Tue, 17 Aug 2021 at 16:03, Thierry Reding <thierry.reding@gmail.com> wrote:
>
> On Tue, Aug 17, 2021 at 02:04:38PM +0200, Ulf Hansson wrote:
> > On Tue, 17 Aug 2021 at 03:30, Dmitry Osipenko <digetx@gmail.com> wrote:
> > >
> > > Add runtime PM and OPP support to the Host1x driver. It's required for
> > > enabling system-wide DVFS and supporting dynamic power management using
> > > a generic power domain. For the starter we will keep host1x always-on
> > > because dynamic power management require a major refactoring of the driver
> > > code since lot's of code paths will need the RPM handling and we're going
> > > to remove some of these paths in the future. Host1x doesn't consume much
> > > power so it is good enough, we at least need to resume Host1x in order
> > > to initialize the power state.
> > >
> > > Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> > > Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> > > Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> > > Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> > > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > > ---
> >
> > [...]
> >
> > > +
> > >  static int host1x_probe(struct platform_device *pdev)
> > >  {
> > >         struct host1x *host;
> > > @@ -394,6 +423,10 @@ static int host1x_probe(struct platform_device *pdev)
> > >         /* set common host1x device data */
> > >         platform_set_drvdata(pdev, host);
> > >
> > > +       err = devm_tegra_core_dev_init_opp_table_simple(&pdev->dev);
> > > +       if (err)
> > > +               return err;
> > > +
> > >         host->regs = devm_ioremap_resource(&pdev->dev, regs);
> > >         if (IS_ERR(host->regs))
> > >                 return PTR_ERR(host->regs);
> > > @@ -423,12 +456,9 @@ static int host1x_probe(struct platform_device *pdev)
> > >                 return err;
> > >         }
> > >
> > > -       host->rst = devm_reset_control_get(&pdev->dev, "host1x");
> > > -       if (IS_ERR(host->rst)) {
> > > -               err = PTR_ERR(host->rst);
> > > -               dev_err(&pdev->dev, "failed to get reset: %d\n", err);
> > > +       err = host1x_get_resets(host);
> > > +       if (err)
> > >                 return err;
> > > -       }
> > >
> > >         err = host1x_iommu_init(host);
> > >         if (err < 0) {
> > > @@ -443,22 +473,10 @@ static int host1x_probe(struct platform_device *pdev)
> > >                 goto iommu_exit;
> > >         }
> > >
> > > -       err = clk_prepare_enable(host->clk);
> > > -       if (err < 0) {
> > > -               dev_err(&pdev->dev, "failed to enable clock\n");
> > > -               goto free_channels;
> > > -       }
> > > -
> > > -       err = reset_control_deassert(host->rst);
> > > -       if (err < 0) {
> > > -               dev_err(&pdev->dev, "failed to deassert reset: %d\n", err);
> > > -               goto unprepare_disable;
> > > -       }
> > > -
> >
> > Removing the clk_prepare_enable() and reset_control_deassert() from
> > host1x_probe(), might not be a good idea. See more about why, below.
> >
> > >         err = host1x_syncpt_init(host);
> > >         if (err) {
> > >                 dev_err(&pdev->dev, "failed to initialize syncpts\n");
> > > -               goto reset_assert;
> > > +               goto free_channels;
> > >         }
> > >
> > >         err = host1x_intr_init(host, syncpt_irq);
> > > @@ -467,10 +485,14 @@ static int host1x_probe(struct platform_device *pdev)
> > >                 goto deinit_syncpt;
> > >         }
> > >
> > > -       host1x_debug_init(host);
> > > +       pm_runtime_enable(&pdev->dev);
> > >
> > > -       if (host->info->has_hypervisor)
> > > -               host1x_setup_sid_table(host);
> > > +       /* the driver's code isn't ready yet for the dynamic RPM */
> > > +       err = pm_runtime_resume_and_get(&pdev->dev);
> >
> > If the driver is being built with the CONFIG_PM Kconfig option being
> > unset, pm_runtime_resume_and_get() will return 0 to indicate success -
> > and without calling the ->runtime_resume() callback.
> > In other words, the clock will remain gated and the reset will not be
> > deasserted, likely causing the driver to be malfunctioning.
> >
> > If the driver isn't ever being built with CONFIG_PM unset, feel free
> > to ignore my above comments.
> >
> > Otherwise, if it needs to work both with and without CONFIG_PM being
> > set, you may use the following pattern in host1x_probe() to deploy
> > runtime PM support:
> >
> > "Enable the needed resources to probe the device"
> > pm_runtime_get_noresume()
> > pm_runtime_set_active()
> > pm_runtime_enable()
> >
> > "Before successfully completing probe"
> > pm_runtime_put()
>
> We made a conscious decision a few years ago to have ARCH_TEGRA select
> PM on both 32-bit and 64-bit ARM, specifically to avoid the need to do
> this dance (though there are still a few drivers left that do this, I
> think).
>
> So I think this should be unnecessary. Unless perhaps if the sysfs PM
> controls have any influence on this. As far as I know, as long as the
> PM kconfig option is enabled, the sysfs control only influence the
> runtime behaviour (i.e. setting the sysfs PM control to "on" is going
> to force runtime PM to be resumed) but there's no way to disable
> runtime PM altogether via sysfs that would make the above necessary.

Thanks for clarifying! As I said, feel free to ignore my comments then.

For this and the other patches in the series, I assume you only need
to care about whether the driver is a cross SoC driver and used on
other platforms than Tegra then.

>
> Thierry

Kind regards
Uffe

Viresh Kumar Aug. 18, 2021, 9:14 a.m. UTC | #23

On 18-08-21, 10:29, Ulf Hansson wrote:
> Me and Dmitry discussed adding a new genpd callback for this. I agreed
> that it seems like a reasonable thing to add, if he insists.
> 
> The intent was to invoke the new callback from __genpd_dev_pm_attach()
> when the device has been attached to its genpd. This allows the
> callback, to invoke clk_get_rate() and then dev_pm_opp_set_rate(), to
> update the vote according to the current state of the HW.

I wouldn't call dev_pm_opp_set_rate() from there, since it means
configure and enable (both) for different resources, clk, regulator,
genpd, etc..

What we need here is just configure. So something like this then:

- genpd->get_performance_state()
  -> dev_pm_opp_get_current_opp() //New API
  -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);

This can be done just once from probe() then.

> I am not sure if/why that approach seemed insufficient?
> 
> Another option to solve the problem, I think, is simply to patch
> drivers to let them call dev_pm_opp_set_rate() during ->probe(), this
> should synchronize the HW state too.

Dmitry already mentioned that this will make the device start
consuming power, and he doesn't want that, else we need an explicit
disble call as well.

Ulf Hansson Aug. 18, 2021, 9:41 a.m. UTC | #24

On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 18-08-21, 10:29, Ulf Hansson wrote:
> > Me and Dmitry discussed adding a new genpd callback for this. I agreed
> > that it seems like a reasonable thing to add, if he insists.
> >
> > The intent was to invoke the new callback from __genpd_dev_pm_attach()
> > when the device has been attached to its genpd. This allows the
> > callback, to invoke clk_get_rate() and then dev_pm_opp_set_rate(), to
> > update the vote according to the current state of the HW.
>
> I wouldn't call dev_pm_opp_set_rate() from there, since it means
> configure and enable (both) for different resources, clk, regulator,
> genpd, etc..

Right, good point!

dev_pm_opp_set_rate() is best called from consumer drivers, as they
need to be in control.

>
> What we need here is just configure. So something like this then:
>
> - genpd->get_performance_state()
>   -> dev_pm_opp_get_current_opp() //New API
>   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
>
> This can be done just once from probe() then.

How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?

>
> > I am not sure if/why that approach seemed insufficient?
> >
> > Another option to solve the problem, I think, is simply to patch
> > drivers to let them call dev_pm_opp_set_rate() during ->probe(), this
> > should synchronize the HW state too.
>
> Dmitry already mentioned that this will make the device start
> consuming power, and he doesn't want that, else we need an explicit
> disble call as well.

I am sure I understand the problem. When a device is getting probed,
it needs to consume power, how else can the corresponding driver
successfully probe it?

>
> --
> viresh

Kind regards
Uffe

Ulf Hansson Aug. 18, 2021, 9:42 a.m. UTC | #25

On Wed, 18 Aug 2021 at 11:41, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 18-08-21, 10:29, Ulf Hansson wrote:
> > > Me and Dmitry discussed adding a new genpd callback for this. I agreed
> > > that it seems like a reasonable thing to add, if he insists.
> > >
> > > The intent was to invoke the new callback from __genpd_dev_pm_attach()
> > > when the device has been attached to its genpd. This allows the
> > > callback, to invoke clk_get_rate() and then dev_pm_opp_set_rate(), to
> > > update the vote according to the current state of the HW.
> >
> > I wouldn't call dev_pm_opp_set_rate() from there, since it means
> > configure and enable (both) for different resources, clk, regulator,
> > genpd, etc..
>
> Right, good point!
>
> dev_pm_opp_set_rate() is best called from consumer drivers, as they
> need to be in control.
>
> >
> > What we need here is just configure. So something like this then:
> >
> > - genpd->get_performance_state()
> >   -> dev_pm_opp_get_current_opp() //New API
> >   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
> >
> > This can be done just once from probe() then.
>
> How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?
>
> >
> > > I am not sure if/why that approach seemed insufficient?
> > >
> > > Another option to solve the problem, I think, is simply to patch
> > > drivers to let them call dev_pm_opp_set_rate() during ->probe(), this
> > > should synchronize the HW state too.
> >
> > Dmitry already mentioned that this will make the device start
> > consuming power, and he doesn't want that, else we need an explicit
> > disble call as well.
>
> I am sure I understand the problem. When a device is getting probed,

/s/I am sure/I am not sure

> it needs to consume power, how else can the corresponding driver
> successfully probe it?
>
> >
> > --
> > viresh
>
> Kind regards
> Uffe

Viresh Kumar Aug. 18, 2021, 9:50 a.m. UTC | #26

On 18-08-21, 11:41, Ulf Hansson wrote:
> On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > What we need here is just configure. So something like this then:
> >
> > - genpd->get_performance_state()
> >   -> dev_pm_opp_get_current_opp() //New API
> >   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
> >
> > This can be done just once from probe() then.
> 
> How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?

The opp core already has a way of finding current OPP, that's what
Dmitry is trying to use here. It finds it using clk_get_rate(), if
that is zero, it picks the lowest freq possible.

> I am sure I understand the problem. When a device is getting probed,
> it needs to consume power, how else can the corresponding driver
> successfully probe it?

Dmitry can answer that better, but a device doesn't necessarily need
to consume energy in probe. It can consume bus clock, like APB we
have, but the more energy consuming stuff can be left disabled until
the time a user comes up. Probe will just end up registering the
driver and initializing it.

Ulf Hansson Aug. 18, 2021, 10:08 a.m. UTC | #27

On Wed, 18 Aug 2021 at 11:50, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 18-08-21, 11:41, Ulf Hansson wrote:
> > On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > What we need here is just configure. So something like this then:
> > >
> > > - genpd->get_performance_state()
> > >   -> dev_pm_opp_get_current_opp() //New API
> > >   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
> > >
> > > This can be done just once from probe() then.
> >
> > How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?
>
> The opp core already has a way of finding current OPP, that's what
> Dmitry is trying to use here. It finds it using clk_get_rate(), if
> that is zero, it picks the lowest freq possible.
>
> > I am sure I understand the problem. When a device is getting probed,
> > it needs to consume power, how else can the corresponding driver
> > successfully probe it?
>
> Dmitry can answer that better, but a device doesn't necessarily need
> to consume energy in probe. It can consume bus clock, like APB we
> have, but the more energy consuming stuff can be left disabled until
> the time a user comes up. Probe will just end up registering the
> driver and initializing it.

That's perfectly fine, as then it's likely that it won't vote for an
OPP, but can postpone that as well.

Perhaps the problem is rather that the HW may already carry a non-zero
vote made from a bootloader. If the consumer driver tries to clear
that vote (calling dev_pm_opp_set_rate(dev, 0), for example), it would
still not lead to any updates of the performance state in genpd,
because genpd internally has initialized the performance-state to
zero.

Dmitry?

>
> --
> viresh

Kind regards
Uffe

Thierry Reding Aug. 18, 2021, 2:07 p.m. UTC | #28

On Tue, Aug 17, 2021 at 04:27:27AM +0300, Dmitry Osipenko wrote:
[...]
> +struct clk *tegra_clk_register(struct clk_hw *hw)
> +{
> +	struct platform_device *pdev;
> +	struct device *dev = NULL;
> +	struct device_node *np;
> +	const char *dev_name;
> +
> +	np = tegra_clk_get_of_node(hw);
> +
> +	if (!of_device_is_available(np))
> +		goto put_node;
> +
> +	dev_name = kasprintf(GFP_KERNEL, "tegra_clk_%s", hw->init->name);
> +	if (!dev_name)
> +		goto put_node;
> +
> +	pdev = of_platform_device_create(np, dev_name, NULL);
> +	if (!pdev) {
> +		pr_err("%s: failed to create device for %pOF\n", __func__, np);
> +		kfree(dev_name);
> +		goto put_node;
> +	}
> +
> +	dev = &pdev->dev;
> +	pm_runtime_enable(dev);
> +put_node:
> +	of_node_put(np);
> +
> +	return clk_register(dev, hw);
> +}

This looks wrong. Why do we need struct platform_device objects for each
of these clocks? That's going to be a massive amount of platform devices
and they will completely mess up sysfs.

Thierry

Dmitry Osipenko Aug. 18, 2021, 3:05 p.m. UTC | #29

18.08.2021 17:07, Thierry Reding пишет:
> On Tue, Aug 17, 2021 at 04:27:27AM +0300, Dmitry Osipenko wrote:
> [...]
>> +struct clk *tegra_clk_register(struct clk_hw *hw)
>> +{
>> +	struct platform_device *pdev;
>> +	struct device *dev = NULL;
>> +	struct device_node *np;
>> +	const char *dev_name;
>> +
>> +	np = tegra_clk_get_of_node(hw);
>> +
>> +	if (!of_device_is_available(np))
>> +		goto put_node;
>> +
>> +	dev_name = kasprintf(GFP_KERNEL, "tegra_clk_%s", hw->init->name);
>> +	if (!dev_name)
>> +		goto put_node;
>> +
>> +	pdev = of_platform_device_create(np, dev_name, NULL);
>> +	if (!pdev) {
>> +		pr_err("%s: failed to create device for %pOF\n", __func__, np);
>> +		kfree(dev_name);
>> +		goto put_node;
>> +	}
>> +
>> +	dev = &pdev->dev;
>> +	pm_runtime_enable(dev);
>> +put_node:
>> +	of_node_put(np);
>> +
>> +	return clk_register(dev, hw);
>> +}
> 
> This looks wrong. Why do we need struct platform_device objects for each
> of these clocks? That's going to be a massive amount of platform devices
> and they will completely mess up sysfs.

RPM works with a device. It's not a massive amount of devices, it's one
device for T20 and four devices for T30.

Dmitry Osipenko Aug. 18, 2021, 3:43 p.m. UTC | #30

18.08.2021 13:08, Ulf Hansson пишет:
> On Wed, 18 Aug 2021 at 11:50, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>
>> On 18-08-21, 11:41, Ulf Hansson wrote:
>>> On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>> What we need here is just configure. So something like this then:
>>>>
>>>> - genpd->get_performance_state()
>>>>   -> dev_pm_opp_get_current_opp() //New API
>>>>   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
>>>>
>>>> This can be done just once from probe() then.
>>>
>>> How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?
>>
>> The opp core already has a way of finding current OPP, that's what
>> Dmitry is trying to use here. It finds it using clk_get_rate(), if
>> that is zero, it picks the lowest freq possible.
>>
>>> I am sure I understand the problem. When a device is getting probed,
>>> it needs to consume power, how else can the corresponding driver
>>> successfully probe it?
>>
>> Dmitry can answer that better, but a device doesn't necessarily need
>> to consume energy in probe. It can consume bus clock, like APB we
>> have, but the more energy consuming stuff can be left disabled until
>> the time a user comes up. Probe will just end up registering the
>> driver and initializing it.
> 
> That's perfectly fine, as then it's likely that it won't vote for an
> OPP, but can postpone that as well.
> 
> Perhaps the problem is rather that the HW may already carry a non-zero
> vote made from a bootloader. If the consumer driver tries to clear
> that vote (calling dev_pm_opp_set_rate(dev, 0), for example), it would
> still not lead to any updates of the performance state in genpd,
> because genpd internally has initialized the performance-state to
> zero.

We don't need to discover internal SoC devices because we use
device-tree on ARM. For most devices power isn't required at a probe
time because probe function doesn't touch h/w at all, thus devices are
left in suspended state after probe.

We have three components comprising PM on Tegra:

1. Power gate
2. Clock state
3. Voltage state

GENPD on/off represents the 'power gate'.

Clock and reset are controlled by device drivers using clk and rst APIs.

Voltage state is represented by GENPD's performance level.

GENPD core assumes that at a first rpm-resume of a consumer device, its
genpd_performance=0. Not true for Tegra because h/w of the device is
preconfigured to a non-zero perf level initially, h/w may not support
zero level at all.

GENPD core assumes that consumer devices can work at any performance
level. Not true for Tegra because voltage needs to be set in accordance
to the clock rate before clock is enabled, otherwise h/w won't work
properly, perhaps clock may be unstable or h/w won't be latching.

Performance level should be set to 0 while device is suspended.
Performance level needs to be bumped on rpm-resume of a device in
accordance to h/w state before hardware is enabled.

Dmitry Osipenko Aug. 18, 2021, 3:46 p.m. UTC | #31

18.08.2021 18:43, Dmitry Osipenko пишет:
> 18.08.2021 13:08, Ulf Hansson пишет:
>> On Wed, 18 Aug 2021 at 11:50, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>
>>> On 18-08-21, 11:41, Ulf Hansson wrote:
>>>> On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>>> What we need here is just configure. So something like this then:
>>>>>
>>>>> - genpd->get_performance_state()
>>>>>   -> dev_pm_opp_get_current_opp() //New API
>>>>>   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
>>>>>
>>>>> This can be done just once from probe() then.
>>>>
>>>> How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?
>>>
>>> The opp core already has a way of finding current OPP, that's what
>>> Dmitry is trying to use here. It finds it using clk_get_rate(), if
>>> that is zero, it picks the lowest freq possible.
>>>
>>>> I am sure I understand the problem. When a device is getting probed,
>>>> it needs to consume power, how else can the corresponding driver
>>>> successfully probe it?
>>>
>>> Dmitry can answer that better, but a device doesn't necessarily need
>>> to consume energy in probe. It can consume bus clock, like APB we
>>> have, but the more energy consuming stuff can be left disabled until
>>> the time a user comes up. Probe will just end up registering the
>>> driver and initializing it.
>>
>> That's perfectly fine, as then it's likely that it won't vote for an
>> OPP, but can postpone that as well.
>>
>> Perhaps the problem is rather that the HW may already carry a non-zero
>> vote made from a bootloader. If the consumer driver tries to clear
>> that vote (calling dev_pm_opp_set_rate(dev, 0), for example), it would
>> still not lead to any updates of the performance state in genpd,
>> because genpd internally has initialized the performance-state to
>> zero.
> 
> We don't need to discover internal SoC devices because we use
> device-tree on ARM. For most devices power isn't required at a probe
> time because probe function doesn't touch h/w at all, thus devices are
> left in suspended state after probe.
> 
> We have three components comprising PM on Tegra:
> 
> 1. Power gate
> 2. Clock state
> 3. Voltage state
> 
> GENPD on/off represents the 'power gate'.
> 
> Clock and reset are controlled by device drivers using clk and rst APIs.
> 
> Voltage state is represented by GENPD's performance level.

OPP framework couples the performance level with the clock rate.

> GENPD core assumes that at a first rpm-resume of a consumer device, its
> genpd_performance=0. Not true for Tegra because h/w of the device is
> preconfigured to a non-zero perf level initially, h/w may not support
> zero level at all.
> 
> GENPD core assumes that consumer devices can work at any performance
> level. Not true for Tegra because voltage needs to be set in accordance
> to the clock rate before clock is enabled, otherwise h/w won't work
> properly, perhaps clock may be unstable or h/w won't be latching.
> 
> Performance level should be set to 0 while device is suspended.
> Performance level needs to be bumped on rpm-resume of a device in
> accordance to h/w state before hardware is enabled.
>

Dmitry Osipenko Aug. 18, 2021, 3:55 p.m. UTC | #32

18.08.2021 12:41, Ulf Hansson пишет:
> On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>
>> On 18-08-21, 10:29, Ulf Hansson wrote:
>>> Me and Dmitry discussed adding a new genpd callback for this. I agreed
>>> that it seems like a reasonable thing to add, if he insists.

Either way gives the equal result. The new callback allows to remove the
boilerplate dev_pm_opp_set_rate(clk_get_rate() code from the rpm-resume
of consumer devices, that's it.

>>> The intent was to invoke the new callback from __genpd_dev_pm_attach()
>>> when the device has been attached to its genpd. This allows the
>>> callback, to invoke clk_get_rate() and then dev_pm_opp_set_rate(), to
>>> update the vote according to the current state of the HW.
>>
>> I wouldn't call dev_pm_opp_set_rate() from there, since it means
>> configure and enable (both) for different resources, clk, regulator,
>> genpd, etc..
> 
> Right, good point!
> 
> dev_pm_opp_set_rate() is best called from consumer drivers, as they
> need to be in control.
>> What we need here is just configure. So something like this then:
The intent wasn't to use dev_pm_opp_set_rate() from
__genpd_dev_pm_attach(), but to set genpd->rpm_pstate in accordance to
the h/w configuration.

On Tegra we have a chain of PDs and it's not trivial to convert the
device's OPP into pstate because only the parent domain can translate
the required OPP.

Viresh, please take a look at what I did in [1]. Maybe it could be done
in another way.

[1]
https://patchwork.ozlabs.org/project/linux-tegra/patch/20210701232728.23591-3-digetx@gmail.com/

Thierry Reding Aug. 18, 2021, 4:42 p.m. UTC | #33

On Wed, Aug 18, 2021 at 06:05:21PM +0300, Dmitry Osipenko wrote:
> 18.08.2021 17:07, Thierry Reding пишет:
> > On Tue, Aug 17, 2021 at 04:27:27AM +0300, Dmitry Osipenko wrote:
> > [...]
> >> +struct clk *tegra_clk_register(struct clk_hw *hw)
> >> +{
> >> +	struct platform_device *pdev;
> >> +	struct device *dev = NULL;
> >> +	struct device_node *np;
> >> +	const char *dev_name;
> >> +
> >> +	np = tegra_clk_get_of_node(hw);
> >> +
> >> +	if (!of_device_is_available(np))
> >> +		goto put_node;
> >> +
> >> +	dev_name = kasprintf(GFP_KERNEL, "tegra_clk_%s", hw->init->name);
> >> +	if (!dev_name)
> >> +		goto put_node;
> >> +
> >> +	pdev = of_platform_device_create(np, dev_name, NULL);
> >> +	if (!pdev) {
> >> +		pr_err("%s: failed to create device for %pOF\n", __func__, np);
> >> +		kfree(dev_name);
> >> +		goto put_node;
> >> +	}
> >> +
> >> +	dev = &pdev->dev;
> >> +	pm_runtime_enable(dev);
> >> +put_node:
> >> +	of_node_put(np);
> >> +
> >> +	return clk_register(dev, hw);
> >> +}
> > 
> > This looks wrong. Why do we need struct platform_device objects for each
> > of these clocks? That's going to be a massive amount of platform devices
> > and they will completely mess up sysfs.
> 
> RPM works with a device. It's not a massive amount of devices, it's one
> device for T20 and four devices for T30.

I'm still not sure I understand why we need to call RPM functions on a
clock. And even if they are few, it seems wrong to make these platform
devices.

Perhaps they can be simple struct device:s instead? Ideally they would
also be parented to the CAR so that they appear in the right place in
the sysfs hierarchy.

Thierry

Dmitry Osipenko Aug. 18, 2021, 5:11 p.m. UTC | #34

18.08.2021 19:42, Thierry Reding пишет:
> On Wed, Aug 18, 2021 at 06:05:21PM +0300, Dmitry Osipenko wrote:
>> 18.08.2021 17:07, Thierry Reding пишет:
>>> On Tue, Aug 17, 2021 at 04:27:27AM +0300, Dmitry Osipenko wrote:
>>> [...]
>>>> +struct clk *tegra_clk_register(struct clk_hw *hw)
>>>> +{
>>>> +	struct platform_device *pdev;
>>>> +	struct device *dev = NULL;
>>>> +	struct device_node *np;
>>>> +	const char *dev_name;
>>>> +
>>>> +	np = tegra_clk_get_of_node(hw);
>>>> +
>>>> +	if (!of_device_is_available(np))
>>>> +		goto put_node;
>>>> +
>>>> +	dev_name = kasprintf(GFP_KERNEL, "tegra_clk_%s", hw->init->name);
>>>> +	if (!dev_name)
>>>> +		goto put_node;
>>>> +
>>>> +	pdev = of_platform_device_create(np, dev_name, NULL);
>>>> +	if (!pdev) {
>>>> +		pr_err("%s: failed to create device for %pOF\n", __func__, np);
>>>> +		kfree(dev_name);
>>>> +		goto put_node;
>>>> +	}
>>>> +
>>>> +	dev = &pdev->dev;
>>>> +	pm_runtime_enable(dev);
>>>> +put_node:
>>>> +	of_node_put(np);
>>>> +
>>>> +	return clk_register(dev, hw);
>>>> +}
>>>
>>> This looks wrong. Why do we need struct platform_device objects for each
>>> of these clocks? That's going to be a massive amount of platform devices
>>> and they will completely mess up sysfs.
>>
>> RPM works with a device. It's not a massive amount of devices, it's one
>> device for T20 and four devices for T30.
> 
> I'm still not sure I understand why we need to call RPM functions on a
> clock. And even if they are few, it seems wrong to make these platform
> devices.

Before clock is enabled, we need to raise core voltage. After clock is
disabled, the voltage should be dropped. CCF+RPM takes care of handling
this for us.

> Perhaps they can be simple struct device:s instead? Ideally they would
> also be parented to the CAR so that they appear in the right place in
> the sysfs hierarchy.

Could you please clarify what do you mean by 'simple struct device:s'?
These clock devices should be OF devices with a of_node and etc,
otherwise we can't use OPP framework.

We don't have driver for CAR to bind. I guess we could try to add a
'dummy' CAR driver that will create sub-devices for the rpm-clocks, is
this what you're wanting?

Dmitry Osipenko Aug. 18, 2021, 5:24 p.m. UTC | #35

18.08.2021 11:35, Ulf Hansson пишет:
> Thanks for clarifying! As I said, feel free to ignore my comments then.
> 
> For this and the other patches in the series, I assume you only need
> to care about whether the driver is a cross SoC driver and used on
> other platforms than Tegra then.

Yes, and all drivers touched by this series are Tegra-only drivers.

Viresh Kumar Aug. 19, 2021, 6:16 a.m. UTC | #36

On 18-08-21, 18:55, Dmitry Osipenko wrote:
> 18.08.2021 12:41, Ulf Hansson пишет:
> 
> Either way gives the equal result. The new callback allows to remove the
> boilerplate dev_pm_opp_set_rate(clk_get_rate() code from the rpm-resume
> of consumer devices, that's it.

It may not be equal, as dev_pm_opp_set_rate() may do additional stuff,
now or in a later implementation. Currently it only does
regulator_enable() as a special case, but it can be clk_enable() as
well. Also, this tries to solve the problem in a tricky/hacky way,
while all you wanted was to make the genpd aware of what the
performance state should be.

Your driver can break tomorrow if we started to do more stuff from
this API at another time.

> > dev_pm_opp_set_rate() is best called from consumer drivers, as they
> > need to be in control.
> >> What we need here is just configure. So something like this then:
> The intent wasn't to use dev_pm_opp_set_rate() from
> __genpd_dev_pm_attach(), but to set genpd->rpm_pstate in accordance to
> the h/w configuration.

Right.

> On Tegra we have a chain of PDs and it's not trivial to convert the
> device's OPP into pstate because only the parent domain can translate
> the required OPP.

The driver should just be required to make a call, and OPP/genpd core
should return it a value. This is already done today while setting the
pstate for a device. The same frameworks must be able to supply a
value to be used for the device.

> Viresh, please take a look at what I did in [1]. Maybe it could be done
> in another way.

I looked into this and looked like too much trouble. The
implementation needs to be simple. I am not sure I understand all the
problems you faced while doing that, would be better to start with a
simpler implementation of get_performance_state() kind of API for
genpd, after the domain is attached and its OPP table is initialized.

Note, that the OPP table isn't required to be fully initialized for
the device at this point, we can parse the DT as well if needed be.

Ulf Hansson Aug. 19, 2021, 1:07 p.m. UTC | #37

On Wed, 18 Aug 2021 at 17:43, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 18.08.2021 13:08, Ulf Hansson пишет:
> > On Wed, 18 Aug 2021 at 11:50, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >>
> >> On 18-08-21, 11:41, Ulf Hansson wrote:
> >>> On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >>>> What we need here is just configure. So something like this then:
> >>>>
> >>>> - genpd->get_performance_state()
> >>>>   -> dev_pm_opp_get_current_opp() //New API
> >>>>   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
> >>>>
> >>>> This can be done just once from probe() then.
> >>>
> >>> How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?
> >>
> >> The opp core already has a way of finding current OPP, that's what
> >> Dmitry is trying to use here. It finds it using clk_get_rate(), if
> >> that is zero, it picks the lowest freq possible.
> >>
> >>> I am sure I understand the problem. When a device is getting probed,
> >>> it needs to consume power, how else can the corresponding driver
> >>> successfully probe it?
> >>
> >> Dmitry can answer that better, but a device doesn't necessarily need
> >> to consume energy in probe. It can consume bus clock, like APB we
> >> have, but the more energy consuming stuff can be left disabled until
> >> the time a user comes up. Probe will just end up registering the
> >> driver and initializing it.
> >
> > That's perfectly fine, as then it's likely that it won't vote for an
> > OPP, but can postpone that as well.
> >
> > Perhaps the problem is rather that the HW may already carry a non-zero
> > vote made from a bootloader. If the consumer driver tries to clear
> > that vote (calling dev_pm_opp_set_rate(dev, 0), for example), it would
> > still not lead to any updates of the performance state in genpd,
> > because genpd internally has initialized the performance-state to
> > zero.
>
> We don't need to discover internal SoC devices because we use
> device-tree on ARM. For most devices power isn't required at a probe
> time because probe function doesn't touch h/w at all, thus devices are
> left in suspended state after probe.
>
> We have three components comprising PM on Tegra:
>
> 1. Power gate
> 2. Clock state
> 3. Voltage state
>
> GENPD on/off represents the 'power gate'.
>
> Clock and reset are controlled by device drivers using clk and rst APIs.
>
> Voltage state is represented by GENPD's performance level.
>
> GENPD core assumes that at a first rpm-resume of a consumer device, its
> genpd_performance=0. Not true for Tegra because h/w of the device is
> preconfigured to a non-zero perf level initially, h/w may not support
> zero level at all.

I think you may be misunderstanding genpd's behaviour around this, but
let me elaborate.

In genpd_runtime_resume(), we try to restore the performance state for
the device that genpd_runtime_suspend() *may* have dropped earlier.
That means, if genpd_runtime_resume() is called prior
genpd_runtime_suspend() for the first time, it means that
genpd_runtime_resume() will *not* restore a performance state, but
instead just leave the performance state as is for the device (see
genpd_restore_performance_state()).

In other words, a consumer driver may use the following sequence to
set an initial performance state for the device during ->probe():

...
rate = clk_get_rate()
dev_pm_opp_set_rate(rate)

pm_runtime_enable()
pm_runtime_resume_and_get()
...

Note that, it's the consumer driver's responsibility to manage device
specific resources, in its ->runtime_suspend|resume() callbacks.
Typically that means dealing with clock gating/ungating, for example.

In the other scenario where a consumer driver prefers to *not* call
pm_runtime_resume_and_get() in its ->probe(), because it doesn't need
to power on the device to complete probing, then we don't want to vote
for an OPP at all - and we also want the performance state for the
device in genpd to be set to zero. Correct?

Is this the main problem you are trying to solve, because I think this
doesn't work out of the box as of today?

There is another concern though, but perhaps it's not a problem after
all. Viresh told us that dev_pm_opp_set_rate() may turn on resources
like clock/regulators. That could certainly be problematic, in
particular if the device and its genpd have OPP tables associated with
it and the consumer driver wants to follow the above sequence in
probe.

Viresh, can you please chime in here and elaborate on some of the
magic happening behind dev_pm_opp_set_rate() API - is there a problem
here or not?

>
> GENPD core assumes that consumer devices can work at any performance
> level. Not true for Tegra because voltage needs to be set in accordance
> to the clock rate before clock is enabled, otherwise h/w won't work
> properly, perhaps clock may be unstable or h/w won't be latching.

Correct. Genpd relies on the callers to use the OPP framework if there
are constraints like you describe above.

That said, it's not forbidden for a consumer driver to call
dev_pm_genpd_set_performance_state() directly, but then it better
knows exactly what it's doing.

>
> Performance level should be set to 0 while device is suspended.

Do you mean system suspend or runtime suspend? Or both?

> Performance level needs to be bumped on rpm-resume of a device in
> accordance to h/w state before hardware is enabled.

Assuming there was a performance state set for the device when
genpd_runtime_suspend() was called, genpd_runtime_resume() will
restore that state according to the sequence you described.

Kind regards
Uffe

Thierry Reding Aug. 19, 2021, 1:21 p.m. UTC | #38

On Tue, Aug 17, 2021 at 04:27:39AM +0300, Dmitry Osipenko wrote:
> The PWM on Tegra belongs to the core power domain and we're going to
> enable GENPD support for the core domain. Now PWM must be resumed using
> runtime PM API in order to initialize the PWM power state. The PWM clock
> rate must be changed using OPP API that will reconfigure the power domain
> performance state in accordance to the rate. Add runtime PM and OPP
> support to the PWM driver.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/pwm/pwm-tegra.c | 104 ++++++++++++++++++++++++++++++++--------
>  1 file changed, 85 insertions(+), 19 deletions(-)

Can this be safely applied independently of the rest of the series, or
are there any dependencies on earlier patches?

Thierry

Ulf Hansson Aug. 19, 2021, 2:04 p.m. UTC | #39

On Thu, 19 Aug 2021 at 15:21, Thierry Reding <thierry.reding@gmail.com> wrote:
>
> On Tue, Aug 17, 2021 at 04:27:39AM +0300, Dmitry Osipenko wrote:
> > The PWM on Tegra belongs to the core power domain and we're going to
> > enable GENPD support for the core domain. Now PWM must be resumed using
> > runtime PM API in order to initialize the PWM power state. The PWM clock
> > rate must be changed using OPP API that will reconfigure the power domain
> > performance state in accordance to the rate. Add runtime PM and OPP
> > support to the PWM driver.
> >
> > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > ---
> >  drivers/pwm/pwm-tegra.c | 104 ++++++++++++++++++++++++++++++++--------
> >  1 file changed, 85 insertions(+), 19 deletions(-)
>
> Can this be safely applied independently of the rest of the series, or
> are there any dependencies on earlier patches?

Just to make sure we don't rush something in, I would rather withhold
all runtime PM related patches in the series, until we have agreed on
how to fix the in genpd/opp core parts. Simply, because those may very
well affect the deployments in the drivers.

>
> Thierry

Kind regards
Uffe

Ulf Hansson Aug. 19, 2021, 2:55 p.m. UTC | #40

On Thu, 19 Aug 2021 at 08:17, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 18-08-21, 18:55, Dmitry Osipenko wrote:
> > 18.08.2021 12:41, Ulf Hansson пишет:
> >
> > Either way gives the equal result. The new callback allows to remove the
> > boilerplate dev_pm_opp_set_rate(clk_get_rate() code from the rpm-resume
> > of consumer devices, that's it.
>
> It may not be equal, as dev_pm_opp_set_rate() may do additional stuff,
> now or in a later implementation. Currently it only does
> regulator_enable() as a special case, but it can be clk_enable() as
> well. Also, this tries to solve the problem in a tricky/hacky way,
> while all you wanted was to make the genpd aware of what the
> performance state should be.
>
> Your driver can break tomorrow if we started to do more stuff from
> this API at another time.
>
> > > dev_pm_opp_set_rate() is best called from consumer drivers, as they
> > > need to be in control.
> > >> What we need here is just configure. So something like this then:
> > The intent wasn't to use dev_pm_opp_set_rate() from
> > __genpd_dev_pm_attach(), but to set genpd->rpm_pstate in accordance to
> > the h/w configuration.
>
> Right.
>
> > On Tegra we have a chain of PDs and it's not trivial to convert the
> > device's OPP into pstate because only the parent domain can translate
> > the required OPP.
>
> The driver should just be required to make a call, and OPP/genpd core
> should return it a value. This is already done today while setting the
> pstate for a device. The same frameworks must be able to supply a
> value to be used for the device.

Right, that sounds reasonable.

We already have pm_genpd_opp_to_performance_state() which translates
an OPP to a performance state. This function invokes the
->opp_to_performance_state() for a genpd. Maybe we need to allow a
genpd to not have ->opp_to_performance_state() callback assigned
though, but continue up in the hierarchy to see if the parent has the
callback assigned, to make this work for Tegra?

Perhaps we should add an API dev_pm_genpd_opp_to_performance_state(),
allowing us to pass the device instead of the genpd. But that's a
minor thing.

Finally, the precondition to use the above, is to first get a handle
to an OPP table. This is where I am struggling to find a generic
solution, because I guess that would be platform or even consumer
driver specific for how to do this. And at what point should we do
this?

>
> > Viresh, please take a look at what I did in [1]. Maybe it could be done
> > in another way.
>
> I looked into this and looked like too much trouble. The
> implementation needs to be simple. I am not sure I understand all the
> problems you faced while doing that, would be better to start with a
> simpler implementation of get_performance_state() kind of API for
> genpd, after the domain is attached and its OPP table is initialized.
>
> Note, that the OPP table isn't required to be fully initialized for
> the device at this point, we can parse the DT as well if needed be.

Sure, but as I indicated above, you need some kind of input data to
figure out what OPP table to pick, before you can translate that into
a performance state. Is that always the clock rate, for example?

Perhaps, we should start with adding a dev_pm_opp_get_from_rate() or
what do you think? Do you have other suggestions?

>
> --
> viresh

Kind regards
Uffe

Thierry Reding Aug. 19, 2021, 4:17 p.m. UTC | #41

On Thu, Aug 19, 2021 at 04:04:50PM +0200, Ulf Hansson wrote:
> On Thu, 19 Aug 2021 at 15:21, Thierry Reding <thierry.reding@gmail.com> wrote:
> >
> > On Tue, Aug 17, 2021 at 04:27:39AM +0300, Dmitry Osipenko wrote:
> > > The PWM on Tegra belongs to the core power domain and we're going to
> > > enable GENPD support for the core domain. Now PWM must be resumed using
> > > runtime PM API in order to initialize the PWM power state. The PWM clock
> > > rate must be changed using OPP API that will reconfigure the power domain
> > > performance state in accordance to the rate. Add runtime PM and OPP
> > > support to the PWM driver.
> > >
> > > Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> > > ---
> > >  drivers/pwm/pwm-tegra.c | 104 ++++++++++++++++++++++++++++++++--------
> > >  1 file changed, 85 insertions(+), 19 deletions(-)
> >
> > Can this be safely applied independently of the rest of the series, or
> > are there any dependencies on earlier patches?
> 
> Just to make sure we don't rush something in, I would rather withhold
> all runtime PM related patches in the series, until we have agreed on
> how to fix the in genpd/opp core parts. Simply, because those may very
> well affect the deployments in the drivers.

Okay, understood. I didn't realize this may have an impact on how
drivers need to cooperate. I'll hold off on applying any of these
patches until the discussion has settled, then.

Thierry

Thierry Reding Aug. 19, 2021, 4:54 p.m. UTC | #42

On Wed, Aug 18, 2021 at 08:11:03PM +0300, Dmitry Osipenko wrote:
> 18.08.2021 19:42, Thierry Reding пишет:
> > On Wed, Aug 18, 2021 at 06:05:21PM +0300, Dmitry Osipenko wrote:
> >> 18.08.2021 17:07, Thierry Reding пишет:
> >>> On Tue, Aug 17, 2021 at 04:27:27AM +0300, Dmitry Osipenko wrote:
> >>> [...]
> >>>> +struct clk *tegra_clk_register(struct clk_hw *hw)
> >>>> +{
> >>>> +	struct platform_device *pdev;
> >>>> +	struct device *dev = NULL;
> >>>> +	struct device_node *np;
> >>>> +	const char *dev_name;
> >>>> +
> >>>> +	np = tegra_clk_get_of_node(hw);
> >>>> +
> >>>> +	if (!of_device_is_available(np))
> >>>> +		goto put_node;
> >>>> +
> >>>> +	dev_name = kasprintf(GFP_KERNEL, "tegra_clk_%s", hw->init->name);
> >>>> +	if (!dev_name)
> >>>> +		goto put_node;
> >>>> +
> >>>> +	pdev = of_platform_device_create(np, dev_name, NULL);
> >>>> +	if (!pdev) {
> >>>> +		pr_err("%s: failed to create device for %pOF\n", __func__, np);
> >>>> +		kfree(dev_name);
> >>>> +		goto put_node;
> >>>> +	}
> >>>> +
> >>>> +	dev = &pdev->dev;
> >>>> +	pm_runtime_enable(dev);
> >>>> +put_node:
> >>>> +	of_node_put(np);
> >>>> +
> >>>> +	return clk_register(dev, hw);
> >>>> +}
> >>>
> >>> This looks wrong. Why do we need struct platform_device objects for each
> >>> of these clocks? That's going to be a massive amount of platform devices
> >>> and they will completely mess up sysfs.
> >>
> >> RPM works with a device. It's not a massive amount of devices, it's one
> >> device for T20 and four devices for T30.
> > 
> > I'm still not sure I understand why we need to call RPM functions on a
> > clock. And even if they are few, it seems wrong to make these platform
> > devices.
> 
> Before clock is enabled, we need to raise core voltage. After clock is
> disabled, the voltage should be dropped. CCF+RPM takes care of handling
> this for us.

That's the part that I do understand. What I don't understand is why a
clock needs to be runtime suspend/resumed. Typically we suspend/resume
devices, and doing so typically involves disabling/enabling clocks. So
I don't understand why the clocks themselves now need to be runtime
suspended/resumed.

> > Perhaps they can be simple struct device:s instead? Ideally they would
> > also be parented to the CAR so that they appear in the right place in
> > the sysfs hierarchy.
> 
> Could you please clarify what do you mean by 'simple struct device:s'?
> These clock devices should be OF devices with a of_node and etc,
> otherwise we can't use OPP framework.

Perhaps I misunderstand the goal of the OPP framework. My understanding
was that this was to attach a table of operating points with a device so
that appropriate operating points could be selected and switched to when
the workload changes.

Typically these operating points would be roughly a clock rate and a
corresponding voltage for a regulator, so that when a certain clock rate
is requested, the regulator can be set to the matching voltage.

Hm... so is it that each of these clocks that you want to create a
platform device for has its own regulator? Because the patch series only
mentions the CORE domain, so I assumed that we would accumulate all the
clock rates for the clocks that are part of that CORE domain and then
derive a voltage to be supplied to that CORE domain.

But perhaps I just don't understand correctly how this is tied together.

> We don't have driver for CAR to bind. I guess we could try to add a
> 'dummy' CAR driver that will create sub-devices for the rpm-clocks, is
> this what you're wanting?

I got confused by the "tegra-clock" driver that this series was adding.
This is actually a driver that will bind to the virtual clocks rather
than the CAR device itself.

For some reason I had assumed that you wanted to create a CAR driver in
order to get at the struct device embedded in the CAR's platform device
and use that as the parent for all these clocks.

So even if we absolutely need some struct device for these clocks, maybe
adding that CAR driver and making the clock struct device:s children of
the CAR device will help keep a bit of a proper hierarchy in sysfs.

Thierry

Thierry Reding Aug. 19, 2021, 5:03 p.m. UTC | #43

On Tue, Aug 17, 2021 at 04:27:40AM +0300, Dmitry Osipenko wrote:
> The SDHCI on Tegra belongs to the core power domain and we're going to
> enable GENPD support for the core domain. Now SDHCI must be resumed using
> runtime PM API in order to initialize the SDHCI power state. The SDHCI
> clock rate must be changed using OPP API that will reconfigure the power
> domain performance state in accordance to the rate. Add runtime PM and OPP
> support to the SDHCI driver.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/mmc/host/sdhci-tegra.c | 146 ++++++++++++++++++++++++---------
>  1 file changed, 105 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
> index 387ce9cdbd7c..a3583359c972 100644
> --- a/drivers/mmc/host/sdhci-tegra.c
> +++ b/drivers/mmc/host/sdhci-tegra.c
> @@ -15,6 +15,8 @@
>  #include <linux/of.h>
>  #include <linux/of_device.h>
>  #include <linux/pinctrl/consumer.h>
> +#include <linux/pm_opp.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/regulator/consumer.h>
>  #include <linux/reset.h>
>  #include <linux/mmc/card.h>
> @@ -24,6 +26,8 @@
>  #include <linux/gpio/consumer.h>
>  #include <linux/ktime.h>
>  
> +#include <soc/tegra/common.h>
> +
>  #include "sdhci-pltfm.h"
>  #include "cqhci.h"
>  
> @@ -123,6 +127,12 @@
>  					 SDHCI_TRNS_BLK_CNT_EN | \
>  					 SDHCI_TRNS_DMA)
>  
> +enum {
> +	TEGRA_CLK_BULK_SDHCI,
> +	TEGRA_CLK_BULK_TMCLK,
> +	TEGRA_CLK_BULK_NUM,
> +};
> +
>  struct sdhci_tegra_soc_data {
>  	const struct sdhci_pltfm_data *pdata;
>  	u64 dma_mask;
> @@ -171,6 +181,8 @@ struct sdhci_tegra {
>  	bool enable_hwcq;
>  	unsigned long curr_clk_rate;
>  	u8 tuned_tap_delay;
> +
> +	struct clk_bulk_data clocks[TEGRA_CLK_BULK_NUM];

This doesn't seem worth it to me. There's a lot of churn in this driver
that's only needed to convert this to the clk_bulk API and it makes the
code a lot more difficult to read, in my opinion.

It looks like the only benefit that this gives us is that runtime
suspend and resume become a few lines shorter.

Thierry

Dmitry Osipenko Aug. 19, 2021, 10:09 p.m. UTC | #44

19.08.2021 19:54, Thierry Reding пишет:
> On Wed, Aug 18, 2021 at 08:11:03PM +0300, Dmitry Osipenko wrote:
>> 18.08.2021 19:42, Thierry Reding пишет:
>>> On Wed, Aug 18, 2021 at 06:05:21PM +0300, Dmitry Osipenko wrote:
>>>> 18.08.2021 17:07, Thierry Reding пишет:
>>>>> On Tue, Aug 17, 2021 at 04:27:27AM +0300, Dmitry Osipenko wrote:
>>>>> [...]
>>>>>> +struct clk *tegra_clk_register(struct clk_hw *hw)
>>>>>> +{
>>>>>> +	struct platform_device *pdev;
>>>>>> +	struct device *dev = NULL;
>>>>>> +	struct device_node *np;
>>>>>> +	const char *dev_name;
>>>>>> +
>>>>>> +	np = tegra_clk_get_of_node(hw);
>>>>>> +
>>>>>> +	if (!of_device_is_available(np))
>>>>>> +		goto put_node;
>>>>>> +
>>>>>> +	dev_name = kasprintf(GFP_KERNEL, "tegra_clk_%s", hw->init->name);
>>>>>> +	if (!dev_name)
>>>>>> +		goto put_node;
>>>>>> +
>>>>>> +	pdev = of_platform_device_create(np, dev_name, NULL);
>>>>>> +	if (!pdev) {
>>>>>> +		pr_err("%s: failed to create device for %pOF\n", __func__, np);
>>>>>> +		kfree(dev_name);
>>>>>> +		goto put_node;
>>>>>> +	}
>>>>>> +
>>>>>> +	dev = &pdev->dev;
>>>>>> +	pm_runtime_enable(dev);
>>>>>> +put_node:
>>>>>> +	of_node_put(np);
>>>>>> +
>>>>>> +	return clk_register(dev, hw);
>>>>>> +}
>>>>>
>>>>> This looks wrong. Why do we need struct platform_device objects for each
>>>>> of these clocks? That's going to be a massive amount of platform devices
>>>>> and they will completely mess up sysfs.
>>>>
>>>> RPM works with a device. It's not a massive amount of devices, it's one
>>>> device for T20 and four devices for T30.
>>>
>>> I'm still not sure I understand why we need to call RPM functions on a
>>> clock. And even if they are few, it seems wrong to make these platform
>>> devices.
>>
>> Before clock is enabled, we need to raise core voltage. After clock is
>> disabled, the voltage should be dropped. CCF+RPM takes care of handling
>> this for us.
> 
> That's the part that I do understand. What I don't understand is why a
> clock needs to be runtime suspend/resumed. Typically we suspend/resume
> devices, and doing so typically involves disabling/enabling clocks. So
> I don't understand why the clocks themselves now need to be runtime
> suspended/resumed.

CCF provides RPM management for a device that backs clock. When clock is enabled, it resumes the backing device.

RPM, GENPD and OPP frameworks work with a device. We use all these frameworks here. Since we don't have a dedicated device for a PLL clock, we need to create it in order to leverage the existing generic kernel APIs.

In this case clocks are not runtime suspended/resumed, the device which backs clock is suspended/resumed.

>>> Perhaps they can be simple struct device:s instead? Ideally they would
>>> also be parented to the CAR so that they appear in the right place in
>>> the sysfs hierarchy.
>>
>> Could you please clarify what do you mean by 'simple struct device:s'?
>> These clock devices should be OF devices with a of_node and etc,
>> otherwise we can't use OPP framework.
> 
> Perhaps I misunderstand the goal of the OPP framework. My understanding
> was that this was to attach a table of operating points with a device so
> that appropriate operating points could be selected and switched to when
> the workload changes.
> 
> Typically these operating points would be roughly a clock rate and a
> corresponding voltage for a regulator, so that when a certain clock rate
> is requested, the regulator can be set to the matching voltage.
> 
> Hm... so is it that each of these clocks that you want to create a
> platform device for has its own regulator? Because the patch series only
> mentions the CORE domain, so I assumed that we would accumulate all the
> clock rates for the clocks that are part of that CORE domain and then
> derive a voltage to be supplied to that CORE domain.
> 
> But perhaps I just don't understand correctly how this is tied together.

We don't use regulators, we use power domain that controls regulator. GENPD takes care of accumulating performance requests on a per-device basis.

I'm creating platform device for the clocks that require DVFS. These clocks don't use regulator, they are attached to the CORE domain. GENPD framework manages the performance state, aggregating perf votes from each device, i.e. from each clock individually.

You want to reinvent another layer of aggregation on top of GENPD. This doesn't worth the effort, we won't get anything from it, it should be a lot of extra complexity for nothing. We will also lose from it because pm_genpd_summary won't show you a per-device info.

domain                          status          children                           performance
    /device                                             runtime status
----------------------------------------------------------------------------------------------
heg                             on                                                 1000000
    /devices/soc0/50000000.host1x                       active                     1000000
    /devices/soc0/50000000.host1x/54140000.gr2d         suspended                  0
mpe                             off-0                                              0
vdec                            off-0                                              0
    /devices/soc0/6001a000.vde                          suspended                  0
venc                            off-0                                              0
3d1                             off-0                                              0
    /devices/genpd:1:54180000.gr3d                      suspended                  0
3d0                             off-0                                              0
    /devices/genpd:0:54180000.gr3d                      suspended                  0
core-domain                     on                                                 1000000
                                                3d0, 3d1, venc, vdec, mpe, heg
    /devices/soc0/7d000000.usb                          active                     1000000
    /devices/soc0/78000400.mmc                          active                     950000
    /devices/soc0/7000f400.memory-controller            unsupported                1000000
    /devices/soc0/7000a000.pwm                          active                     1000000
    /devices/soc0/60006000.clock/tegra_clk_pll_c        active                     1000000
    /devices/soc0/60006000.clock/tegra_clk_pll_e        suspended                  0
    /devices/soc0/60006000.clock/tegra_clk_pll_m        active                     1000000
    /devices/soc0/60006000.clock/tegra_clk_sclk         active                     1000000


>> We don't have driver for CAR to bind. I guess we could try to add a
>> 'dummy' CAR driver that will create sub-devices for the rpm-clocks, is
>> this what you're wanting?
> 
> I got confused by the "tegra-clock" driver that this series was adding.
> This is actually a driver that will bind to the virtual clocks rather
> than the CAR device itself.
> 
> For some reason I had assumed that you wanted to create a CAR driver in
> order to get at the struct device embedded in the CAR's platform device
> and use that as the parent for all these clocks.
> 
> So even if we absolutely need some struct device for these clocks, maybe
> adding that CAR driver and making the clock struct device:s children of
> the CAR device will help keep a bit of a proper hierarchy in sysfs.

Alright, that's easy to do. We will have to move out some clk data out of __init then. I already implemented it as you may see in the above PD summary.

Dmitry Osipenko Aug. 19, 2021, 10:37 p.m. UTC | #45

19.08.2021 20:03, Thierry Reding пишет:
> On Tue, Aug 17, 2021 at 04:27:40AM +0300, Dmitry Osipenko wrote:
>> The SDHCI on Tegra belongs to the core power domain and we're going to
>> enable GENPD support for the core domain. Now SDHCI must be resumed using
>> runtime PM API in order to initialize the SDHCI power state. The SDHCI
>> clock rate must be changed using OPP API that will reconfigure the power
>> domain performance state in accordance to the rate. Add runtime PM and OPP
>> support to the SDHCI driver.
>>
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>  drivers/mmc/host/sdhci-tegra.c | 146 ++++++++++++++++++++++++---------
>>  1 file changed, 105 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
>> index 387ce9cdbd7c..a3583359c972 100644
>> --- a/drivers/mmc/host/sdhci-tegra.c
>> +++ b/drivers/mmc/host/sdhci-tegra.c
>> @@ -15,6 +15,8 @@
>>  #include <linux/of.h>
>>  #include <linux/of_device.h>
>>  #include <linux/pinctrl/consumer.h>
>> +#include <linux/pm_opp.h>
>> +#include <linux/pm_runtime.h>
>>  #include <linux/regulator/consumer.h>
>>  #include <linux/reset.h>
>>  #include <linux/mmc/card.h>
>> @@ -24,6 +26,8 @@
>>  #include <linux/gpio/consumer.h>
>>  #include <linux/ktime.h>
>>  
>> +#include <soc/tegra/common.h>
>> +
>>  #include "sdhci-pltfm.h"
>>  #include "cqhci.h"
>>  
>> @@ -123,6 +127,12 @@
>>  					 SDHCI_TRNS_BLK_CNT_EN | \
>>  					 SDHCI_TRNS_DMA)
>>  
>> +enum {
>> +	TEGRA_CLK_BULK_SDHCI,
>> +	TEGRA_CLK_BULK_TMCLK,
>> +	TEGRA_CLK_BULK_NUM,
>> +};
>> +
>>  struct sdhci_tegra_soc_data {
>>  	const struct sdhci_pltfm_data *pdata;
>>  	u64 dma_mask;
>> @@ -171,6 +181,8 @@ struct sdhci_tegra {
>>  	bool enable_hwcq;
>>  	unsigned long curr_clk_rate;
>>  	u8 tuned_tap_delay;
>> +
>> +	struct clk_bulk_data clocks[TEGRA_CLK_BULK_NUM];
> 
> This doesn't seem worth it to me. There's a lot of churn in this driver
> that's only needed to convert this to the clk_bulk API and it makes the
> code a lot more difficult to read, in my opinion.
> 
> It looks like the only benefit that this gives us is that runtime
> suspend and resume become a few lines shorter.

The driver probe code looks cleaner with that. You should be looking at
the final result and not at the patch to see it.

Viresh Kumar Aug. 20, 2021, 5:07 a.m. UTC | #46

On 19-08-21, 22:35, Dmitry Osipenko wrote:
> 19.08.2021 16:07, Ulf Hansson пишет:
> > In the other scenario where a consumer driver prefers to *not* call
> > pm_runtime_resume_and_get() in its ->probe(), because it doesn't need
> > to power on the device to complete probing, then we don't want to vote
> > for an OPP at all - and we also want the performance state for the
> > device in genpd to be set to zero. Correct?
> 
> Yes
> 
> > Is this the main problem you are trying to solve, because I think this
> > doesn't work out of the box as of today?
> 
> The main problem is that the restored performance state is zero for the
> first genpd_runtime_resume(), while it's not zero from the h/w perspective.

This is exactly why I have been advocating that the genpd needs to
sync up with the hardware before any calls are made to it from the
consumer driver. Just what clock framework does to get the clock rate.

> > There is another concern though, but perhaps it's not a problem after
> > all. Viresh told us that dev_pm_opp_set_rate() may turn on resources
> > like clock/regulators. That could certainly be problematic, in
> > particular if the device and its genpd have OPP tables associated with
> > it and the consumer driver wants to follow the above sequence in
> > probe.
> 
> dev_pm_opp_set_rate() won't enable clocks and regulators, but it may

It does enable regulators right now, it may choose to enable clocks
later on, no guarantees.

> change the clock rate and voltage. This is also platform/driver specific
> because it's up to OPP user how to configure OPP table. On Tegra we only
> assign clock to OPP table, regulators are unused.

Right, over that platforms can set their own version of set-opp
callback, where all this is done from a platform specific callback.

> > Viresh, can you please chime in here and elaborate on some of the
> > magic happening behind dev_pm_opp_set_rate() API - is there a problem
> > here or not?

It configures clock, regulators, genpds, any required OPPs, + it
enables regulators right now.

Viresh Kumar Aug. 20, 2021, 5:18 a.m. UTC | #47

On 19-08-21, 16:55, Ulf Hansson wrote:
> Right, that sounds reasonable.
> 
> We already have pm_genpd_opp_to_performance_state() which translates
> an OPP to a performance state. This function invokes the
> ->opp_to_performance_state() for a genpd. Maybe we need to allow a
> genpd to not have ->opp_to_performance_state() callback assigned
> though, but continue up in the hierarchy to see if the parent has the
> callback assigned, to make this work for Tegra?
> 
> Perhaps we should add an API dev_pm_genpd_opp_to_performance_state(),
> allowing us to pass the device instead of the genpd. But that's a
> minor thing.

I am not concerned a lot about how it gets implemented, and am not
sure as well, as I haven't looked into these details since sometime.
Any reasonable thing will be accepted, as simple as that.

> Finally, the precondition to use the above, is to first get a handle
> to an OPP table. This is where I am struggling to find a generic
> solution, because I guess that would be platform or even consumer
> driver specific for how to do this. And at what point should we do
> this?

Hmm, I am not very clear with the whole picture at this point of time.

Dmitry, can you try to frame a sequence of events/calls/etc that will
define what kind of devices we are looking at here, and how this can
be made to work ?

> > > Viresh, please take a look at what I did in [1]. Maybe it could be done
> > > in another way.
> >
> > I looked into this and looked like too much trouble. The
> > implementation needs to be simple. I am not sure I understand all the
> > problems you faced while doing that, would be better to start with a
> > simpler implementation of get_performance_state() kind of API for
> > genpd, after the domain is attached and its OPP table is initialized.
> >
> > Note, that the OPP table isn't required to be fully initialized for
> > the device at this point, we can parse the DT as well if needed be.
> 
> Sure, but as I indicated above, you need some kind of input data to
> figure out what OPP table to pick, before you can translate that into
> a performance state. Is that always the clock rate, for example?

Eventually it can be clock, bandwidth, or pstate of anther genpd, not
sure what all we are looking for now. It should be just clock right
now as far as I can imagine :)

> Perhaps, we should start with adding a dev_pm_opp_get_from_rate() or
> what do you think? Do you have other suggestions?

We already have similar APIs, so that won't be a problem. We also have
a mechanism inside the OPP core, frequency based, which is used to
guess the current OPP. Maybe we can enhance and use that directly
here.

Thierry Reding Aug. 20, 2021, 11:35 a.m. UTC | #48

On Fri, Aug 20, 2021 at 01:37:13AM +0300, Dmitry Osipenko wrote:
> 19.08.2021 20:03, Thierry Reding пишет:
> > On Tue, Aug 17, 2021 at 04:27:40AM +0300, Dmitry Osipenko wrote:
> >> The SDHCI on Tegra belongs to the core power domain and we're going to
> >> enable GENPD support for the core domain. Now SDHCI must be resumed using
> >> runtime PM API in order to initialize the SDHCI power state. The SDHCI
> >> clock rate must be changed using OPP API that will reconfigure the power
> >> domain performance state in accordance to the rate. Add runtime PM and OPP
> >> support to the SDHCI driver.
> >>
> >> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> >> ---
> >>  drivers/mmc/host/sdhci-tegra.c | 146 ++++++++++++++++++++++++---------
> >>  1 file changed, 105 insertions(+), 41 deletions(-)
> >>
> >> diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
> >> index 387ce9cdbd7c..a3583359c972 100644
> >> --- a/drivers/mmc/host/sdhci-tegra.c
> >> +++ b/drivers/mmc/host/sdhci-tegra.c
> >> @@ -15,6 +15,8 @@
> >>  #include <linux/of.h>
> >>  #include <linux/of_device.h>
> >>  #include <linux/pinctrl/consumer.h>
> >> +#include <linux/pm_opp.h>
> >> +#include <linux/pm_runtime.h>
> >>  #include <linux/regulator/consumer.h>
> >>  #include <linux/reset.h>
> >>  #include <linux/mmc/card.h>
> >> @@ -24,6 +26,8 @@
> >>  #include <linux/gpio/consumer.h>
> >>  #include <linux/ktime.h>
> >>  
> >> +#include <soc/tegra/common.h>
> >> +
> >>  #include "sdhci-pltfm.h"
> >>  #include "cqhci.h"
> >>  
> >> @@ -123,6 +127,12 @@
> >>  					 SDHCI_TRNS_BLK_CNT_EN | \
> >>  					 SDHCI_TRNS_DMA)
> >>  
> >> +enum {
> >> +	TEGRA_CLK_BULK_SDHCI,
> >> +	TEGRA_CLK_BULK_TMCLK,
> >> +	TEGRA_CLK_BULK_NUM,
> >> +};
> >> +
> >>  struct sdhci_tegra_soc_data {
> >>  	const struct sdhci_pltfm_data *pdata;
> >>  	u64 dma_mask;
> >> @@ -171,6 +181,8 @@ struct sdhci_tegra {
> >>  	bool enable_hwcq;
> >>  	unsigned long curr_clk_rate;
> >>  	u8 tuned_tap_delay;
> >> +
> >> +	struct clk_bulk_data clocks[TEGRA_CLK_BULK_NUM];
> > 
> > This doesn't seem worth it to me. There's a lot of churn in this driver
> > that's only needed to convert this to the clk_bulk API and it makes the
> > code a lot more difficult to read, in my opinion.
> > 
> > It looks like the only benefit that this gives us is that runtime
> > suspend and resume become a few lines shorter.
> 
> The driver probe code looks cleaner with that. You should be looking at
> the final result and not at the patch to see it.

I did look at the final result and didn't find it cleaner at all. =)

Thierry

Thierry Reding Aug. 20, 2021, 11:42 a.m. UTC | #49

On Fri, Aug 20, 2021 at 01:09:46AM +0300, Dmitry Osipenko wrote:
> 19.08.2021 19:54, Thierry Reding пишет:
> > On Wed, Aug 18, 2021 at 08:11:03PM +0300, Dmitry Osipenko wrote:
> >> 18.08.2021 19:42, Thierry Reding пишет:
> >>> On Wed, Aug 18, 2021 at 06:05:21PM +0300, Dmitry Osipenko wrote:
> >>>> 18.08.2021 17:07, Thierry Reding пишет:
> >>>>> On Tue, Aug 17, 2021 at 04:27:27AM +0300, Dmitry Osipenko wrote:
> >>>>> [...]
> >>>>>> +struct clk *tegra_clk_register(struct clk_hw *hw)
> >>>>>> +{
> >>>>>> +	struct platform_device *pdev;
> >>>>>> +	struct device *dev = NULL;
> >>>>>> +	struct device_node *np;
> >>>>>> +	const char *dev_name;
> >>>>>> +
> >>>>>> +	np = tegra_clk_get_of_node(hw);
> >>>>>> +
> >>>>>> +	if (!of_device_is_available(np))
> >>>>>> +		goto put_node;
> >>>>>> +
> >>>>>> +	dev_name = kasprintf(GFP_KERNEL, "tegra_clk_%s", hw->init->name);
> >>>>>> +	if (!dev_name)
> >>>>>> +		goto put_node;
> >>>>>> +
> >>>>>> +	pdev = of_platform_device_create(np, dev_name, NULL);
> >>>>>> +	if (!pdev) {
> >>>>>> +		pr_err("%s: failed to create device for %pOF\n", __func__, np);
> >>>>>> +		kfree(dev_name);
> >>>>>> +		goto put_node;
> >>>>>> +	}
> >>>>>> +
> >>>>>> +	dev = &pdev->dev;
> >>>>>> +	pm_runtime_enable(dev);
> >>>>>> +put_node:
> >>>>>> +	of_node_put(np);
> >>>>>> +
> >>>>>> +	return clk_register(dev, hw);
> >>>>>> +}
> >>>>>
> >>>>> This looks wrong. Why do we need struct platform_device objects for each
> >>>>> of these clocks? That's going to be a massive amount of platform devices
> >>>>> and they will completely mess up sysfs.
> >>>>
> >>>> RPM works with a device. It's not a massive amount of devices, it's one
> >>>> device for T20 and four devices for T30.
> >>>
> >>> I'm still not sure I understand why we need to call RPM functions on a
> >>> clock. And even if they are few, it seems wrong to make these platform
> >>> devices.
> >>
> >> Before clock is enabled, we need to raise core voltage. After clock is
> >> disabled, the voltage should be dropped. CCF+RPM takes care of handling
> >> this for us.
> > 
> > That's the part that I do understand. What I don't understand is why a
> > clock needs to be runtime suspend/resumed. Typically we suspend/resume
> > devices, and doing so typically involves disabling/enabling clocks. So
> > I don't understand why the clocks themselves now need to be runtime
> > suspended/resumed.
> 
> CCF provides RPM management for a device that backs clock. When clock
> is enabled, it resumes the backing device.
> 
> RPM, GENPD and OPP frameworks work with a device. We use all these
> frameworks here. Since we don't have a dedicated device for a PLL
> clock, we need to create it in order to leverage the existing generic
> kernel APIs.
> 
> In this case clocks are not runtime suspended/resumed, the device
> which backs clock is suspended/resumed.
> 
> >>> Perhaps they can be simple struct device:s instead? Ideally they would
> >>> also be parented to the CAR so that they appear in the right place in
> >>> the sysfs hierarchy.
> >>
> >> Could you please clarify what do you mean by 'simple struct device:s'?
> >> These clock devices should be OF devices with a of_node and etc,
> >> otherwise we can't use OPP framework.
> > 
> > Perhaps I misunderstand the goal of the OPP framework. My understanding
> > was that this was to attach a table of operating points with a device so
> > that appropriate operating points could be selected and switched to when
> > the workload changes.
> > 
> > Typically these operating points would be roughly a clock rate and a
> > corresponding voltage for a regulator, so that when a certain clock rate
> > is requested, the regulator can be set to the matching voltage.
> > 
> > Hm... so is it that each of these clocks that you want to create a
> > platform device for has its own regulator? Because the patch series only
> > mentions the CORE domain, so I assumed that we would accumulate all the
> > clock rates for the clocks that are part of that CORE domain and then
> > derive a voltage to be supplied to that CORE domain.
> > 
> > But perhaps I just don't understand correctly how this is tied together.
> 
> We don't use regulators, we use power domain that controls regulator.
> GENPD takes care of accumulating performance requests on a per-device
> basis.
> 
> I'm creating platform device for the clocks that require DVFS. These
> clocks don't use regulator, they are attached to the CORE domain.
> GENPD framework manages the performance state, aggregating perf votes
> from each device, i.e. from each clock individually.
> 
> You want to reinvent another layer of aggregation on top of GENPD.
> This doesn't worth the effort, we won't get anything from it, it
> should be a lot of extra complexity for nothing. We will also lose
> from it because pm_genpd_summary won't show you a per-device info.
> 
> domain                          status          children                           performance
>     /device                                             runtime status
> ----------------------------------------------------------------------------------------------
> heg                             on                                                 1000000
>     /devices/soc0/50000000.host1x                       active                     1000000
>     /devices/soc0/50000000.host1x/54140000.gr2d         suspended                  0
> mpe                             off-0                                              0
> vdec                            off-0                                              0
>     /devices/soc0/6001a000.vde                          suspended                  0
> venc                            off-0                                              0
> 3d1                             off-0                                              0
>     /devices/genpd:1:54180000.gr3d                      suspended                  0
> 3d0                             off-0                                              0
>     /devices/genpd:0:54180000.gr3d                      suspended                  0
> core-domain                     on                                                 1000000
>                                                 3d0, 3d1, venc, vdec, mpe, heg
>     /devices/soc0/7d000000.usb                          active                     1000000
>     /devices/soc0/78000400.mmc                          active                     950000
>     /devices/soc0/7000f400.memory-controller            unsupported                1000000
>     /devices/soc0/7000a000.pwm                          active                     1000000
>     /devices/soc0/60006000.clock/tegra_clk_pll_c        active                     1000000
>     /devices/soc0/60006000.clock/tegra_clk_pll_e        suspended                  0
>     /devices/soc0/60006000.clock/tegra_clk_pll_m        active                     1000000
>     /devices/soc0/60006000.clock/tegra_clk_sclk         active                     1000000
> 

I suppose if there's really no good way of doing this other than
providing a struct device, then so be it. I think the cleaned up sysfs
shown in the summary above looks much better than what the original
would've looked like.

Perhaps an additional tweak to that would be to not create platform
devices. Instead, just create struct device. Those really have
everything you need (.of_node, and can be used with RPM and GENPD). As I
mentioned earlier, platform device implies a CPU-memory-mapped bus,
which this clearly isn't. It's kind of a separate "bus" if you want, so
just using struct device directly seems more appropriate.

We did something similar for XUSB pads, see drivers/phy/tegra/xusb.[ch]
for an example of how that was done. I think you can do something
similar here.

Thierry

Ulf Hansson Aug. 20, 2021, 12:42 p.m. UTC | #50

On Thu, 19 Aug 2021 at 21:35, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 19.08.2021 16:07, Ulf Hansson пишет:
> > On Wed, 18 Aug 2021 at 17:43, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>
> >> 18.08.2021 13:08, Ulf Hansson пишет:
> >>> On Wed, 18 Aug 2021 at 11:50, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >>>>
> >>>> On 18-08-21, 11:41, Ulf Hansson wrote:
> >>>>> On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >>>>>> What we need here is just configure. So something like this then:
> >>>>>>
> >>>>>> - genpd->get_performance_state()
> >>>>>>   -> dev_pm_opp_get_current_opp() //New API
> >>>>>>   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
> >>>>>>
> >>>>>> This can be done just once from probe() then.
> >>>>>
> >>>>> How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?
> >>>>
> >>>> The opp core already has a way of finding current OPP, that's what
> >>>> Dmitry is trying to use here. It finds it using clk_get_rate(), if
> >>>> that is zero, it picks the lowest freq possible.
> >>>>
> >>>>> I am sure I understand the problem. When a device is getting probed,
> >>>>> it needs to consume power, how else can the corresponding driver
> >>>>> successfully probe it?
> >>>>
> >>>> Dmitry can answer that better, but a device doesn't necessarily need
> >>>> to consume energy in probe. It can consume bus clock, like APB we
> >>>> have, but the more energy consuming stuff can be left disabled until
> >>>> the time a user comes up. Probe will just end up registering the
> >>>> driver and initializing it.
> >>>
> >>> That's perfectly fine, as then it's likely that it won't vote for an
> >>> OPP, but can postpone that as well.
> >>>
> >>> Perhaps the problem is rather that the HW may already carry a non-zero
> >>> vote made from a bootloader. If the consumer driver tries to clear
> >>> that vote (calling dev_pm_opp_set_rate(dev, 0), for example), it would
> >>> still not lead to any updates of the performance state in genpd,
> >>> because genpd internally has initialized the performance-state to
> >>> zero.
> >>
> >> We don't need to discover internal SoC devices because we use
> >> device-tree on ARM. For most devices power isn't required at a probe
> >> time because probe function doesn't touch h/w at all, thus devices are
> >> left in suspended state after probe.
> >>
> >> We have three components comprising PM on Tegra:
> >>
> >> 1. Power gate
> >> 2. Clock state
> >> 3. Voltage state
> >>
> >> GENPD on/off represents the 'power gate'.
> >>
> >> Clock and reset are controlled by device drivers using clk and rst APIs.
> >>
> >> Voltage state is represented by GENPD's performance level.
> >>
> >> GENPD core assumes that at a first rpm-resume of a consumer device, its
> >> genpd_performance=0. Not true for Tegra because h/w of the device is
> >> preconfigured to a non-zero perf level initially, h/w may not support
> >> zero level at all.
> >
> > I think you may be misunderstanding genpd's behaviour around this, but
> > let me elaborate.
> >
> > In genpd_runtime_resume(), we try to restore the performance state for
> > the device that genpd_runtime_suspend() *may* have dropped earlier.
> > That means, if genpd_runtime_resume() is called prior
> > genpd_runtime_suspend() for the first time, it means that
> > genpd_runtime_resume() will *not* restore a performance state, but
> > instead just leave the performance state as is for the device (see
> > genpd_restore_performance_state()).
> >
> > In other words, a consumer driver may use the following sequence to
> > set an initial performance state for the device during ->probe():
> >
> > ...
> > rate = clk_get_rate()
> > dev_pm_opp_set_rate(rate)
> >
> > pm_runtime_enable()
> > pm_runtime_resume_and_get()
> > ...
> >
> > Note that, it's the consumer driver's responsibility to manage device
> > specific resources, in its ->runtime_suspend|resume() callbacks.
> > Typically that means dealing with clock gating/ungating, for example.
> >
> > In the other scenario where a consumer driver prefers to *not* call
> > pm_runtime_resume_and_get() in its ->probe(), because it doesn't need
> > to power on the device to complete probing, then we don't want to vote
> > for an OPP at all - and we also want the performance state for the
> > device in genpd to be set to zero. Correct?
>
> Yes
>
> > Is this the main problem you are trying to solve, because I think this
> > doesn't work out of the box as of today?
>
> The main problem is that the restored performance state is zero for the
> first genpd_runtime_resume(), while it's not zero from the h/w perspective.

This should not be a problem, but can be handled by the consumer driver.

genpd_runtime_resume() calls genpd_restore_performance_state() to
restore a performance state for the device. However, in the scenario
you describe, "gpd_data->rpm_pstate" is zero, which makes
genpd_restore_performance_state() to just leave the device's
performance state as is - it will *not* restore the performance state
to zero.

To make the consumer driver deal with this, it would need to call
dev_pm_opp_set_rate() from within its ->runtime_resume() callback.

>
> > There is another concern though, but perhaps it's not a problem after
> > all. Viresh told us that dev_pm_opp_set_rate() may turn on resources
> > like clock/regulators. That could certainly be problematic, in
> > particular if the device and its genpd have OPP tables associated with
> > it and the consumer driver wants to follow the above sequence in
> > probe.
>
> dev_pm_opp_set_rate() won't enable clocks and regulators, but it may
> change the clock rate and voltage. This is also platform/driver specific
> because it's up to OPP user how to configure OPP table. On Tegra we only
> assign clock to OPP table, regulators are unused.
>
> > Viresh, can you please chime in here and elaborate on some of the
> > magic happening behind dev_pm_opp_set_rate() API - is there a problem
> > here or not?
> >
> >>
> >> GENPD core assumes that consumer devices can work at any performance
> >> level. Not true for Tegra because voltage needs to be set in accordance
> >> to the clock rate before clock is enabled, otherwise h/w won't work
> >> properly, perhaps clock may be unstable or h/w won't be latching.
> >
> > Correct. Genpd relies on the callers to use the OPP framework if there
> > are constraints like you describe above.
> >
> > That said, it's not forbidden for a consumer driver to call
> > dev_pm_genpd_set_performance_state() directly, but then it better
> > knows exactly what it's doing.
> >
> >>
> >> Performance level should be set to 0 while device is suspended.
> >
> > Do you mean system suspend or runtime suspend? Or both?
>
> Runtime suspend.

Alright. So that's already taken care of for us in genpd_runtime_suspend().

Or perhaps you have discovered some problem with this?

>
> >> Performance level needs to be bumped on rpm-resume of a device in
> >> accordance to h/w state before hardware is enabled.
> >
> > Assuming there was a performance state set for the device when
> > genpd_runtime_suspend() was called, genpd_runtime_resume() will
> > restore that state according to the sequence you described.
>
> What do you think about adding API that will allow drivers to explicitly
> set the restored performance state of a power domain?
>
> Another option could be to change the GENPD core, making it to set the
> rpm_pstate when dev_pm_genpd_set_performance_state(dev) is invoked and
> device is rpm-suspended, instead of calling the
> genpd->set_performance_state callback.
>
> Then drivers will be able to sync the perf state at a probe time.
>
> What do you think?

I don't think it's needed, see my reply earlier above. However your
change touches another problem though, see below.

>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index a934c679e6ce..cc15ab9eacc9 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -435,7 +435,7 @@ static void genpd_restore_performance_state(struct
> device *dev,
>  int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int
> state)
>  {
>         struct generic_pm_domain *genpd;
> -       int ret;
> +       int ret = 0;
>
>         genpd = dev_to_genpd_safe(dev);
>         if (!genpd)
> @@ -446,7 +446,10 @@ int dev_pm_genpd_set_performance_state(struct
> device *dev, unsigned int state)
>                 return -EINVAL;
>
>         genpd_lock(genpd);
> -       ret = genpd_set_performance_state(dev, state);
> +       if (pm_runtime_suspended(dev))
> +               dev_gpd_data(dev)->rpm_pstate = state;
> +       else
> +               ret = genpd_set_performance_state(dev, state);
>         genpd_unlock(genpd);

This doesn't work for all cases. For example, when a consumer driver
deploys runtime PM support in its ->probe() according to the below
sequence:

...
dev_pm_opp_set_rate(rate)
pm_runtime_get_noresume()
pm_runtime_set_active()
pm_runtime_enable()
...
pm_runtime_put()
...

We need to call genpd_set_performance_state() independently of whether
the device is runtime suspended or not.

Although, it actually seems like good idea to update
dev_gpd_data(dev)->rpm_pstate = state here, as to make sure
genpd_runtime_resume() doesn't restore an old/invalid value that was
saved while dropping the performance state vote for the device in
genpd_runtime_suspend() earlier.

Let me send a patch for this shortly, to close this window of a possible error.

>
>         return ret;
>
>

Kind regards
Uffe

Ulf Hansson Aug. 20, 2021, 12:57 p.m. UTC | #51

On Fri, 20 Aug 2021 at 07:18, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 19-08-21, 16:55, Ulf Hansson wrote:
> > Right, that sounds reasonable.
> >
> > We already have pm_genpd_opp_to_performance_state() which translates
> > an OPP to a performance state. This function invokes the
> > ->opp_to_performance_state() for a genpd. Maybe we need to allow a
> > genpd to not have ->opp_to_performance_state() callback assigned
> > though, but continue up in the hierarchy to see if the parent has the
> > callback assigned, to make this work for Tegra?
> >
> > Perhaps we should add an API dev_pm_genpd_opp_to_performance_state(),
> > allowing us to pass the device instead of the genpd. But that's a
> > minor thing.
>
> I am not concerned a lot about how it gets implemented, and am not
> sure as well, as I haven't looked into these details since sometime.
> Any reasonable thing will be accepted, as simple as that.
>
> > Finally, the precondition to use the above, is to first get a handle
> > to an OPP table. This is where I am struggling to find a generic
> > solution, because I guess that would be platform or even consumer
> > driver specific for how to do this. And at what point should we do
> > this?
>
> Hmm, I am not very clear with the whole picture at this point of time.
>
> Dmitry, can you try to frame a sequence of events/calls/etc that will
> define what kind of devices we are looking at here, and how this can
> be made to work ?
>
> > > > Viresh, please take a look at what I did in [1]. Maybe it could be done
> > > > in another way.
> > >
> > > I looked into this and looked like too much trouble. The
> > > implementation needs to be simple. I am not sure I understand all the
> > > problems you faced while doing that, would be better to start with a
> > > simpler implementation of get_performance_state() kind of API for
> > > genpd, after the domain is attached and its OPP table is initialized.
> > >
> > > Note, that the OPP table isn't required to be fully initialized for
> > > the device at this point, we can parse the DT as well if needed be.
> >
> > Sure, but as I indicated above, you need some kind of input data to
> > figure out what OPP table to pick, before you can translate that into
> > a performance state. Is that always the clock rate, for example?
>
> Eventually it can be clock, bandwidth, or pstate of anther genpd, not
> sure what all we are looking for now. It should be just clock right
> now as far as I can imagine :)
>
> > Perhaps, we should start with adding a dev_pm_opp_get_from_rate() or
> > what do you think? Do you have other suggestions?
>
> We already have similar APIs, so that won't be a problem. We also have
> a mechanism inside the OPP core, frequency based, which is used to
> guess the current OPP. Maybe we can enhance and use that directly
> here.

After reading the last reply from Dmitry, I am starting to think that
the problem he is facing can be described and solved in a much easier
way.

If I am correct, it looks like we don't need to add APIs to get OPPs
for a clock rate or set initial performance state values according to
the HW in genpd.

See my other response to Dmitry, let's see where that leads us.

Kind regards
Uffe

Ulf Hansson Aug. 20, 2021, 1:08 p.m. UTC | #52

[...]

> >
> > I'm creating platform device for the clocks that require DVFS. These
> > clocks don't use regulator, they are attached to the CORE domain.
> > GENPD framework manages the performance state, aggregating perf votes
> > from each device, i.e. from each clock individually.
> >
> > You want to reinvent another layer of aggregation on top of GENPD.
> > This doesn't worth the effort, we won't get anything from it, it
> > should be a lot of extra complexity for nothing. We will also lose
> > from it because pm_genpd_summary won't show you a per-device info.
> >
> > domain                          status          children                           performance
> >     /device                                             runtime status
> > ----------------------------------------------------------------------------------------------
> > heg                             on                                                 1000000
> >     /devices/soc0/50000000.host1x                       active                     1000000
> >     /devices/soc0/50000000.host1x/54140000.gr2d         suspended                  0
> > mpe                             off-0                                              0
> > vdec                            off-0                                              0
> >     /devices/soc0/6001a000.vde                          suspended                  0
> > venc                            off-0                                              0
> > 3d1                             off-0                                              0
> >     /devices/genpd:1:54180000.gr3d                      suspended                  0
> > 3d0                             off-0                                              0
> >     /devices/genpd:0:54180000.gr3d                      suspended                  0
> > core-domain                     on                                                 1000000
> >                                                 3d0, 3d1, venc, vdec, mpe, heg
> >     /devices/soc0/7d000000.usb                          active                     1000000
> >     /devices/soc0/78000400.mmc                          active                     950000
> >     /devices/soc0/7000f400.memory-controller            unsupported                1000000
> >     /devices/soc0/7000a000.pwm                          active                     1000000
> >     /devices/soc0/60006000.clock/tegra_clk_pll_c        active                     1000000
> >     /devices/soc0/60006000.clock/tegra_clk_pll_e        suspended                  0
> >     /devices/soc0/60006000.clock/tegra_clk_pll_m        active                     1000000
> >     /devices/soc0/60006000.clock/tegra_clk_sclk         active                     1000000
> >
>
> I suppose if there's really no good way of doing this other than
> providing a struct device, then so be it. I think the cleaned up sysfs
> shown in the summary above looks much better than what the original
> would've looked like.
>
> Perhaps an additional tweak to that would be to not create platform
> devices. Instead, just create struct device. Those really have
> everything you need (.of_node, and can be used with RPM and GENPD). As I
> mentioned earlier, platform device implies a CPU-memory-mapped bus,
> which this clearly isn't. It's kind of a separate "bus" if you want, so
> just using struct device directly seems more appropriate.

Just a heads up. If you don't use a platform device or have a driver
associated with it for probing, you need to manage the attachment to
genpd yourself. That means calling one of the dev_pm_domain_attach*()
APIs, but that's perfectly fine, ofcourse.

>
> We did something similar for XUSB pads, see drivers/phy/tegra/xusb.[ch]
> for an example of how that was done. I think you can do something
> similar here.
>
> Thierry

Kind regards
Uffe

Dmitry Osipenko Aug. 21, 2021, 5:34 p.m. UTC | #53

20.08.2021 15:42, Ulf Hansson пишет:
> On Thu, 19 Aug 2021 at 21:35, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> 19.08.2021 16:07, Ulf Hansson пишет:
>>> On Wed, 18 Aug 2021 at 17:43, Dmitry Osipenko <digetx@gmail.com> wrote:
>>>>
>>>> 18.08.2021 13:08, Ulf Hansson пишет:
>>>>> On Wed, 18 Aug 2021 at 11:50, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>>>>
>>>>>> On 18-08-21, 11:41, Ulf Hansson wrote:
>>>>>>> On Wed, 18 Aug 2021 at 11:14, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>>>>>> What we need here is just configure. So something like this then:
>>>>>>>>
>>>>>>>> - genpd->get_performance_state()
>>>>>>>>   -> dev_pm_opp_get_current_opp() //New API
>>>>>>>>   -> dev_pm_genpd_set_performance_state(dev, current_opp->pstate);
>>>>>>>>
>>>>>>>> This can be done just once from probe() then.
>>>>>>>
>>>>>>> How would dev_pm_opp_get_current_opp() work? Do you have a suggestion?
>>>>>>
>>>>>> The opp core already has a way of finding current OPP, that's what
>>>>>> Dmitry is trying to use here. It finds it using clk_get_rate(), if
>>>>>> that is zero, it picks the lowest freq possible.
>>>>>>
>>>>>>> I am sure I understand the problem. When a device is getting probed,
>>>>>>> it needs to consume power, how else can the corresponding driver
>>>>>>> successfully probe it?
>>>>>>
>>>>>> Dmitry can answer that better, but a device doesn't necessarily need
>>>>>> to consume energy in probe. It can consume bus clock, like APB we
>>>>>> have, but the more energy consuming stuff can be left disabled until
>>>>>> the time a user comes up. Probe will just end up registering the
>>>>>> driver and initializing it.
>>>>>
>>>>> That's perfectly fine, as then it's likely that it won't vote for an
>>>>> OPP, but can postpone that as well.
>>>>>
>>>>> Perhaps the problem is rather that the HW may already carry a non-zero
>>>>> vote made from a bootloader. If the consumer driver tries to clear
>>>>> that vote (calling dev_pm_opp_set_rate(dev, 0), for example), it would
>>>>> still not lead to any updates of the performance state in genpd,
>>>>> because genpd internally has initialized the performance-state to
>>>>> zero.
>>>>
>>>> We don't need to discover internal SoC devices because we use
>>>> device-tree on ARM. For most devices power isn't required at a probe
>>>> time because probe function doesn't touch h/w at all, thus devices are
>>>> left in suspended state after probe.
>>>>
>>>> We have three components comprising PM on Tegra:
>>>>
>>>> 1. Power gate
>>>> 2. Clock state
>>>> 3. Voltage state
>>>>
>>>> GENPD on/off represents the 'power gate'.
>>>>
>>>> Clock and reset are controlled by device drivers using clk and rst APIs.
>>>>
>>>> Voltage state is represented by GENPD's performance level.
>>>>
>>>> GENPD core assumes that at a first rpm-resume of a consumer device, its
>>>> genpd_performance=0. Not true for Tegra because h/w of the device is
>>>> preconfigured to a non-zero perf level initially, h/w may not support
>>>> zero level at all.
>>>
>>> I think you may be misunderstanding genpd's behaviour around this, but
>>> let me elaborate.
>>>
>>> In genpd_runtime_resume(), we try to restore the performance state for
>>> the device that genpd_runtime_suspend() *may* have dropped earlier.
>>> That means, if genpd_runtime_resume() is called prior
>>> genpd_runtime_suspend() for the first time, it means that
>>> genpd_runtime_resume() will *not* restore a performance state, but
>>> instead just leave the performance state as is for the device (see
>>> genpd_restore_performance_state()).
>>>
>>> In other words, a consumer driver may use the following sequence to
>>> set an initial performance state for the device during ->probe():
>>>
>>> ...
>>> rate = clk_get_rate()
>>> dev_pm_opp_set_rate(rate)
>>>
>>> pm_runtime_enable()
>>> pm_runtime_resume_and_get()
>>> ...
>>>
>>> Note that, it's the consumer driver's responsibility to manage device
>>> specific resources, in its ->runtime_suspend|resume() callbacks.
>>> Typically that means dealing with clock gating/ungating, for example.
>>>
>>> In the other scenario where a consumer driver prefers to *not* call
>>> pm_runtime_resume_and_get() in its ->probe(), because it doesn't need
>>> to power on the device to complete probing, then we don't want to vote
>>> for an OPP at all - and we also want the performance state for the
>>> device in genpd to be set to zero. Correct?
>>
>> Yes
>>
>>> Is this the main problem you are trying to solve, because I think this
>>> doesn't work out of the box as of today?
>>
>> The main problem is that the restored performance state is zero for the
>> first genpd_runtime_resume(), while it's not zero from the h/w perspective.
> 
> This should not be a problem, but can be handled by the consumer driver.
> 
> genpd_runtime_resume() calls genpd_restore_performance_state() to
> restore a performance state for the device. However, in the scenario
> you describe, "gpd_data->rpm_pstate" is zero, which makes
> genpd_restore_performance_state() to just leave the device's
> performance state as is - it will *not* restore the performance state
> to zero.
> 
> To make the consumer driver deal with this, it would need to call
> dev_pm_opp_set_rate() from within its ->runtime_resume() callback.
> 
>>
>>> There is another concern though, but perhaps it's not a problem after
>>> all. Viresh told us that dev_pm_opp_set_rate() may turn on resources
>>> like clock/regulators. That could certainly be problematic, in
>>> particular if the device and its genpd have OPP tables associated with
>>> it and the consumer driver wants to follow the above sequence in
>>> probe.
>>
>> dev_pm_opp_set_rate() won't enable clocks and regulators, but it may
>> change the clock rate and voltage. This is also platform/driver specific
>> because it's up to OPP user how to configure OPP table. On Tegra we only
>> assign clock to OPP table, regulators are unused.
>>
>>> Viresh, can you please chime in here and elaborate on some of the
>>> magic happening behind dev_pm_opp_set_rate() API - is there a problem
>>> here or not?
>>>
>>>>
>>>> GENPD core assumes that consumer devices can work at any performance
>>>> level. Not true for Tegra because voltage needs to be set in accordance
>>>> to the clock rate before clock is enabled, otherwise h/w won't work
>>>> properly, perhaps clock may be unstable or h/w won't be latching.
>>>
>>> Correct. Genpd relies on the callers to use the OPP framework if there
>>> are constraints like you describe above.
>>>
>>> That said, it's not forbidden for a consumer driver to call
>>> dev_pm_genpd_set_performance_state() directly, but then it better
>>> knows exactly what it's doing.
>>>
>>>>
>>>> Performance level should be set to 0 while device is suspended.
>>>
>>> Do you mean system suspend or runtime suspend? Or both?
>>
>> Runtime suspend.
> 
> Alright. So that's already taken care of for us in genpd_runtime_suspend().
> 
> Or perhaps you have discovered some problem with this?
> 
>>
>>>> Performance level needs to be bumped on rpm-resume of a device in
>>>> accordance to h/w state before hardware is enabled.
>>>
>>> Assuming there was a performance state set for the device when
>>> genpd_runtime_suspend() was called, genpd_runtime_resume() will
>>> restore that state according to the sequence you described.
>>
>> What do you think about adding API that will allow drivers to explicitly
>> set the restored performance state of a power domain?
>>
>> Another option could be to change the GENPD core, making it to set the
>> rpm_pstate when dev_pm_genpd_set_performance_state(dev) is invoked and
>> device is rpm-suspended, instead of calling the
>> genpd->set_performance_state callback.
>>
>> Then drivers will be able to sync the perf state at a probe time.
>>
>> What do you think?
> 
> I don't think it's needed, see my reply earlier above. However your
> change touches another problem though, see below.
> 
>>
>> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
>> index a934c679e6ce..cc15ab9eacc9 100644
>> --- a/drivers/base/power/domain.c
>> +++ b/drivers/base/power/domain.c
>> @@ -435,7 +435,7 @@ static void genpd_restore_performance_state(struct
>> device *dev,
>>  int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int
>> state)
>>  {
>>         struct generic_pm_domain *genpd;
>> -       int ret;
>> +       int ret = 0;
>>
>>         genpd = dev_to_genpd_safe(dev);
>>         if (!genpd)
>> @@ -446,7 +446,10 @@ int dev_pm_genpd_set_performance_state(struct
>> device *dev, unsigned int state)
>>                 return -EINVAL;
>>
>>         genpd_lock(genpd);
>> -       ret = genpd_set_performance_state(dev, state);
>> +       if (pm_runtime_suspended(dev))
>> +               dev_gpd_data(dev)->rpm_pstate = state;
>> +       else
>> +               ret = genpd_set_performance_state(dev, state);
>>         genpd_unlock(genpd);
> 
> This doesn't work for all cases. For example, when a consumer driver
> deploys runtime PM support in its ->probe() according to the below
> sequence:
> 
> ...
> dev_pm_opp_set_rate(rate)
> pm_runtime_get_noresume()
> pm_runtime_set_active()
> pm_runtime_enable()
> ...
> pm_runtime_put()
> ...
> 
> We need to call genpd_set_performance_state() independently of whether
> the device is runtime suspended or not.

I don't see where is the problem in yours example.

pm_runtime_suspended() = false while RPM is disabled. When device is
resumed, the rpm_pstate=0, so it won't change the pstate on resume.

> Although, it actually seems like good idea to update
> dev_gpd_data(dev)->rpm_pstate = state here, as to make sure
> genpd_runtime_resume() doesn't restore an old/invalid value that was
> saved while dropping the performance state vote for the device in
> genpd_runtime_suspend() earlier.
> 
> Let me send a patch for this shortly, to close this window of a possible error.

It will also remove the need to resume device just to change the clock
rate, like I needed to do it in the PWM patch of this series.

Dmitry Osipenko Aug. 21, 2021, 5:45 p.m. UTC | #54

20.08.2021 16:08, Ulf Hansson пишет:
...
>> I suppose if there's really no good way of doing this other than
>> providing a struct device, then so be it. I think the cleaned up sysfs
>> shown in the summary above looks much better than what the original
>> would've looked like.
>>
>> Perhaps an additional tweak to that would be to not create platform
>> devices. Instead, just create struct device. Those really have
>> everything you need (.of_node, and can be used with RPM and GENPD). As I
>> mentioned earlier, platform device implies a CPU-memory-mapped bus,
>> which this clearly isn't. It's kind of a separate "bus" if you want, so
>> just using struct device directly seems more appropriate.
> 
> Just a heads up. If you don't use a platform device or have a driver
> associated with it for probing, you need to manage the attachment to
> genpd yourself. That means calling one of the dev_pm_domain_attach*()
> APIs, but that's perfectly fine, ofcourse.
> 
>>
>> We did something similar for XUSB pads, see drivers/phy/tegra/xusb.[ch]
>> for an example of how that was done. I think you can do something
>> similar here.

We need a platform device because we have a platform device driver that
must be bound to the device, otherwise PMC driver state won't be synced
since it it's synced after all drivers of devices that reference PMC
node in DT are probed.

Ulf Hansson Aug. 23, 2021, 10:46 a.m. UTC | #55

[...]

> >>>> We have three components comprising PM on Tegra:
> >>>>
> >>>> 1. Power gate
> >>>> 2. Clock state
> >>>> 3. Voltage state
> >>>>
> >>>> GENPD on/off represents the 'power gate'.
> >>>>
> >>>> Clock and reset are controlled by device drivers using clk and rst APIs.
> >>>>
> >>>> Voltage state is represented by GENPD's performance level.
> >>>>
> >>>> GENPD core assumes that at a first rpm-resume of a consumer device, its
> >>>> genpd_performance=0. Not true for Tegra because h/w of the device is
> >>>> preconfigured to a non-zero perf level initially, h/w may not support
> >>>> zero level at all.
> >>>
> >>> I think you may be misunderstanding genpd's behaviour around this, but
> >>> let me elaborate.
> >>>
> >>> In genpd_runtime_resume(), we try to restore the performance state for
> >>> the device that genpd_runtime_suspend() *may* have dropped earlier.
> >>> That means, if genpd_runtime_resume() is called prior
> >>> genpd_runtime_suspend() for the first time, it means that
> >>> genpd_runtime_resume() will *not* restore a performance state, but
> >>> instead just leave the performance state as is for the device (see
> >>> genpd_restore_performance_state()).
> >>>
> >>> In other words, a consumer driver may use the following sequence to
> >>> set an initial performance state for the device during ->probe():
> >>>
> >>> ...
> >>> rate = clk_get_rate()
> >>> dev_pm_opp_set_rate(rate)
> >>>
> >>> pm_runtime_enable()
> >>> pm_runtime_resume_and_get()
> >>> ...
> >>>
> >>> Note that, it's the consumer driver's responsibility to manage device
> >>> specific resources, in its ->runtime_suspend|resume() callbacks.
> >>> Typically that means dealing with clock gating/ungating, for example.
> >>>
> >>> In the other scenario where a consumer driver prefers to *not* call
> >>> pm_runtime_resume_and_get() in its ->probe(), because it doesn't need
> >>> to power on the device to complete probing, then we don't want to vote
> >>> for an OPP at all - and we also want the performance state for the
> >>> device in genpd to be set to zero. Correct?
> >>
> >> Yes
> >>
> >>> Is this the main problem you are trying to solve, because I think this
> >>> doesn't work out of the box as of today?
> >>
> >> The main problem is that the restored performance state is zero for the
> >> first genpd_runtime_resume(), while it's not zero from the h/w perspective.
> >
> > This should not be a problem, but can be handled by the consumer driver.
> >
> > genpd_runtime_resume() calls genpd_restore_performance_state() to
> > restore a performance state for the device. However, in the scenario
> > you describe, "gpd_data->rpm_pstate" is zero, which makes
> > genpd_restore_performance_state() to just leave the device's
> > performance state as is - it will *not* restore the performance state
> > to zero.
> >
> > To make the consumer driver deal with this, it would need to call
> > dev_pm_opp_set_rate() from within its ->runtime_resume() callback.
> >
> >>
> >>> There is another concern though, but perhaps it's not a problem after
> >>> all. Viresh told us that dev_pm_opp_set_rate() may turn on resources
> >>> like clock/regulators. That could certainly be problematic, in
> >>> particular if the device and its genpd have OPP tables associated with
> >>> it and the consumer driver wants to follow the above sequence in
> >>> probe.
> >>
> >> dev_pm_opp_set_rate() won't enable clocks and regulators, but it may
> >> change the clock rate and voltage. This is also platform/driver specific
> >> because it's up to OPP user how to configure OPP table. On Tegra we only
> >> assign clock to OPP table, regulators are unused.
> >>
> >>> Viresh, can you please chime in here and elaborate on some of the
> >>> magic happening behind dev_pm_opp_set_rate() API - is there a problem
> >>> here or not?
> >>>
> >>>>
> >>>> GENPD core assumes that consumer devices can work at any performance
> >>>> level. Not true for Tegra because voltage needs to be set in accordance
> >>>> to the clock rate before clock is enabled, otherwise h/w won't work
> >>>> properly, perhaps clock may be unstable or h/w won't be latching.
> >>>
> >>> Correct. Genpd relies on the callers to use the OPP framework if there
> >>> are constraints like you describe above.
> >>>
> >>> That said, it's not forbidden for a consumer driver to call
> >>> dev_pm_genpd_set_performance_state() directly, but then it better
> >>> knows exactly what it's doing.
> >>>
> >>>>
> >>>> Performance level should be set to 0 while device is suspended.
> >>>
> >>> Do you mean system suspend or runtime suspend? Or both?
> >>
> >> Runtime suspend.
> >
> > Alright. So that's already taken care of for us in genpd_runtime_suspend().
> >
> > Or perhaps you have discovered some problem with this?
> >
> >>
> >>>> Performance level needs to be bumped on rpm-resume of a device in
> >>>> accordance to h/w state before hardware is enabled.
> >>>
> >>> Assuming there was a performance state set for the device when
> >>> genpd_runtime_suspend() was called, genpd_runtime_resume() will
> >>> restore that state according to the sequence you described.
> >>
> >> What do you think about adding API that will allow drivers to explicitly
> >> set the restored performance state of a power domain?
> >>
> >> Another option could be to change the GENPD core, making it to set the
> >> rpm_pstate when dev_pm_genpd_set_performance_state(dev) is invoked and
> >> device is rpm-suspended, instead of calling the
> >> genpd->set_performance_state callback.
> >>
> >> Then drivers will be able to sync the perf state at a probe time.
> >>
> >> What do you think?
> >
> > I don't think it's needed, see my reply earlier above. However your
> > change touches another problem though, see below.
> >
> >>
> >> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> >> index a934c679e6ce..cc15ab9eacc9 100644
> >> --- a/drivers/base/power/domain.c
> >> +++ b/drivers/base/power/domain.c
> >> @@ -435,7 +435,7 @@ static void genpd_restore_performance_state(struct
> >> device *dev,
> >>  int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int
> >> state)
> >>  {
> >>         struct generic_pm_domain *genpd;
> >> -       int ret;
> >> +       int ret = 0;
> >>
> >>         genpd = dev_to_genpd_safe(dev);
> >>         if (!genpd)
> >> @@ -446,7 +446,10 @@ int dev_pm_genpd_set_performance_state(struct
> >> device *dev, unsigned int state)
> >>                 return -EINVAL;
> >>
> >>         genpd_lock(genpd);
> >> -       ret = genpd_set_performance_state(dev, state);
> >> +       if (pm_runtime_suspended(dev))
> >> +               dev_gpd_data(dev)->rpm_pstate = state;
> >> +       else
> >> +               ret = genpd_set_performance_state(dev, state);
> >>         genpd_unlock(genpd);
> >
> > This doesn't work for all cases. For example, when a consumer driver
> > deploys runtime PM support in its ->probe() according to the below
> > sequence:
> >
> > ...
> > dev_pm_opp_set_rate(rate)
> > pm_runtime_get_noresume()
> > pm_runtime_set_active()
> > pm_runtime_enable()
> > ...
> > pm_runtime_put()
> > ...
> >
> > We need to call genpd_set_performance_state() independently of whether
> > the device is runtime suspended or not.
>
> I don't see where is the problem in yours example.
>
> pm_runtime_suspended() = false while RPM is disabled. When device is
> resumed, the rpm_pstate=0, so it won't change the pstate on resume.

Yes, you are certainly correct, my bad! I mixed it up with
pm_runtime_status_suspended(), which only cares about the status.

So, after a second thought, your suggestion sounds very much
reasonable to me! I have also tried to consider all different
scenarios, including the system suspend/resume path, but I think it
should be fine.

I also think that a patch like the above should be considered as a
fix, because it actually fixes a problem, according to what I said in
my earlier reply, below.

Fixes : 5937c3ce2122 ("PM: domains: Drop/restore performance state
votes for devices at runtime PM").

>
> > Although, it actually seems like good idea to update
> > dev_gpd_data(dev)->rpm_pstate = state here, as to make sure
> > genpd_runtime_resume() doesn't restore an old/invalid value that was
> > saved while dropping the performance state vote for the device in
> > genpd_runtime_suspend() earlier.
> >
> > Let me send a patch for this shortly, to close this window of a possible error.
>
> It will also remove the need to resume device just to change the clock
> rate, like I needed to do it in the PWM patch of this series.

Do you want to send the patch formally? Or do you prefer it if I do it?

Kind regards
Uffe

Thierry Reding Aug. 23, 2021, 2:33 p.m. UTC | #56

On Sat, Aug 21, 2021 at 08:45:54PM +0300, Dmitry Osipenko wrote:
> 20.08.2021 16:08, Ulf Hansson пишет:
> ...
> >> I suppose if there's really no good way of doing this other than
> >> providing a struct device, then so be it. I think the cleaned up sysfs
> >> shown in the summary above looks much better than what the original
> >> would've looked like.
> >>
> >> Perhaps an additional tweak to that would be to not create platform
> >> devices. Instead, just create struct device. Those really have
> >> everything you need (.of_node, and can be used with RPM and GENPD). As I
> >> mentioned earlier, platform device implies a CPU-memory-mapped bus,
> >> which this clearly isn't. It's kind of a separate "bus" if you want, so
> >> just using struct device directly seems more appropriate.
> > 
> > Just a heads up. If you don't use a platform device or have a driver
> > associated with it for probing, you need to manage the attachment to
> > genpd yourself. That means calling one of the dev_pm_domain_attach*()
> > APIs, but that's perfectly fine, ofcourse.
> > 
> >>
> >> We did something similar for XUSB pads, see drivers/phy/tegra/xusb.[ch]
> >> for an example of how that was done. I think you can do something
> >> similar here.
> 
> We need a platform device because we have a platform device driver that
> must be bound to the device, otherwise PMC driver state won't be synced
> since it it's synced after all drivers of devices that reference PMC
> node in DT are probed.

I think the causality is the wrong way around. It's more likely that you
added the platform driver because you have a platform device that you
want to bind against.

You can have drivers bind to other types of devices, although it's a bit
more work than abusing platform devices for it.

There's the "auxiliary" bus that seems like it would be a somewhat
better fit (see Documentation/driver-api/auxiliary_bus.rst), though it
doesn't look like this fits the purpose exactly. I think a custom bus
(or perhaps something that could be deployed more broadly across CCF)
would be more appropriate.

Looking around, it seems like clk/imx and clk/samsung abuse the platform
bus in a similar way, so they would benefit from a "clk" bus as well.

Thierry

Dmitry Osipenko Aug. 23, 2021, 3:54 p.m. UTC | #57

23.08.2021 13:46, Ulf Hansson пишет:
>>> ...
>>> dev_pm_opp_set_rate(rate)
>>> pm_runtime_get_noresume()
>>> pm_runtime_set_active()
>>> pm_runtime_enable()
>>> ...
>>> pm_runtime_put()
>>> ...
>>>
>>> We need to call genpd_set_performance_state() independently of whether
>>> the device is runtime suspended or not.
>>
>> I don't see where is the problem in yours example.
>>
>> pm_runtime_suspended() = false while RPM is disabled. When device is
>> resumed, the rpm_pstate=0, so it won't change the pstate on resume.
> 
> Yes, you are certainly correct, my bad! I mixed it up with
> pm_runtime_status_suspended(), which only cares about the status.
> 
> So, after a second thought, your suggestion sounds very much
> reasonable to me! I have also tried to consider all different
> scenarios, including the system suspend/resume path, but I think it
> should be fine.

It could be improved slightly to cover more cases.

> I also think that a patch like the above should be considered as a
> fix, because it actually fixes a problem, according to what I said in
> my earlier reply, below.
> 
> Fixes : 5937c3ce2122 ("PM: domains: Drop/restore performance state
> votes for devices at runtime PM").
> 
>>
>>> Although, it actually seems like good idea to update
>>> dev_gpd_data(dev)->rpm_pstate = state here, as to make sure
>>> genpd_runtime_resume() doesn't restore an old/invalid value that was
>>> saved while dropping the performance state vote for the device in
>>> genpd_runtime_suspend() earlier.
>>>
>>> Let me send a patch for this shortly, to close this window of a possible error.
>>
>> It will also remove the need to resume device just to change the clock
>> rate, like I needed to do it in the PWM patch of this series.
> 
> Do you want to send the patch formally? Or do you prefer it if I do it?

I'll send the patch.

Dmitry Osipenko Aug. 23, 2021, 6:54 p.m. UTC | #58

23.08.2021 17:33, Thierry Reding пишет:
> On Sat, Aug 21, 2021 at 08:45:54PM +0300, Dmitry Osipenko wrote:
>> 20.08.2021 16:08, Ulf Hansson пишет:
>> ...
>>>> I suppose if there's really no good way of doing this other than
>>>> providing a struct device, then so be it. I think the cleaned up sysfs
>>>> shown in the summary above looks much better than what the original
>>>> would've looked like.
>>>>
>>>> Perhaps an additional tweak to that would be to not create platform
>>>> devices. Instead, just create struct device. Those really have
>>>> everything you need (.of_node, and can be used with RPM and GENPD). As I
>>>> mentioned earlier, platform device implies a CPU-memory-mapped bus,
>>>> which this clearly isn't. It's kind of a separate "bus" if you want, so
>>>> just using struct device directly seems more appropriate.
>>>
>>> Just a heads up. If you don't use a platform device or have a driver
>>> associated with it for probing, you need to manage the attachment to
>>> genpd yourself. That means calling one of the dev_pm_domain_attach*()
>>> APIs, but that's perfectly fine, ofcourse.
>>>
>>>>
>>>> We did something similar for XUSB pads, see drivers/phy/tegra/xusb.[ch]
>>>> for an example of how that was done. I think you can do something
>>>> similar here.
>>
>> We need a platform device because we have a platform device driver that
>> must be bound to the device, otherwise PMC driver state won't be synced
>> since it it's synced after all drivers of devices that reference PMC
>> node in DT are probed.
> 
> I think the causality is the wrong way around. It's more likely that you
> added the platform driver because you have a platform device that you
> want to bind against.
> 
> You can have drivers bind to other types of devices, although it's a bit
> more work than abusing platform devices for it.
> 
> There's the "auxiliary" bus that seems like it would be a somewhat
> better fit (see Documentation/driver-api/auxiliary_bus.rst), though it
> doesn't look like this fits the purpose exactly. I think a custom bus
> (or perhaps something that could be deployed more broadly across CCF)
> would be more appropriate.
> 
> Looking around, it seems like clk/imx and clk/samsung abuse the platform
> bus in a similar way, so they would benefit from a "clk" bus as well.

It may be nice to have a dedicated clk bus, but this is too much effort
for nearly nothing in our case. It shouldn't be a problem to convert
drivers to use clk bus once it will be implemented. It shouldn't be a
part of this series, IMO.

Dmitry Osipenko Aug. 23, 2021, 8:24 p.m. UTC | #59

20.08.2021 15:57, Ulf Hansson пишет:
...
>> We already have similar APIs, so that won't be a problem. We also have
>> a mechanism inside the OPP core, frequency based, which is used to
>> guess the current OPP. Maybe we can enhance and use that directly
>> here.
> 
> After reading the last reply from Dmitry, I am starting to think that
> the problem he is facing can be described and solved in a much easier
> way.
> 
> If I am correct, it looks like we don't need to add APIs to get OPPs
> for a clock rate or set initial performance state values according to
> the HW in genpd.
> 
> See my other response to Dmitry, let's see where that leads us.

I'm going to start preparing v9 with GENPD performance state syncing moved into driver's probe where appropriate.

It's not clear to me whether it will be okay to add a generic OPP syncing by clock rate or should it be a Tegra-specific helper. Viresh, what do you think about this generic OPP helper:

/**
 * dev_pm_opp_sync_with_clk_rate() - Sync OPP state with clock rate
 * @dev:	device for which we do this operation
 *
 * Sync OPP table state with the current clock rate of device.
 *
 * Return: 0 on success or a negative error value.
 */
int dev_pm_opp_sync_with_clk_rate(struct device *dev)
{
	struct opp_table *opp_table;
	int ret = 0;

	/* Device may not have OPP table */
	opp_table = _find_opp_table(dev);
	if (IS_ERR(opp_table))
		return 0;

	/* Device may not use clock */
	if (IS_ERR(opp_table->clk))
		goto put_table;

	/* Device may have empty OPP table */
	if (!_get_opp_count(opp_table))
		goto put_table;

	ret = dev_pm_opp_set_rate(dev, clk_get_rate(opp_table->clk));
put_table:
	/* Drop reference taken by _find_opp_table() */
	dev_pm_opp_put_opp_table(opp_table);

	return ret;
}
EXPORT_SYMBOL_GPL(dev_pm_opp_sync_with_clk_rate);

Viresh Kumar Aug. 24, 2021, 3:04 a.m. UTC | #60

On 23-08-21, 23:24, Dmitry Osipenko wrote:
> It's not clear to me whether it will be okay to add a generic OPP syncing by clock rate or should it be a Tegra-specific helper. Viresh, what do you think about this generic OPP helper:
> 
> /**
>  * dev_pm_opp_sync_with_clk_rate() - Sync OPP state with clock rate
>  * @dev:	device for which we do this operation
>  *
>  * Sync OPP table state with the current clock rate of device.
>  *
>  * Return: 0 on success or a negative error value.
>  */
> int dev_pm_opp_sync_with_clk_rate(struct device *dev)
> {
> 	struct opp_table *opp_table;
> 	int ret = 0;
> 
> 	/* Device may not have OPP table */
> 	opp_table = _find_opp_table(dev);
> 	if (IS_ERR(opp_table))
> 		return 0;
> 
> 	/* Device may not use clock */
> 	if (IS_ERR(opp_table->clk))
> 		goto put_table;
> 
> 	/* Device may have empty OPP table */
> 	if (!_get_opp_count(opp_table))
> 		goto put_table;
> 
> 	ret = dev_pm_opp_set_rate(dev, clk_get_rate(opp_table->clk));
> put_table:
> 	/* Drop reference taken by _find_opp_table() */
> 	dev_pm_opp_put_opp_table(opp_table);
> 
> 	return ret;
> }
> EXPORT_SYMBOL_GPL(dev_pm_opp_sync_with_clk_rate);

I am not sure why you still need this, hope we were going another way
? Anyway I will have a look at what you have posted now.

Dmitry Osipenko Aug. 25, 2021, 9:45 a.m. UTC | #61

20.08.2021 14:35, Thierry Reding пишет:
> On Fri, Aug 20, 2021 at 01:37:13AM +0300, Dmitry Osipenko wrote:
>> 19.08.2021 20:03, Thierry Reding пишет:
>>> On Tue, Aug 17, 2021 at 04:27:40AM +0300, Dmitry Osipenko wrote:
>>>> The SDHCI on Tegra belongs to the core power domain and we're going to
>>>> enable GENPD support for the core domain. Now SDHCI must be resumed using
>>>> runtime PM API in order to initialize the SDHCI power state. The SDHCI
>>>> clock rate must be changed using OPP API that will reconfigure the power
>>>> domain performance state in accordance to the rate. Add runtime PM and OPP
>>>> support to the SDHCI driver.
>>>>
>>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>>> ---
>>>>  drivers/mmc/host/sdhci-tegra.c | 146 ++++++++++++++++++++++++---------
>>>>  1 file changed, 105 insertions(+), 41 deletions(-)
>>>>
>>>> diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
>>>> index 387ce9cdbd7c..a3583359c972 100644
>>>> --- a/drivers/mmc/host/sdhci-tegra.c
>>>> +++ b/drivers/mmc/host/sdhci-tegra.c
>>>> @@ -15,6 +15,8 @@
>>>>  #include <linux/of.h>
>>>>  #include <linux/of_device.h>
>>>>  #include <linux/pinctrl/consumer.h>
>>>> +#include <linux/pm_opp.h>
>>>> +#include <linux/pm_runtime.h>
>>>>  #include <linux/regulator/consumer.h>
>>>>  #include <linux/reset.h>
>>>>  #include <linux/mmc/card.h>
>>>> @@ -24,6 +26,8 @@
>>>>  #include <linux/gpio/consumer.h>
>>>>  #include <linux/ktime.h>
>>>>  
>>>> +#include <soc/tegra/common.h>
>>>> +
>>>>  #include "sdhci-pltfm.h"
>>>>  #include "cqhci.h"
>>>>  
>>>> @@ -123,6 +127,12 @@
>>>>  					 SDHCI_TRNS_BLK_CNT_EN | \
>>>>  					 SDHCI_TRNS_DMA)
>>>>  
>>>> +enum {
>>>> +	TEGRA_CLK_BULK_SDHCI,
>>>> +	TEGRA_CLK_BULK_TMCLK,
>>>> +	TEGRA_CLK_BULK_NUM,
>>>> +};
>>>> +
>>>>  struct sdhci_tegra_soc_data {
>>>>  	const struct sdhci_pltfm_data *pdata;
>>>>  	u64 dma_mask;
>>>> @@ -171,6 +181,8 @@ struct sdhci_tegra {
>>>>  	bool enable_hwcq;
>>>>  	unsigned long curr_clk_rate;
>>>>  	u8 tuned_tap_delay;
>>>> +
>>>> +	struct clk_bulk_data clocks[TEGRA_CLK_BULK_NUM];
>>>
>>> This doesn't seem worth it to me. There's a lot of churn in this driver
>>> that's only needed to convert this to the clk_bulk API and it makes the
>>> code a lot more difficult to read, in my opinion.
>>>
>>> It looks like the only benefit that this gives us is that runtime
>>> suspend and resume become a few lines shorter.
>>
>> The driver probe code looks cleaner with that. You should be looking at
>> the final result and not at the patch to see it.
> 
> I did look at the final result and didn't find it cleaner at all. =)

There is UAF bug in this patch that was spotted by kasan. The
sdhci_tegra_soc_data isn't resource-managed, but clk_bulk_data is. I'm
now thinking that it should be okay to keep tmclk always-on, so I'll
replace the bulk clks back with the sdhci clk in v9.

Viresh Kumar Aug. 26, 2021, 2:54 a.m. UTC | #62

On 25-08-21, 18:41, Dmitry Osipenko wrote:
> Thinking a bit more about this, I got a nicer variant which actually works in all cases for Tegra.
> 
> Viresh / Ulf, what do you think about this:

This is what I have been suggesting from day 1 :)

https://lore.kernel.org/linux-staging/20210818055849.ybfajzu75ecpdrbn@vireshk-i7/

 "
  And if it is all about just syncing the genpd core, then can the
  genpd core do something like what clk framework does? i.e. allow a
  new optional genpd callback, get_performance_state() (just like
  set_performance_state()), which can be called initially by the core
  to get the performance to something other than zero.
 "

Looks good to me :)

Viresh Kumar Aug. 26, 2021, 2:55 a.m. UTC | #63

On 26-08-21, 08:24, Viresh Kumar wrote:
> On 25-08-21, 18:41, Dmitry Osipenko wrote:
> > Thinking a bit more about this, I got a nicer variant which actually works in all cases for Tegra.
> > 
> > Viresh / Ulf, what do you think about this:
> 
> This is what I have been suggesting from day 1 :)
> 
> https://lore.kernel.org/linux-staging/20210818055849.ybfajzu75ecpdrbn@vireshk-i7/
> 
>  "
>   And if it is all about just syncing the genpd core, then can the
>   genpd core do something like what clk framework does? i.e. allow a
>   new optional genpd callback, get_performance_state() (just like
>   set_performance_state()), which can be called initially by the core
>   to get the performance to something other than zero.
>  "
> 
> Looks good to me :)

When you refresh this stuff, please send only 3-4 patches to update
the core stuff and show an example. Once we finalize with the
interface, you can update all the users. Else this is just noise for
everyone else.

[v8,00/34] NVIDIA Tegra power management patches for 5.16

Message

Comments