mbox series

[v13,00/35] NVIDIA Tegra power management patches for 5.16

Message ID 20210926224058.1252-1-digetx@gmail.com
Headers show
Series NVIDIA Tegra power management patches for 5.16 | expand

Message

Dmitry Osipenko Sept. 26, 2021, 10:40 p.m. UTC
This series adds runtime PM support to Tegra drivers and enables core
voltage scaling for Tegra20/30 SoCs, resolving overheating troubles.

All patches in this series are interdependent and should go via Tegra tree.

Changelog:

v13: - Fixed compile-test error reported by build bot by reverting the
       mmc/ patch to v11. The sdhci_suspend/resume_host() functions aren't
       available with the disabled CONFIG_PM_SLEEP, some code needs the
       ifdef.

     - Added last r-b from Rob Herring for the DT patches.

     - Corrected clk/ PM domain-support patch by not using the
       devm_tegra_core_dev_init_opp_table_common() helper, which I
       utilized in v12. The clk driver implements its own power domain
       state syncing and common helper shouldn't be used. This fixes driver
       probing for some clocks on some devices. It was reported by
       Svyatoslav Ryhel for PLLE OPP error on T30 Asus Transformer tablet.

v12: - Added r-b from Rob Herring to the host1x binding patch.

     - Added acks from Hans Verkuil to the video decoder patches.

     - In the v11 changelog I forgot to mention that the clk-binding
       patch was also changed with a corrected regex pattern and removed
       'clocks' sub-node. This patch needs r-b or ack too.

     - Added new "Rename 3d power domains" patch to match the DT schema
       naming requirement. Thanks to David Heidelberg for spotting this
       problem.

     - Replaced #ifdef CONFIG_PM_SLEEP with maybe_unused in the MMC patch
       to make code cleaner.

v11: - Added acks and r-b from Rob Herring, Mark Brown and Miquel Raynal
       that were given to v8.

     - Corrected order of the new memory controller reset entry in
       device-trees and host1x DT binding patch, which was requested by
       Rob Herring.

     - Switched consumer drivers to use power domain state syncing done
       by new Tegra's common OPP-initialization helper.

     - Made use of new devm_pm_runtime_enable() helper that was added to
       v5.15 kernel, where appropriate.

     - Added "fuse: Use resource-managed helpers" patch.

     - Converted Tegra20/30 clk drivers to a proper platform drivers,
       which was requested by Thierry Reding.

     - Removed clk-bulk API usage from the MMC patch, which was requested
       by Thierry Reding.

     - Changed CORE power domain name to "core" in a new patch
       "Change name of core power domain".

     - Misc small fixes for problems that I found since v8, like couple
       typos in error code paths and restored working RPM for Tegra DRM
       UAPI v1 that was removed in v8 by accident.

v9-v10: Figured out remaining GENPD API changes with Ulf Hansson and
        Viresh Kumar. The OPP-sync helper that was used in v8 isn't needed
        anymore because GENPD API now allows consumer drivers to
        init rpm_pstate of power domains.

v8: - Added new generic dev_pm_opp_sync() helper that syncs OPP state with
      hardware. All drivers changed to use it. This replaces GENPD attach_dev
      callback hacks that were used in v7.

    - Added new patch patch "soc/tegra: regulators: Prepare for suspend"
      that fixes dying Tegra20 SoC after enabling VENC power domain during
      resume from suspend. It matches to what downstream kernel does on
      suspend/resume.

    - After a second thought, I dropped patches which added RPM to memory
      drivers since hardware is always-on and RPM not needed.

    - Replaced the "dummy host1x driver" patch with new "Disable unused
      host1x hardware" patch, since it's a cleaner solution.

Dmitry Osipenko (35):
  opp: Change type of dev_pm_opp_attach_genpd(names) argument
  soc/tegra: Add devm_tegra_core_dev_init_opp_table_common()
  soc/tegra: pmc: Disable PMC state syncing
  soc/tegra: Don't print error message when OPPs not available
  dt-bindings: clock: tegra-car: Document new clock sub-nodes
  clk: tegra: Support runtime PM and power domain
  dt-bindings: host1x: Document OPP and power domain properties
  dt-bindings: host1x: Document Memory Client resets of Host1x, GR2D and
    GR3D
  gpu: host1x: Add runtime PM and OPP support
  gpu: host1x: Add host1x_channel_stop()
  drm/tegra: dc: Support OPP and SoC core voltage scaling
  drm/tegra: hdmi: Add OPP support
  drm/tegra: gr2d: Support generic power domain and runtime PM
  drm/tegra: gr3d: Support generic power domain and runtime PM
  drm/tegra: vic: Support system suspend
  usb: chipidea: tegra: Add runtime PM and OPP support
  bus: tegra-gmi: Add runtime PM and OPP support
  pwm: tegra: Add runtime PM and OPP support
  mmc: sdhci-tegra: Add runtime PM and OPP support
  mtd: rawnand: tegra: Add runtime PM and OPP support
  spi: tegra20-slink: Add OPP support
  media: dt: bindings: tegra-vde: Convert to schema
  media: dt: bindings: tegra-vde: Document OPP and power domain
  media: staging: tegra-vde: Support generic power domain
  soc/tegra: fuse: Reset hardware
  soc/tegra: fuse: Use resource-managed helpers
  soc/tegra: regulators: Prepare for suspend
  soc/tegra: pmc: Rename 3d power domains
  soc/tegra: pmc: Rename core power domain
  soc/tegra: pmc: Enable core domain support for Tegra20 and Tegra30
  ARM: tegra: Add OPP tables and power domains to Tegra20 device-trees
  ARM: tegra: Add OPP tables and power domains to Tegra30 device-trees
  ARM: tegra: Add Memory Client resets to Tegra20 GR2D, GR3D and Host1x
  ARM: tegra: Add Memory Client resets to Tegra30 GR2D, GR3D and Host1x
  ARM: tegra20/30: Disable unused host1x hardware

 .../bindings/clock/nvidia,tegra20-car.yaml    |   37 +
 .../display/tegra/nvidia,tegra20-host1x.txt   |   53 +
 .../bindings/media/nvidia,tegra-vde.txt       |   64 -
 .../bindings/media/nvidia,tegra-vde.yaml      |  119 ++
 .../boot/dts/tegra20-acer-a500-picasso.dts    |    1 +
 arch/arm/boot/dts/tegra20-colibri.dtsi        |    3 +-
 arch/arm/boot/dts/tegra20-harmony.dts         |    3 +-
 arch/arm/boot/dts/tegra20-paz00.dts           |    1 +
 .../arm/boot/dts/tegra20-peripherals-opp.dtsi |  941 +++++++++++
 arch/arm/boot/dts/tegra20-seaboard.dts        |    3 +-
 arch/arm/boot/dts/tegra20-tamonten.dtsi       |    3 +-
 arch/arm/boot/dts/tegra20-trimslice.dts       |    9 +
 arch/arm/boot/dts/tegra20-ventana.dts         |    1 +
 arch/arm/boot/dts/tegra20.dtsi                |  116 +-
 .../tegra30-asus-nexus7-grouper-common.dtsi   |    1 +
 arch/arm/boot/dts/tegra30-beaver.dts          |    1 +
 arch/arm/boot/dts/tegra30-cardhu.dtsi         |    1 +
 arch/arm/boot/dts/tegra30-colibri.dtsi        |   17 +-
 arch/arm/boot/dts/tegra30-ouya.dts            |    1 +
 .../arm/boot/dts/tegra30-peripherals-opp.dtsi | 1412 +++++++++++++++++
 arch/arm/boot/dts/tegra30.dtsi                |  175 +-
 drivers/bus/tegra-gmi.c                       |   52 +-
 drivers/clk/tegra/Makefile                    |    1 +
 drivers/clk/tegra/clk-device.c                |  230 +++
 drivers/clk/tegra/clk-pll.c                   |    2 +-
 drivers/clk/tegra/clk-super.c                 |    2 +-
 drivers/clk/tegra/clk-tegra20.c               |   77 +-
 drivers/clk/tegra/clk-tegra30.c               |  116 +-
 drivers/clk/tegra/clk.c                       |   75 +-
 drivers/clk/tegra/clk.h                       |    2 +
 drivers/gpu/drm/tegra/dc.c                    |   74 +
 drivers/gpu/drm/tegra/dc.h                    |    2 +
 drivers/gpu/drm/tegra/gr2d.c                  |  155 +-
 drivers/gpu/drm/tegra/gr3d.c                  |  388 ++++-
 drivers/gpu/drm/tegra/hdmi.c                  |   16 +-
 drivers/gpu/drm/tegra/vic.c                   |    4 +
 drivers/gpu/host1x/channel.c                  |    8 +
 drivers/gpu/host1x/debug.c                    |   15 +
 drivers/gpu/host1x/dev.c                      |  151 +-
 drivers/gpu/host1x/dev.h                      |    3 +-
 drivers/gpu/host1x/hw/channel_hw.c            |   44 +-
 drivers/gpu/host1x/intr.c                     |    3 -
 drivers/gpu/host1x/syncpt.c                   |    5 +-
 drivers/mmc/host/sdhci-tegra.c                |   82 +-
 drivers/mtd/nand/raw/tegra_nand.c             |   55 +-
 drivers/opp/core.c                            |    6 +-
 drivers/pwm/pwm-tegra.c                       |   88 +-
 drivers/soc/tegra/common.c                    |    4 +-
 drivers/soc/tegra/fuse/fuse-tegra.c           |   51 +-
 drivers/soc/tegra/fuse/fuse-tegra20.c         |   33 +-
 drivers/soc/tegra/fuse/fuse.h                 |    1 +
 drivers/soc/tegra/pmc.c                       |   27 +-
 drivers/soc/tegra/regulators-tegra20.c        |   99 ++
 drivers/soc/tegra/regulators-tegra30.c        |  122 ++
 drivers/spi/spi-tegra20-slink.c               |   10 +-
 drivers/staging/media/tegra-vde/vde.c         |   57 +-
 drivers/usb/chipidea/ci_hdrc_tegra.c          |   53 +-
 include/linux/host1x.h                        |    1 +
 include/linux/pm_opp.h                        |    8 +-
 include/soc/tegra/common.h                    |   24 +
 60 files changed, 4751 insertions(+), 357 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/media/nvidia,tegra-vde.txt
 create mode 100644 Documentation/devicetree/bindings/media/nvidia,tegra-vde.yaml
 create mode 100644 drivers/clk/tegra/clk-device.c

Comments

Dmitry Osipenko Sept. 30, 2021, 12:40 a.m. UTC | #1
27.09.2021 01:40, Dmitry Osipenko пишет:
> The Tegra USB controller belongs to the core power domain and we're going
> to enable GENPD support for the core domain. Now USB controller must be
> resumed using runtime PM API in order to initialize the USB power state.
> We already support runtime PM for the CI device, but CI's PM is separated
> from the RPM managed by tegra-usb driver. Add runtime PM and OPP support
> to the driver.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/usb/chipidea/ci_hdrc_tegra.c | 53 ++++++++++++++++++++++++----
>  1 file changed, 46 insertions(+), 7 deletions(-)

Peter Chen, could you please ack this patch? Thanks in advance!
Peter Chen Sept. 30, 2021, 2:06 p.m. UTC | #2
On 21-09-27 01:40:39, Dmitry Osipenko wrote:
> The Tegra USB controller belongs to the core power domain and we're going
> to enable GENPD support for the core domain. Now USB controller must be
> resumed using runtime PM API in order to initialize the USB power state.
> We already support runtime PM for the CI device, but CI's PM is separated
> from the RPM managed by tegra-usb driver. Add runtime PM and OPP support
> to the driver.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/usb/chipidea/ci_hdrc_tegra.c | 53 ++++++++++++++++++++++++----
>  1 file changed, 46 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/usb/chipidea/ci_hdrc_tegra.c b/drivers/usb/chipidea/ci_hdrc_tegra.c
> index 60361141ac04..3142ef7ebe42 100644
> --- a/drivers/usb/chipidea/ci_hdrc_tegra.c
> +++ b/drivers/usb/chipidea/ci_hdrc_tegra.c
> @@ -7,6 +7,7 @@
>  #include <linux/io.h>
>  #include <linux/module.h>
>  #include <linux/of_device.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/reset.h>
>  
>  #include <linux/usb.h>
> @@ -15,6 +16,8 @@
>  #include <linux/usb/of.h>
>  #include <linux/usb/phy.h>
>  
> +#include <soc/tegra/common.h>
> +
>  #include "../host/ehci.h"
>  
>  #include "ci.h"
> @@ -278,6 +281,8 @@ static int tegra_usb_probe(struct platform_device *pdev)
>  	if (!usb)
>  		return -ENOMEM;
>  
> +	platform_set_drvdata(pdev, usb);
> +
>  	soc = of_device_get_match_data(&pdev->dev);
>  	if (!soc) {
>  		dev_err(&pdev->dev, "failed to match OF data\n");
> @@ -296,11 +301,17 @@ static int tegra_usb_probe(struct platform_device *pdev)
>  		return err;
>  	}
>  
> -	err = clk_prepare_enable(usb->clk);
> -	if (err < 0) {
> -		dev_err(&pdev->dev, "failed to enable clock: %d\n", err);
> +	err = devm_pm_runtime_enable(&pdev->dev);
> +	if (err)
> +		return err;
> +
> +	err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
> +	if (err)
> +		return err;
> +
> +	err = pm_runtime_resume_and_get(&pdev->dev);
> +	if (err)
>  		return err;
> -	}
>  
>  	if (device_property_present(&pdev->dev, "nvidia,needs-double-reset"))
>  		usb->needs_double_reset = true;
> @@ -320,8 +331,6 @@ static int tegra_usb_probe(struct platform_device *pdev)
>  	if (err)
>  		goto fail_power_off;
>  
> -	platform_set_drvdata(pdev, usb);
> -
>  	/* setup and register ChipIdea HDRC device */
>  	usb->soc = soc;
>  	usb->data.name = "tegra-usb";
> @@ -350,7 +359,8 @@ static int tegra_usb_probe(struct platform_device *pdev)
>  phy_shutdown:
>  	usb_phy_shutdown(usb->phy);
>  fail_power_off:
> -	clk_disable_unprepare(usb->clk);
> +	pm_runtime_put(&pdev->dev);
> +
>  	return err;
>  }
>  
> @@ -360,15 +370,44 @@ static int tegra_usb_remove(struct platform_device *pdev)
>  
>  	ci_hdrc_remove_device(usb->dev);
>  	usb_phy_shutdown(usb->phy);
> +	pm_runtime_put(&pdev->dev);
> +
> +	return 0;
> +}
> +
> +static int __maybe_unused tegra_usb_runtime_resume(struct device *dev)
> +{
> +	struct tegra_usb *usb = dev_get_drvdata(dev);
> +	int err;
> +
> +	err = clk_prepare_enable(usb->clk);
> +	if (err < 0) {
> +		dev_err(dev, "failed to enable clock: %d\n", err);
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static int __maybe_unused tegra_usb_runtime_suspend(struct device *dev)
> +{
> +	struct tegra_usb *usb = dev_get_drvdata(dev);
> +
>  	clk_disable_unprepare(usb->clk);
>  
>  	return 0;
>  }
>  
> +static const struct dev_pm_ops tegra_usb_pm = {
> +	SET_RUNTIME_PM_OPS(tegra_usb_runtime_suspend, tegra_usb_runtime_resume,
> +			   NULL)
> +};
> +
>  static struct platform_driver tegra_usb_driver = {
>  	.driver = {
>  		.name = "tegra-usb",
>  		.of_match_table = tegra_usb_of_match,
> +		.pm = &tegra_usb_pm,
>  	},
>  	.probe = tegra_usb_probe,
>  	.remove = tegra_usb_remove,
> -- 
> 2.32.0
> 

I got below compile error if only compile this file, I think previous patches
should include the definition, if that, feel free to add my ack to this
patch.

Acked-by: Peter Chen <peter.chen@kernel.org>

drivers/usb/chipidea/ci_hdrc_tegra.c:308:8: error: implicit declaration of function ‘devm_tegra_core_dev_init_opp_table_common’;
did you mean ‘devm_tegra_core_dev_init_opp_table’? [-Werror=implicit-function-declaration]
  308 |  err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |        devm_tegra_core_dev_init_opp_table
Dmitry Osipenko Sept. 30, 2021, 2:08 p.m. UTC | #3
30.09.2021 17:06, Peter Chen пишет:
> On 21-09-27 01:40:39, Dmitry Osipenko wrote:
>> The Tegra USB controller belongs to the core power domain and we're going
>> to enable GENPD support for the core domain. Now USB controller must be
>> resumed using runtime PM API in order to initialize the USB power state.
>> We already support runtime PM for the CI device, but CI's PM is separated
>> from the RPM managed by tegra-usb driver. Add runtime PM and OPP support
>> to the driver.
>>
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>  drivers/usb/chipidea/ci_hdrc_tegra.c | 53 ++++++++++++++++++++++++----
>>  1 file changed, 46 insertions(+), 7 deletions(-)
...
> 
> I got below compile error if only compile this file, I think previous patches
> should include the definition, if that, feel free to add my ack to this
> patch.
> 
> Acked-by: Peter Chen <peter.chen@kernel.org>
> 
> drivers/usb/chipidea/ci_hdrc_tegra.c:308:8: error: implicit declaration of function ‘devm_tegra_core_dev_init_opp_table_common’;
> did you mean ‘devm_tegra_core_dev_init_opp_table’? [-Werror=implicit-function-declaration]
>   308 |  err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
>       |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>       |        devm_tegra_core_dev_init_opp_table

That's correct, devm_tegra_core_dev_init_opp_table_common() is added by
an earlier patch of this series. Thank you!
Ulf Hansson Oct. 1, 2021, 12:32 p.m. UTC | #4
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> The Clock-and-Reset controller resides in a core power domain on NVIDIA
> Tegra SoCs.  In order to support voltage scaling of the core power domain,
> we hook up DVFS-capable clocks to the core GENPD for managing of the
> GENPD's performance state based on the clock changes.
>
> Some clocks don't have any specific physical hardware unit that backs
> them, like root PLLs and system clock and they have theirs own voltage
> requirements.  This patch adds new clk-device driver that backs the clocks
> and provides runtime PM functionality for them.  A virtual clk-device is
> created for each such DVFS-capable clock at the clock's registration time
> by the new tegra_clk_register() helper.  Driver changes clock's device
> GENPD performance state based on clk-rate notifications.
>
> In result we have this sequence of events:
>
>   1. Clock driver creates virtual device for selective clocks, enables
>      runtime PM for the created device and registers the clock.
>   2. Clk-device driver starts to listen to clock rate changes.
>   3. Something changes clk rate or enables/disables clk.
>   4. CCF core propagates the change through the clk tree.
>   5. Clk-device driver gets clock rate-change notification or GENPD core
>      handles prepare/unprepare of the clock.
>   6. Clk-device driver changes GENPD performance state on clock rate
>      change.
>   7. GENPD driver changes voltage regulator state change.
>   8. The regulator state is committed to hardware via I2C.
>
> We rely on fact that DVFS is not needed for Tegra I2C and that Tegra I2C
> driver already keeps clock always-prepared.  Hence I2C subsystem stays
> independent from the clk power management and there are no deadlock spots
> in the sequence.
>
> Currently all clocks are registered very early during kernel boot when the
> device driver core isn't available yet.  The clk-device can't be created
> at that time.  This patch splits the registration of the clocks in two
> phases:
>
>   1. Register all essential clocks which don't use RPM and are needed
>      during early boot.
>
>   2. Register at a later boot time the rest of clocks.
>
> This patch adds power management support for Tegra20 and Tegra30 clocks.
>
> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/clk/tegra/Makefile      |   1 +
>  drivers/clk/tegra/clk-device.c  | 230 ++++++++++++++++++++++++++++++++
>  drivers/clk/tegra/clk-pll.c     |   2 +-
>  drivers/clk/tegra/clk-super.c   |   2 +-
>  drivers/clk/tegra/clk-tegra20.c |  77 ++++++++---
>  drivers/clk/tegra/clk-tegra30.c | 116 +++++++++++-----
>  drivers/clk/tegra/clk.c         |  75 ++++++++++-
>  drivers/clk/tegra/clk.h         |   2 +
>  8 files changed, 451 insertions(+), 54 deletions(-)
>  create mode 100644 drivers/clk/tegra/clk-device.c
>
> diff --git a/drivers/clk/tegra/Makefile b/drivers/clk/tegra/Makefile
> index 7b1816856eb5..a0715cdfc1a4 100644
> --- a/drivers/clk/tegra/Makefile
> +++ b/drivers/clk/tegra/Makefile
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-y                                  += clk.o
>  obj-y                                  += clk-audio-sync.o
> +obj-y                                  += clk-device.o
>  obj-y                                  += clk-dfll.o
>  obj-y                                  += clk-divider.o
>  obj-y                                  += clk-periph.o
> diff --git a/drivers/clk/tegra/clk-device.c b/drivers/clk/tegra/clk-device.c
> new file mode 100644
> index 000000000000..830bc0ba25d3
> --- /dev/null
> +++ b/drivers/clk/tegra/clk-device.c
> @@ -0,0 +1,230 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/clk.h>
> +#include <linux/clk-provider.h>
> +#include <linux/mutex.h>
> +#include <linux/of_device.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_domain.h>
> +#include <linux/pm_opp.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/slab.h>
> +
> +#include <soc/tegra/common.h>
> +
> +#include "clk.h"
> +
> +/*
> + * This driver manages performance state of the core power domain for the
> + * independent PLLs and system clocks.  We created a virtual clock device
> + * for such clocks, see tegra_clk_dev_register().
> + */
> +
> +struct tegra_clk_device {
> +       struct notifier_block clk_nb;
> +       struct device *dev;
> +       struct clk_hw *hw;
> +       struct mutex lock;
> +};
> +
> +static int tegra_clock_set_pd_state(struct tegra_clk_device *clk_dev,
> +                                   unsigned long rate)
> +{
> +       struct device *dev = clk_dev->dev;
> +       struct dev_pm_opp *opp;
> +       unsigned int pstate;
> +
> +       opp = dev_pm_opp_find_freq_ceil(dev, &rate);
> +       if (opp == ERR_PTR(-ERANGE)) {
> +               dev_dbg(dev, "failed to find ceil OPP for %luHz\n", rate);
> +               opp = dev_pm_opp_find_freq_floor(dev, &rate);
> +       }
> +
> +       if (IS_ERR(opp)) {
> +               dev_err(dev, "failed to find OPP for %luHz: %pe\n", rate, opp);
> +               return PTR_ERR(opp);
> +       }
> +
> +       pstate = dev_pm_opp_get_required_pstate(opp, 0);
> +       dev_pm_opp_put(opp);
> +
> +       return dev_pm_genpd_set_performance_state(dev, pstate);

The above code certainly looks like it can be made generic through a
common opp helper. I know we have discussed this before, so I am not
saying you should change right now.

Let's instead see what I think (and Viresh), when I have reviewed the
entire series.

> +}
> +
> +static int tegra_clock_change_notify(struct notifier_block *nb,
> +                                    unsigned long msg, void *data)
> +{
> +       struct clk_notifier_data *cnd = data;
> +       struct tegra_clk_device *clk_dev;
> +       int err = 0;
> +
> +       clk_dev = container_of(nb, struct tegra_clk_device, clk_nb);
> +
> +       mutex_lock(&clk_dev->lock);
> +       switch (msg) {
> +       case PRE_RATE_CHANGE:
> +               if (cnd->new_rate > cnd->old_rate)
> +                       err = tegra_clock_set_pd_state(clk_dev, cnd->new_rate);
> +               break;
> +
> +       case ABORT_RATE_CHANGE:
> +               err = tegra_clock_set_pd_state(clk_dev, cnd->old_rate);
> +               break;
> +
> +       case POST_RATE_CHANGE:
> +               if (cnd->new_rate < cnd->old_rate)
> +                       err = tegra_clock_set_pd_state(clk_dev, cnd->new_rate);
> +               break;
> +
> +       default:
> +               break;
> +       }
> +       mutex_unlock(&clk_dev->lock);
> +
> +       return notifier_from_errno(err);
> +}
> +
> +static int tegra_clock_sync_pd_state(struct tegra_clk_device *clk_dev)
> +{
> +       unsigned long rate;
> +       int ret = 0;
> +
> +       mutex_lock(&clk_dev->lock);
> +
> +       if (!pm_runtime_status_suspended(clk_dev->dev)) {
> +               rate = clk_hw_get_rate(clk_dev->hw);
> +               ret = tegra_clock_set_pd_state(clk_dev, rate);

Don't we need to sync the performance state even when the device is
runtime suspended?

Perhaps the clock, via a child-clock for example, can get
prepared/enabled (hence its device gets runtime resumed) before there
is a clock rate update for it. Then there is no performance state set
for it, right? Or maybe that isn't a problem?

> +       }
> +
> +       mutex_unlock(&clk_dev->lock);
> +
> +       return ret;
> +}
> +
> +static int tegra_clock_probe(struct platform_device *pdev)
> +{
> +       struct tegra_core_opp_params opp_params = {};
> +       struct tegra_clk_device *clk_dev;
> +       struct device *dev = &pdev->dev;
> +       struct clk *clk;
> +       int err;
> +
> +       if (!dev->pm_domain)
> +               return -EINVAL;
> +
> +       clk_dev = devm_kzalloc(dev, sizeof(*clk_dev), GFP_KERNEL);
> +       if (!clk_dev)
> +               return -ENOMEM;
> +
> +       clk = devm_clk_get(dev, NULL);
> +       if (IS_ERR(clk))
> +               return PTR_ERR(clk);
> +
> +       clk_dev->dev = dev;
> +       clk_dev->hw = __clk_get_hw(clk);
> +       clk_dev->clk_nb.notifier_call = tegra_clock_change_notify;
> +       mutex_init(&clk_dev->lock);
> +
> +       platform_set_drvdata(pdev, clk_dev);
> +
> +       /*
> +        * Runtime PM was already enabled for this device by the parent clk
> +        * driver and power domain state should be synced under clk_dev lock,
> +        * hence we don't use the common OPP helper that initializes OPP
> +        * state. For some clocks common OPP helper may fail to find ceil
> +        * rate, it's handled by this driver.
> +        */
> +       err = devm_tegra_core_dev_init_opp_table(dev, &opp_params);
> +       if (err)
> +               return err;
> +
> +       err = clk_notifier_register(clk, &clk_dev->clk_nb);
> +       if (err) {
> +               dev_err(dev, "failed to register clk notifier: %d\n", err);
> +               return err;
> +       }
> +
> +       /*
> +        * The driver is attaching to a potentially active/resumed clock, hence
> +        * we need to sync the power domain performance state in a accordance to
> +        * the clock rate if clock is resumed.
> +        */
> +       err = tegra_clock_sync_pd_state(clk_dev);
> +       if (err)
> +               goto unreg_clk;
> +
> +       return 0;
> +
> +unreg_clk:
> +       clk_notifier_unregister(clk, &clk_dev->clk_nb);
> +
> +       return err;
> +}
> +
> +static __maybe_unused int tegra_clock_pm_suspend(struct device *dev)
> +{
> +       struct tegra_clk_device *clk_dev = dev_get_drvdata(dev);
> +
> +       /*
> +        * Power management of the clock is entangled with the Tegra PMC
> +        * GENPD because PMC driver enables/disables clocks for toggling
> +        * of the PD's on/off state.
> +        *
> +        * The PMC GENPD is resumed in NOIRQ phase, before RPM of the clocks
> +        * becomes available, hence PMC can't use clocks at the early resume
> +        * phase if RPM is involved. For example when 3d clock is enabled,
> +        * it may enable the parent PLL clock that needs to be RPM-resumed.
> +        *
> +        * Secondly, the PLL clocks may be enabled by the low level suspend
> +        * code, so we need to assume that PLL is in enabled state during
> +        * suspend.
> +        *
> +        * We will keep PLLs and system clock resumed during suspend time.
> +        * All PLLs on all SoCs are low power and system clock is always-on,
> +        * so practically not much is changed here.
> +        */
> +
> +       return clk_prepare(clk_dev->hw->clk);

I am trying to understand, more exactly, what you intend to achieve
with the clk_prepare() here. It looks a bit weird, to me. Can you try
to elaborate a bit more on the use case?

Is this rather about making sure that the clock's corresponding PM
domain stays powered on during system suspend? In that case, I think
there may be an alternative option....

> +}
> +
> +static __maybe_unused int tegra_clock_pm_resume(struct device *dev)
> +{
> +       struct tegra_clk_device *clk_dev = dev_get_drvdata(dev);
> +
> +       clk_unprepare(clk_dev->hw->clk);
> +
> +       return 0;
> +}
> +
> +static void tegra_clock_shutdown(struct platform_device *pdev)
> +{
> +       struct tegra_clk_device *clk_dev = platform_get_drvdata(pdev);
> +
> +       clk_prepare(clk_dev->hw->clk);
> +}
> +
> +static const struct dev_pm_ops tegra_clock_pm = {
> +       SET_SYSTEM_SLEEP_PM_OPS(tegra_clock_pm_suspend,
> +                               tegra_clock_pm_resume)
> +};

[...]

Kind regards
Uffe
Ulf Hansson Oct. 1, 2021, 12:50 p.m. UTC | #5
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> Only couple drivers need to get the -ENODEV error code and majority of
> drivers need to explicitly initialize the performance state. Add new
> common helper which sets up OPP table for these drivers.
>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  include/soc/tegra/common.h | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/include/soc/tegra/common.h b/include/soc/tegra/common.h
> index af41ad80ec21..5b4a042f60fb 100644
> --- a/include/soc/tegra/common.h
> +++ b/include/soc/tegra/common.h
> @@ -39,4 +39,28 @@ devm_tegra_core_dev_init_opp_table(struct device *dev,
>  }
>  #endif
>
> +/*
> + * This function should be invoked with the enabled runtime PM of the device
> + * in order to initialize performance state properly. Most of Tegra devices
> + * are assumed to be suspended at a probe time and GENPD require RPM to be
> + * enabled to set up the rpm-resume state, otherwise device is active and
> + * performance state is applied immediately. Note that it will initialize
> + * OPP bandwidth if it's wired in a device-tree for this device, which is
> + * undesirable for a suspended device.
> + */
> +static inline int
> +devm_tegra_core_dev_init_opp_table_common(struct device *dev)
> +{
> +       struct tegra_core_opp_params opp_params = {};
> +       int err;
> +
> +       opp_params.init_state = true;
> +
> +       err = devm_tegra_core_dev_init_opp_table(dev, &opp_params);
> +       if (err != -ENODEV)
> +               return err;
> +
> +       return 0;
> +}

Just want to share a few thoughts around these functions.

So, I assume it's fine to call
devm_tegra_core_dev_init_opp_table_common() or
devm_tegra_core_dev_init_opp_table() from consumer drivers during
->probe(), as long as those drivers are tegra specific, which I assume
all are in the series!?

My point is, a cross SoC consumer driver that needs to initiate OPP
tables can get rather messy, if it would need to make one specific
function call per SoC.

That said, I hope we can tackle this as a separate/future problem, so
the series can get merged as is.

> +
>  #endif /* __SOC_TEGRA_COMMON_H__ */
> --
> 2.32.0
>

Kind regards
Uffe
Ulf Hansson Oct. 1, 2021, 1:24 p.m. UTC | #6
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> Add runtime PM and OPP support to the Host1x driver. For the starter we
> will keep host1x always-on because dynamic power management require a major
> refactoring of the driver code since lot's of code paths are missing the
> RPM handling and we're going to remove some of these paths in the future.
>
> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---

[...]

> --- a/drivers/gpu/host1x/dev.c
> +++ b/drivers/gpu/host1x/dev.c
> @@ -6,14 +6,18 @@
>   */
>
>  #include <linux/clk.h>
> +#include <linux/delay.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/io.h>
>  #include <linux/list.h>
>  #include <linux/module.h>
>  #include <linux/of_device.h>
>  #include <linux/of.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/slab.h>
>
> +#include <soc/tegra/common.h>
> +
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/host1x.h>
>  #undef CREATE_TRACE_POINTS
> @@ -190,6 +194,9 @@ static void host1x_setup_sid_table(struct host1x *host)
>         const struct host1x_info *info = host->info;
>         unsigned int i;
>
> +       if (!info->has_hypervisor)
> +               return;
> +
>         for (i = 0; i < info->num_sid_entries; i++) {
>                 const struct host1x_sid_entry *entry = &info->sid_table[i];
>
> @@ -347,6 +354,27 @@ static void host1x_iommu_exit(struct host1x *host)
>         }
>  }
>
> +static int host1x_get_resets(struct host1x *host)
> +{
> +       int err;
> +
> +       host->resets[0].id = "mc";
> +       host->resets[1].id = "host1x";
> +       host->nresets = ARRAY_SIZE(host->resets);
> +
> +       err = devm_reset_control_bulk_get_optional_exclusive_released(
> +                               host->dev, host->nresets, host->resets);
> +       if (err) {
> +               dev_err(host->dev, "failed to get reset: %d\n", err);
> +               return err;
> +       }
> +
> +       if (WARN_ON(!host->resets[1].rstc))
> +               return -ENOENT;
> +
> +       return 0;
> +}
> +
>  static int host1x_probe(struct platform_device *pdev)
>  {
>         struct host1x *host;
> @@ -423,12 +451,9 @@ static int host1x_probe(struct platform_device *pdev)
>                 return err;
>         }
>
> -       host->rst = devm_reset_control_get(&pdev->dev, "host1x");
> -       if (IS_ERR(host->rst)) {
> -               err = PTR_ERR(host->rst);
> -               dev_err(&pdev->dev, "failed to get reset: %d\n", err);
> +       err = host1x_get_resets(host);
> +       if (err)
>                 return err;
> -       }
>
>         err = host1x_iommu_init(host);
>         if (err < 0) {
> @@ -443,22 +468,10 @@ static int host1x_probe(struct platform_device *pdev)
>                 goto iommu_exit;
>         }
>
> -       err = clk_prepare_enable(host->clk);
> -       if (err < 0) {
> -               dev_err(&pdev->dev, "failed to enable clock\n");
> -               goto free_channels;
> -       }
> -
> -       err = reset_control_deassert(host->rst);
> -       if (err < 0) {
> -               dev_err(&pdev->dev, "failed to deassert reset: %d\n", err);
> -               goto unprepare_disable;
> -       }
> -
>         err = host1x_syncpt_init(host);
>         if (err) {
>                 dev_err(&pdev->dev, "failed to initialize syncpts\n");
> -               goto reset_assert;
> +               goto free_channels;
>         }
>
>         err = host1x_intr_init(host, syncpt_irq);
> @@ -467,10 +480,18 @@ static int host1x_probe(struct platform_device *pdev)
>                 goto deinit_syncpt;
>         }
>
> -       host1x_debug_init(host);
> +       pm_runtime_enable(&pdev->dev);
> +
> +       err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
> +       if (err)
> +               goto pm_disable;
>
> -       if (host->info->has_hypervisor)
> -               host1x_setup_sid_table(host);
> +       /* the driver's code isn't ready yet for the dynamic RPM */
> +       err = pm_runtime_resume_and_get(&pdev->dev);
> +       if (err)
> +               goto pm_disable;
> +
> +       host1x_debug_init(host);
>
>         err = host1x_register(host);
>         if (err < 0)
> @@ -486,13 +507,14 @@ static int host1x_probe(struct platform_device *pdev)
>         host1x_unregister(host);
>  deinit_debugfs:
>         host1x_debug_deinit(host);
> +
> +       pm_runtime_put(&pdev->dev);

pm_runtime_put() is asynchronous, so it may not actually succeed to
trigger the ->runtime_suspend() callback to be invoked. Thus, this
could end up that we leave clocks prepared/enabled when ->probe()
fails, for example.

I guess pm_runtime_put_sync_suspend() is slightly better.

Another option is to call pm_runtime_force_suspend(), but then you
must skip the call pm_runtime_disable() afterwards, as that has
already been done inside that function.

> +pm_disable:
> +       pm_runtime_disable(&pdev->dev);
> +
>         host1x_intr_deinit(host);
>  deinit_syncpt:
>         host1x_syncpt_deinit(host);
> -reset_assert:
> -       reset_control_assert(host->rst);
> -unprepare_disable:
> -       clk_disable_unprepare(host->clk);
>  free_channels:
>         host1x_channel_list_free(&host->channel_list);
>  iommu_exit:
> @@ -507,19 +529,94 @@ static int host1x_remove(struct platform_device *pdev)
>
>         host1x_unregister(host);
>         host1x_debug_deinit(host);
> +
> +       pm_runtime_put(&pdev->dev);

Similar comment as in ->probe().

> +       pm_runtime_disable(&pdev->dev);
> +
>         host1x_intr_deinit(host);
>         host1x_syncpt_deinit(host);
> -       reset_control_assert(host->rst);
> -       clk_disable_unprepare(host->clk);
>         host1x_iommu_exit(host);
>
>         return 0;
>  }
>
> +
> +       host1x_setup_sid_table(host);
> +       host1x_syncpt_restore(host);
> +       host1x_intr_start(host);
> +
> +       return 0;
> +
> +disable_clk:
> +       clk_disable_unprepare(host->clk);
> +release_reset:
> +       reset_control_bulk_release(host->nresets, host->resets);
> +
> +       return err;
> +}
> +
> +static const struct dev_pm_ops host1x_pm = {
> +       SET_RUNTIME_PM_OPS(host1x_runtime_suspend, host1x_runtime_resume,
> +                          NULL)
> +       /* TODO: add system suspend-resume once driver will be ready for that */
> +};

[...]

Kind regards
Uffe
Ulf Hansson Oct. 1, 2021, 1:27 p.m. UTC | #7
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> Add OPP and SoC core voltage scaling support to the display controller
> driver. This is required for enabling system-wide DVFS on pre-Tegra186
> SoCs.
>
> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/gpu/drm/tegra/dc.c | 74 ++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/tegra/dc.h |  2 ++
>  2 files changed, 76 insertions(+)
>
> diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
> index a29d64f87563..d4047a14e2b6 100644
> --- a/drivers/gpu/drm/tegra/dc.c
> +++ b/drivers/gpu/drm/tegra/dc.c
> @@ -11,9 +11,12 @@
>  #include <linux/interconnect.h>
>  #include <linux/module.h>
>  #include <linux/of_device.h>
> +#include <linux/pm_domain.h>
> +#include <linux/pm_opp.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/reset.h>
>
> +#include <soc/tegra/common.h>
>  #include <soc/tegra/pmc.h>
>
>  #include <drm/drm_atomic.h>
> @@ -1762,6 +1765,47 @@ int tegra_dc_state_setup_clock(struct tegra_dc *dc,
>         return 0;
>  }
>
> +static void tegra_dc_update_voltage_state(struct tegra_dc *dc,
> +                                         struct tegra_dc_state *state)
> +{
> +       unsigned long rate, pstate;
> +       struct dev_pm_opp *opp;
> +       int err;
> +
> +       if (!dc->has_opp_table)
> +               return;
> +
> +       /* calculate actual pixel clock rate which depends on internal divider */
> +       rate = DIV_ROUND_UP(clk_get_rate(dc->clk) * 2, state->div + 2);
> +
> +       /* find suitable OPP for the rate */
> +       opp = dev_pm_opp_find_freq_ceil(dc->dev, &rate);
> +
> +       if (opp == ERR_PTR(-ERANGE))
> +               opp = dev_pm_opp_find_freq_floor(dc->dev, &rate);
> +
> +       if (IS_ERR(opp)) {
> +               dev_err(dc->dev, "failed to find OPP for %luHz: %pe\n",
> +                       rate, opp);
> +               return;
> +       }
> +
> +       pstate = dev_pm_opp_get_required_pstate(opp, 0);
> +       dev_pm_opp_put(opp);
> +
> +       /*
> +        * The minimum core voltage depends on the pixel clock rate (which
> +        * depends on internal clock divider of the CRTC) and not on the
> +        * rate of the display controller clock. This is why we're not using
> +        * dev_pm_opp_set_rate() API and instead controlling the power domain
> +        * directly.
> +        */
> +       err = dev_pm_genpd_set_performance_state(dc->dev, pstate);
> +       if (err)
> +               dev_err(dc->dev, "failed to set power domain state to %lu: %d\n",
> +                       pstate, err);

Yeah, the above code looks very similar to the code I pointed to in
patch6. Perhaps we need to discuss with Viresh, whether it makes sense
to fold in a patch adding an opp helper function after all, to avoid
the open coding.

Viresh?

[...]

Kind regards
Uffe
Ulf Hansson Oct. 1, 2021, 1:39 p.m. UTC | #8
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> Add runtime power management and support generic power domains.
>
> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--

[...]

>  static int gr2d_remove(struct platform_device *pdev)
> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
>                 return err;
>         }
>
> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
> +       pm_runtime_disable(&pdev->dev);

There is no guarantee that the ->runtime_suspend() has been invoked
here, which means that clock may be left prepared/enabled beyond this
point.

I suggest you call pm_runtime_force_suspend(), instead of
pm_runtime_disable(), to make sure that gets done.

[...]

Kind regards
Uffe
Ulf Hansson Oct. 1, 2021, 2:06 p.m. UTC | #9
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> Add runtime power management and support generic power domains.
>
> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/gpu/drm/tegra/gr3d.c | 388 ++++++++++++++++++++++++++++++-----

[...]

> +
> +static int gr3d_probe(struct platform_device *pdev)
> +{
> +       struct host1x_syncpt **syncpts;
> +       struct gr3d *gr3d;
> +       unsigned int i;
> +       int err;
> +
> +       gr3d = devm_kzalloc(&pdev->dev, sizeof(*gr3d), GFP_KERNEL);
> +       if (!gr3d)
> +               return -ENOMEM;
> +
> +       platform_set_drvdata(pdev, gr3d);
> +
> +       gr3d->soc = of_device_get_match_data(&pdev->dev);
> +
> +       syncpts = devm_kzalloc(&pdev->dev, sizeof(*syncpts), GFP_KERNEL);
> +       if (!syncpts)
> +               return -ENOMEM;
> +
> +       err = gr3d_get_clocks(&pdev->dev, gr3d);
> +       if (err)
> +               return err;
> +
> +       err = gr3d_get_resets(&pdev->dev, gr3d);
> +       if (err)
> +               return err;
> +
> +       err = gr3d_init_power(&pdev->dev, gr3d);
> +       if (err)
> +               return err;
> +
>         INIT_LIST_HEAD(&gr3d->client.base.list);
>         gr3d->client.base.ops = &gr3d_client_ops;
>         gr3d->client.base.dev = &pdev->dev;
> @@ -352,20 +552,36 @@ static int gr3d_probe(struct platform_device *pdev)
>         gr3d->client.version = gr3d->soc->version;
>         gr3d->client.ops = &gr3d_ops;
>
> +       pm_runtime_enable(&pdev->dev);
> +       pm_runtime_use_autosuspend(&pdev->dev);
> +       pm_runtime_set_autosuspend_delay(&pdev->dev, 200);
> +
> +       err = devm_pm_opp_register_set_opp_helper(&pdev->dev, gr3d_set_opp);
> +       if (err)
> +               goto disable_rpm;
> +
> +       err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
> +       if (err)
> +               goto disable_rpm;
> +
>         err = host1x_client_register(&gr3d->client.base);
>         if (err < 0) {
>                 dev_err(&pdev->dev, "failed to register host1x client: %d\n",
>                         err);
> -               return err;
> +               goto disable_rpm;
>         }
>
>         /* initialize address register map */
>         for (i = 0; i < ARRAY_SIZE(gr3d_addr_regs); i++)
>                 set_bit(gr3d_addr_regs[i], gr3d->addr_regs);
>
> -       platform_set_drvdata(pdev, gr3d);
> -
>         return 0;
> +
> +disable_rpm:
> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
> +       pm_runtime_disable(&pdev->dev);

Similar comment as for patch13.

> +
> +       return err;
>  }
>
>  static int gr3d_remove(struct platform_device *pdev)
> @@ -380,23 +596,83 @@ static int gr3d_remove(struct platform_device *pdev)
>                 return err;
>         }
>
> -       if (gr3d->clk_secondary) {
> -               reset_control_assert(gr3d->rst_secondary);
> -               tegra_powergate_power_off(TEGRA_POWERGATE_3D1);
> -               clk_disable_unprepare(gr3d->clk_secondary);
> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
> +       pm_runtime_disable(&pdev->dev);

Similar comment as for patch13. You may want to use
pm_runtime_force_suspend() in favor of pm_runtime_disable().

> +
> +       return 0;
> +}

[...]

I was looking for a call to dev_pm_opp_set_rate(), but couldn't find
it. Isn't that needed when changing the rate of the clock?

Kind regards
Uffe
Ulf Hansson Oct. 1, 2021, 2:18 p.m. UTC | #10
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> The GMI bus on Tegra belongs to the core power domain and we're going to
> enable GENPD support for the core domain. Now GMI must be resumed using
> runtime PM API in order to initialize the GMI power state. Add runtime PM
> and OPP support to the GMI driver.
>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/bus/tegra-gmi.c | 52 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 46 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/bus/tegra-gmi.c b/drivers/bus/tegra-gmi.c
> index a6570789f7af..72ef8a8c236b 100644
> --- a/drivers/bus/tegra-gmi.c
> +++ b/drivers/bus/tegra-gmi.c
> @@ -13,8 +13,11 @@
>  #include <linux/io.h>
>  #include <linux/module.h>
>  #include <linux/of_device.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/reset.h>
>
> +#include <soc/tegra/common.h>
> +
>  #define TEGRA_GMI_CONFIG               0x00
>  #define TEGRA_GMI_CONFIG_GO            BIT(31)
>  #define TEGRA_GMI_BUS_WIDTH_32BIT      BIT(30)
> @@ -54,9 +57,9 @@ static int tegra_gmi_enable(struct tegra_gmi *gmi)
>  {
>         int err;
>
> -       err = clk_prepare_enable(gmi->clk);
> -       if (err < 0) {
> -               dev_err(gmi->dev, "failed to enable clock: %d\n", err);
> +       err = pm_runtime_resume_and_get(gmi->dev);
> +       if (err) {
> +               pm_runtime_disable(gmi->dev);
>                 return err;
>         }
>
> @@ -83,7 +86,8 @@ static void tegra_gmi_disable(struct tegra_gmi *gmi)
>         writel(config, gmi->base + TEGRA_GMI_CONFIG);
>
>         reset_control_assert(gmi->rst);
> -       clk_disable_unprepare(gmi->clk);
> +
> +       pm_runtime_put(gmi->dev);
>  }
>
>  static int tegra_gmi_parse_dt(struct tegra_gmi *gmi)
> @@ -213,6 +217,7 @@ static int tegra_gmi_probe(struct platform_device *pdev)
>         if (!gmi)
>                 return -ENOMEM;
>
> +       platform_set_drvdata(pdev, gmi);
>         gmi->dev = dev;
>
>         res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> @@ -232,6 +237,14 @@ static int tegra_gmi_probe(struct platform_device *pdev)
>                 return PTR_ERR(gmi->rst);
>         }
>
> +       err = devm_pm_runtime_enable(gmi->dev);
> +       if (err)
> +               return err;
> +
> +       err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
> +       if (err)
> +               return err;
> +
>         err = tegra_gmi_parse_dt(gmi);
>         if (err)
>                 return err;
> @@ -247,8 +260,6 @@ static int tegra_gmi_probe(struct platform_device *pdev)
>                 return err;
>         }
>
> -       platform_set_drvdata(pdev, gmi);
> -
>         return 0;
>  }
>
> @@ -262,6 +273,34 @@ static int tegra_gmi_remove(struct platform_device *pdev)

Similar comment as for patch13, for the ->remove() callback.

This problem, which sometimes also exists in the error path in
->probe() (according to my comments in patch13), seems to be a common
issue in the series. I will therefore not continue to repeat my
comment on this for the remaining patches in the series, I think I
have made my point. :-)

Kind regards
Uffe
Ulf Hansson Oct. 1, 2021, 2:24 p.m. UTC | #11
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> The NAND on Tegra belongs to the core power domain and we're going to
> enable GENPD support for the core domain. Now NAND must be resumed using
> runtime PM API in order to initialize the NAND power state. Add runtime PM
> and OPP support to the NAND driver.
>
> Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/mtd/nand/raw/tegra_nand.c | 55 ++++++++++++++++++++++++++-----
>  1 file changed, 47 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/mtd/nand/raw/tegra_nand.c b/drivers/mtd/nand/raw/tegra_nand.c
> index 32431bbe69b8..098fcc9cb9df 100644
> --- a/drivers/mtd/nand/raw/tegra_nand.c
> +++ b/drivers/mtd/nand/raw/tegra_nand.c
> @@ -17,8 +17,11 @@
>  #include <linux/mtd/rawnand.h>
>  #include <linux/of.h>
>  #include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
>  #include <linux/reset.h>
>
> +#include <soc/tegra/common.h>
> +
>  #define COMMAND                                        0x00
>  #define   COMMAND_GO                           BIT(31)
>  #define   COMMAND_CLE                          BIT(30)
> @@ -1151,6 +1154,7 @@ static int tegra_nand_probe(struct platform_device *pdev)
>                 return -ENOMEM;
>
>         ctrl->dev = &pdev->dev;
> +       platform_set_drvdata(pdev, ctrl);
>         nand_controller_init(&ctrl->controller);
>         ctrl->controller.ops = &tegra_nand_controller_ops;
>
> @@ -1166,14 +1170,22 @@ static int tegra_nand_probe(struct platform_device *pdev)
>         if (IS_ERR(ctrl->clk))
>                 return PTR_ERR(ctrl->clk);
>
> -       err = clk_prepare_enable(ctrl->clk);
> +       err = devm_pm_runtime_enable(&pdev->dev);
> +       if (err)
> +               return err;
> +
> +       err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
> +       if (err)
> +               return err;
> +
> +       err = pm_runtime_resume_and_get(&pdev->dev);
>         if (err)
>                 return err;
>
>         err = reset_control_reset(rst);
>         if (err) {
>                 dev_err(ctrl->dev, "Failed to reset HW: %d\n", err);
> -               goto err_disable_clk;
> +               goto err_put_pm;
>         }
>
>         writel_relaxed(HWSTATUS_CMD_DEFAULT, ctrl->regs + HWSTATUS_CMD);
> @@ -1188,21 +1200,19 @@ static int tegra_nand_probe(struct platform_device *pdev)
>                                dev_name(&pdev->dev), ctrl);
>         if (err) {
>                 dev_err(ctrl->dev, "Failed to get IRQ: %d\n", err);
> -               goto err_disable_clk;
> +               goto err_put_pm;
>         }
>
>         writel_relaxed(DMA_MST_CTRL_IS_DONE, ctrl->regs + DMA_MST_CTRL);
>
>         err = tegra_nand_chips_init(ctrl->dev, ctrl);
>         if (err)
> -               goto err_disable_clk;
> -
> -       platform_set_drvdata(pdev, ctrl);
> +               goto err_put_pm;
>

There is no corresponding call pm_runtime_put() here. Is it
intentional to always leave the device runtime resumed after ->probe()
has succeeded?

I noticed you included some comments about this for some other
drivers, as those needed more tweaks. Is that also the case for this
driver?

>         return 0;
>
> -err_disable_clk:
> -       clk_disable_unprepare(ctrl->clk);
> +err_put_pm:
> +       pm_runtime_put(ctrl->dev);
>         return err;
>  }
>

[...]

Kind regards
Uffe
Dmitry Osipenko Oct. 1, 2021, 2:29 p.m. UTC | #12
01.10.2021 16:39, Ulf Hansson пишет:
> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> Add runtime power management and support generic power domains.
>>
>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
> 
> [...]
> 
>>  static int gr2d_remove(struct platform_device *pdev)
>> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
>>                 return err;
>>         }
>>
>> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
>> +       pm_runtime_disable(&pdev->dev);
> 
> There is no guarantee that the ->runtime_suspend() has been invoked
> here, which means that clock may be left prepared/enabled beyond this
> point.
> 
> I suggest you call pm_runtime_force_suspend(), instead of
> pm_runtime_disable(), to make sure that gets done.

The pm_runtime_disable() performs the final synchronization, please see [1].

[1]
https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412

Calling pm_runtime_force_suspend() isn't correct because each 'enable'
must have the corresponding 'disable'. Hence there is no problem here.
Dmitry Osipenko Oct. 1, 2021, 2:35 p.m. UTC | #13
01.10.2021 17:24, Ulf Hansson пишет:
> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> The NAND on Tegra belongs to the core power domain and we're going to
>> enable GENPD support for the core domain. Now NAND must be resumed using
>> runtime PM API in order to initialize the NAND power state. Add runtime PM
>> and OPP support to the NAND driver.
>>
>> Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>  drivers/mtd/nand/raw/tegra_nand.c | 55 ++++++++++++++++++++++++++-----
>>  1 file changed, 47 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/raw/tegra_nand.c b/drivers/mtd/nand/raw/tegra_nand.c
>> index 32431bbe69b8..098fcc9cb9df 100644
>> --- a/drivers/mtd/nand/raw/tegra_nand.c
>> +++ b/drivers/mtd/nand/raw/tegra_nand.c
>> @@ -17,8 +17,11 @@
>>  #include <linux/mtd/rawnand.h>
>>  #include <linux/of.h>
>>  #include <linux/platform_device.h>
>> +#include <linux/pm_runtime.h>
>>  #include <linux/reset.h>
>>
>> +#include <soc/tegra/common.h>
>> +
>>  #define COMMAND                                        0x00
>>  #define   COMMAND_GO                           BIT(31)
>>  #define   COMMAND_CLE                          BIT(30)
>> @@ -1151,6 +1154,7 @@ static int tegra_nand_probe(struct platform_device *pdev)
>>                 return -ENOMEM;
>>
>>         ctrl->dev = &pdev->dev;
>> +       platform_set_drvdata(pdev, ctrl);
>>         nand_controller_init(&ctrl->controller);
>>         ctrl->controller.ops = &tegra_nand_controller_ops;
>>
>> @@ -1166,14 +1170,22 @@ static int tegra_nand_probe(struct platform_device *pdev)
>>         if (IS_ERR(ctrl->clk))
>>                 return PTR_ERR(ctrl->clk);
>>
>> -       err = clk_prepare_enable(ctrl->clk);
>> +       err = devm_pm_runtime_enable(&pdev->dev);
>> +       if (err)
>> +               return err;
>> +
>> +       err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
>> +       if (err)
>> +               return err;
>> +
>> +       err = pm_runtime_resume_and_get(&pdev->dev);
>>         if (err)
>>                 return err;
>>
>>         err = reset_control_reset(rst);
>>         if (err) {
>>                 dev_err(ctrl->dev, "Failed to reset HW: %d\n", err);
>> -               goto err_disable_clk;
>> +               goto err_put_pm;
>>         }
>>
>>         writel_relaxed(HWSTATUS_CMD_DEFAULT, ctrl->regs + HWSTATUS_CMD);
>> @@ -1188,21 +1200,19 @@ static int tegra_nand_probe(struct platform_device *pdev)
>>                                dev_name(&pdev->dev), ctrl);
>>         if (err) {
>>                 dev_err(ctrl->dev, "Failed to get IRQ: %d\n", err);
>> -               goto err_disable_clk;
>> +               goto err_put_pm;
>>         }
>>
>>         writel_relaxed(DMA_MST_CTRL_IS_DONE, ctrl->regs + DMA_MST_CTRL);
>>
>>         err = tegra_nand_chips_init(ctrl->dev, ctrl);
>>         if (err)
>> -               goto err_disable_clk;
>> -
>> -       platform_set_drvdata(pdev, ctrl);
>> +               goto err_put_pm;
>>
> 
> There is no corresponding call pm_runtime_put() here. Is it
> intentional to always leave the device runtime resumed after ->probe()
> has succeeded?
> 
> I noticed you included some comments about this for some other
> drivers, as those needed more tweaks. Is that also the case for this
> driver?

Could you please clarify? There is pm_runtime_put() in both probe-error
and remove() code paths here.

I assume you're meaning pm_runtime_disable(), but this patch uses
resource-managed devm_pm_runtime_enable(), and thus, explicit disable
isn't needed.

>>         return 0;
>>
>> -err_disable_clk:
>> -       clk_disable_unprepare(ctrl->clk);
>> +err_put_pm:
>> +       pm_runtime_put(ctrl->dev);
>>         return err;
>>  }
>>
> 
> [...]
> 
> Kind regards
> Uffe
>
Ulf Hansson Oct. 1, 2021, 2:36 p.m. UTC | #14
On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> This series adds runtime PM support to Tegra drivers and enables core
> voltage scaling for Tegra20/30 SoCs, resolving overheating troubles.
>
> All patches in this series are interdependent and should go via Tegra tree.
>
> Changelog:
>
> v13: - Fixed compile-test error reported by build bot by reverting the
>        mmc/ patch to v11. The sdhci_suspend/resume_host() functions aren't
>        available with the disabled CONFIG_PM_SLEEP, some code needs the
>        ifdef.
>
>      - Added last r-b from Rob Herring for the DT patches.
>
>      - Corrected clk/ PM domain-support patch by not using the
>        devm_tegra_core_dev_init_opp_table_common() helper, which I
>        utilized in v12. The clk driver implements its own power domain
>        state syncing and common helper shouldn't be used. This fixes driver
>        probing for some clocks on some devices. It was reported by
>        Svyatoslav Ryhel for PLLE OPP error on T30 Asus Transformer tablet.

Dmitry, I have looked through the series and besides those comments
that I have posted, I have nothing more to add. Overall it looks good
to me.

Kind regards
Uffe
Dmitry Osipenko Oct. 1, 2021, 2:40 p.m. UTC | #15
01.10.2021 17:36, Ulf Hansson пишет:
> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> This series adds runtime PM support to Tegra drivers and enables core
>> voltage scaling for Tegra20/30 SoCs, resolving overheating troubles.
>>
>> All patches in this series are interdependent and should go via Tegra tree.
>>
>> Changelog:
>>
>> v13: - Fixed compile-test error reported by build bot by reverting the
>>        mmc/ patch to v11. The sdhci_suspend/resume_host() functions aren't
>>        available with the disabled CONFIG_PM_SLEEP, some code needs the
>>        ifdef.
>>
>>      - Added last r-b from Rob Herring for the DT patches.
>>
>>      - Corrected clk/ PM domain-support patch by not using the
>>        devm_tegra_core_dev_init_opp_table_common() helper, which I
>>        utilized in v12. The clk driver implements its own power domain
>>        state syncing and common helper shouldn't be used. This fixes driver
>>        probing for some clocks on some devices. It was reported by
>>        Svyatoslav Ryhel for PLLE OPP error on T30 Asus Transformer tablet.
> 
> Dmitry, I have looked through the series and besides those comments
> that I have posted, I have nothing more to add. Overall it looks good
> to me.

Ulf, thank you very much! Yours input is invaluable. I'm happy that this
series moving steadily to the final stage.
Ulf Hansson Oct. 1, 2021, 2:55 p.m. UTC | #16
On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 01.10.2021 16:39, Ulf Hansson пишет:
> > On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>
> >> Add runtime power management and support generic power domains.
> >>
> >> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> >> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> >> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> >> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> >> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> >> ---
> >>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
> >
> > [...]
> >
> >>  static int gr2d_remove(struct platform_device *pdev)
> >> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
> >>                 return err;
> >>         }
> >>
> >> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
> >> +       pm_runtime_disable(&pdev->dev);
> >
> > There is no guarantee that the ->runtime_suspend() has been invoked
> > here, which means that clock may be left prepared/enabled beyond this
> > point.
> >
> > I suggest you call pm_runtime_force_suspend(), instead of
> > pm_runtime_disable(), to make sure that gets done.
>
> The pm_runtime_disable() performs the final synchronization, please see [1].
>
> [1]
> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412

pm_runtime_disable() end up calling _pm_runtime_barrier(), which calls
cancel_work_sync() if dev->power.request_pending has been set.

If the work that was punted to the pm_wq in rpm_idle() has not been
started yet, we end up just canceling it. In other words, there are no
guarantees it runs to completion.

Moreover, use space may have bumped the usage count via sysfs for the
device (pm_runtime_forbid()) to keep the device runtime resumed.

>
> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
> must have the corresponding 'disable'. Hence there is no problem here.

pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
should be fine. No?

Kind regards
Uffe
Ulf Hansson Oct. 1, 2021, 3:01 p.m. UTC | #17
On Fri, 1 Oct 2021 at 16:35, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 01.10.2021 17:24, Ulf Hansson пишет:
> > On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>
> >> The NAND on Tegra belongs to the core power domain and we're going to
> >> enable GENPD support for the core domain. Now NAND must be resumed using
> >> runtime PM API in order to initialize the NAND power state. Add runtime PM
> >> and OPP support to the NAND driver.
> >>
> >> Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
> >> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> >> ---
> >>  drivers/mtd/nand/raw/tegra_nand.c | 55 ++++++++++++++++++++++++++-----
> >>  1 file changed, 47 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/mtd/nand/raw/tegra_nand.c b/drivers/mtd/nand/raw/tegra_nand.c
> >> index 32431bbe69b8..098fcc9cb9df 100644
> >> --- a/drivers/mtd/nand/raw/tegra_nand.c
> >> +++ b/drivers/mtd/nand/raw/tegra_nand.c
> >> @@ -17,8 +17,11 @@
> >>  #include <linux/mtd/rawnand.h>
> >>  #include <linux/of.h>
> >>  #include <linux/platform_device.h>
> >> +#include <linux/pm_runtime.h>
> >>  #include <linux/reset.h>
> >>
> >> +#include <soc/tegra/common.h>
> >> +
> >>  #define COMMAND                                        0x00
> >>  #define   COMMAND_GO                           BIT(31)
> >>  #define   COMMAND_CLE                          BIT(30)
> >> @@ -1151,6 +1154,7 @@ static int tegra_nand_probe(struct platform_device *pdev)
> >>                 return -ENOMEM;
> >>
> >>         ctrl->dev = &pdev->dev;
> >> +       platform_set_drvdata(pdev, ctrl);
> >>         nand_controller_init(&ctrl->controller);
> >>         ctrl->controller.ops = &tegra_nand_controller_ops;
> >>
> >> @@ -1166,14 +1170,22 @@ static int tegra_nand_probe(struct platform_device *pdev)
> >>         if (IS_ERR(ctrl->clk))
> >>                 return PTR_ERR(ctrl->clk);
> >>
> >> -       err = clk_prepare_enable(ctrl->clk);
> >> +       err = devm_pm_runtime_enable(&pdev->dev);
> >> +       if (err)
> >> +               return err;
> >> +
> >> +       err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
> >> +       if (err)
> >> +               return err;
> >> +
> >> +       err = pm_runtime_resume_and_get(&pdev->dev);
> >>         if (err)
> >>                 return err;
> >>
> >>         err = reset_control_reset(rst);
> >>         if (err) {
> >>                 dev_err(ctrl->dev, "Failed to reset HW: %d\n", err);
> >> -               goto err_disable_clk;
> >> +               goto err_put_pm;
> >>         }
> >>
> >>         writel_relaxed(HWSTATUS_CMD_DEFAULT, ctrl->regs + HWSTATUS_CMD);
> >> @@ -1188,21 +1200,19 @@ static int tegra_nand_probe(struct platform_device *pdev)
> >>                                dev_name(&pdev->dev), ctrl);
> >>         if (err) {
> >>                 dev_err(ctrl->dev, "Failed to get IRQ: %d\n", err);
> >> -               goto err_disable_clk;
> >> +               goto err_put_pm;
> >>         }
> >>
> >>         writel_relaxed(DMA_MST_CTRL_IS_DONE, ctrl->regs + DMA_MST_CTRL);
> >>
> >>         err = tegra_nand_chips_init(ctrl->dev, ctrl);
> >>         if (err)
> >> -               goto err_disable_clk;
> >> -
> >> -       platform_set_drvdata(pdev, ctrl);
> >> +               goto err_put_pm;
> >>
> >
> > There is no corresponding call pm_runtime_put() here. Is it
> > intentional to always leave the device runtime resumed after ->probe()
> > has succeeded?
> >
> > I noticed you included some comments about this for some other
> > drivers, as those needed more tweaks. Is that also the case for this
> > driver?
>
> Could you please clarify? There is pm_runtime_put() in both probe-error
> and remove() code paths here.

I was not considering the error path of ->probe() (or ->remove()), but
was rather thinking about when ->probe() completes successfully. Then
you keep the device runtime resumed, because you have called
pm_runtime_resume_and_get() for it.

Shouldn't you have a corresponding pm_runtime_put() in ->probe(),
allowing it to be runtime suspended, until the device is really needed
later on. No?

>
> I assume you're meaning pm_runtime_disable(), but this patch uses
> resource-managed devm_pm_runtime_enable(), and thus, explicit disable
> isn't needed.
>
> >>         return 0;
> >>
> >> -err_disable_clk:
> >> -       clk_disable_unprepare(ctrl->clk);
> >> +err_put_pm:
> >> +       pm_runtime_put(ctrl->dev);
> >>         return err;
> >>  }
> >>

[...]

Kind regards
Uffe
Ulf Hansson Oct. 1, 2021, 3:02 p.m. UTC | #18
On Fri, 1 Oct 2021 at 16:41, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 01.10.2021 17:36, Ulf Hansson пишет:
> > On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>
> >> This series adds runtime PM support to Tegra drivers and enables core
> >> voltage scaling for Tegra20/30 SoCs, resolving overheating troubles.
> >>
> >> All patches in this series are interdependent and should go via Tegra tree.
> >>
> >> Changelog:
> >>
> >> v13: - Fixed compile-test error reported by build bot by reverting the
> >>        mmc/ patch to v11. The sdhci_suspend/resume_host() functions aren't
> >>        available with the disabled CONFIG_PM_SLEEP, some code needs the
> >>        ifdef.
> >>
> >>      - Added last r-b from Rob Herring for the DT patches.
> >>
> >>      - Corrected clk/ PM domain-support patch by not using the
> >>        devm_tegra_core_dev_init_opp_table_common() helper, which I
> >>        utilized in v12. The clk driver implements its own power domain
> >>        state syncing and common helper shouldn't be used. This fixes driver
> >>        probing for some clocks on some devices. It was reported by
> >>        Svyatoslav Ryhel for PLLE OPP error on T30 Asus Transformer tablet.
> >
> > Dmitry, I have looked through the series and besides those comments
> > that I have posted, I have nothing more to add. Overall it looks good
> > to me.
>
> Ulf, thank you very much! Yours input is invaluable. I'm happy that this
> series moving steadily to the final stage.

My pleasure. Let's get the final pieces fixed so we can get this merged! :-)

Kind regards
Uffe
Dmitry Osipenko Oct. 1, 2021, 7 p.m. UTC | #19
01.10.2021 17:55, Ulf Hansson пишет:
> On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> 01.10.2021 16:39, Ulf Hansson пишет:
>>> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>>>>
>>>> Add runtime power management and support generic power domains.
>>>>
>>>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
>>>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
>>>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
>>>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
>>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>>> ---
>>>>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
>>>
>>> [...]
>>>
>>>>  static int gr2d_remove(struct platform_device *pdev)
>>>> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
>>>>                 return err;
>>>>         }
>>>>
>>>> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
>>>> +       pm_runtime_disable(&pdev->dev);
>>>
>>> There is no guarantee that the ->runtime_suspend() has been invoked
>>> here, which means that clock may be left prepared/enabled beyond this
>>> point.
>>>
>>> I suggest you call pm_runtime_force_suspend(), instead of
>>> pm_runtime_disable(), to make sure that gets done.
>>
>> The pm_runtime_disable() performs the final synchronization, please see [1].
>>
>> [1]
>> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412
> 
> pm_runtime_disable() end up calling _pm_runtime_barrier(), which calls
> cancel_work_sync() if dev->power.request_pending has been set.
> 
> If the work that was punted to the pm_wq in rpm_idle() has not been
> started yet, we end up just canceling it. In other words, there are no
> guarantees it runs to completion.

You're right. Although, in a case of this particular patch, the syncing
is actually implicitly done by pm_runtime_dont_use_autosuspend().

But for drivers which don't use auto-suspend, there is no sync. This
looks like a disaster, it's a very common pattern for drivers to
'put+disable'.

> Moreover, use space may have bumped the usage count via sysfs for the
> device (pm_runtime_forbid()) to keep the device runtime resumed.

Right, this is also a disaster in a case of driver removal.

>> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
>> must have the corresponding 'disable'. Hence there is no problem here.
> 
> pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
> should be fine. No?

[adding Rafael]

Rafael, could you please explain how drivers are supposed to properly
suspend and disable RPM to cut off power and reset state that was
altered by the driver's resume callback? What we're missing? Is Ulf's
suggestion acceptable?

The RPM state of a device is getting reset on driver's removal, hence
all refcounts that were bumped by the rpm-resume callback of the device
driver will be screwed up if device is kept resumed after removal. I
just verified that it's true in practice.
Dmitry Osipenko Oct. 1, 2021, 7:15 p.m. UTC | #20
01.10.2021 15:50, Ulf Hansson пишет:
> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> Only couple drivers need to get the -ENODEV error code and majority of
>> drivers need to explicitly initialize the performance state. Add new
>> common helper which sets up OPP table for these drivers.
>>
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>  include/soc/tegra/common.h | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/include/soc/tegra/common.h b/include/soc/tegra/common.h
>> index af41ad80ec21..5b4a042f60fb 100644
>> --- a/include/soc/tegra/common.h
>> +++ b/include/soc/tegra/common.h
>> @@ -39,4 +39,28 @@ devm_tegra_core_dev_init_opp_table(struct device *dev,
>>  }
>>  #endif
>>
>> +/*
>> + * This function should be invoked with the enabled runtime PM of the device
>> + * in order to initialize performance state properly. Most of Tegra devices
>> + * are assumed to be suspended at a probe time and GENPD require RPM to be
>> + * enabled to set up the rpm-resume state, otherwise device is active and
>> + * performance state is applied immediately. Note that it will initialize
>> + * OPP bandwidth if it's wired in a device-tree for this device, which is
>> + * undesirable for a suspended device.
>> + */
>> +static inline int
>> +devm_tegra_core_dev_init_opp_table_common(struct device *dev)
>> +{
>> +       struct tegra_core_opp_params opp_params = {};
>> +       int err;
>> +
>> +       opp_params.init_state = true;
>> +
>> +       err = devm_tegra_core_dev_init_opp_table(dev, &opp_params);
>> +       if (err != -ENODEV)
>> +               return err;
>> +
>> +       return 0;
>> +}
> 
> Just want to share a few thoughts around these functions.
> 
> So, I assume it's fine to call
> devm_tegra_core_dev_init_opp_table_common() or
> devm_tegra_core_dev_init_opp_table() from consumer drivers during
> ->probe(), as long as those drivers are tegra specific, which I assume
> all are in the series!?

That is correct, all drivers are tegra-specific in this series. External
devices are attached to the internal SoC devices and this series is
about the SoC power management.

> My point is, a cross SoC consumer driver that needs to initiate OPP
> tables can get rather messy, if it would need to make one specific
> function call per SoC.
> 
> That said, I hope we can tackle this as a separate/future problem, so
> the series can get merged as is.

Yes, as we already have seen, it's not an easy problem to make PD core
to handle it in a generic way. If there will be a similar demand from
other SoCs, then we may try to solve that problem again.
Dmitry Osipenko Oct. 1, 2021, 7:50 p.m. UTC | #21
01.10.2021 15:32, Ulf Hansson пишет:
>> +static int tegra_clock_sync_pd_state(struct tegra_clk_device *clk_dev)
>> +{
>> +       unsigned long rate;
>> +       int ret = 0;
>> +
>> +       mutex_lock(&clk_dev->lock);
>> +
>> +       if (!pm_runtime_status_suspended(clk_dev->dev)) {
>> +               rate = clk_hw_get_rate(clk_dev->hw);
>> +               ret = tegra_clock_set_pd_state(clk_dev, rate);
> Don't we need to sync the performance state even when the device is
> runtime suspended?
> 
> Perhaps the clock, via a child-clock for example, can get
> prepared/enabled (hence its device gets runtime resumed) before there
> is a clock rate update for it. Then there is no performance state set
> for it, right? Or maybe that isn't a problem?
> 

Good catch! Older versions of this patch had a special handling for clk
enable/disable. I just forgot to update this function, it's now not a
problem to change performance state of a suspended device and it
actually needs to be done. I'll correct it, thanks!
Dmitry Osipenko Oct. 1, 2021, 9:25 p.m. UTC | #22
01.10.2021 17:06, Ulf Hansson пишет:
> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>> Add runtime power management and support generic power domains.
>>
>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>  drivers/gpu/drm/tegra/gr3d.c | 388 ++++++++++++++++++++++++++++++-----
> [...]

> 
> I was looking for a call to dev_pm_opp_set_rate(), but couldn't find
> it. Isn't that needed when changing the rate of the clock?

That is another good catch! Previous versions of this patch were
changing the rate, while the current version not. So the
set_opp_helper() isn't needed for this patch anymore. It may become
needed sometime later, but not for this series. I'll remove it in the
next version, thanks!
Dmitry Osipenko Oct. 2, 2021, 8:44 p.m. UTC | #23
01.10.2021 15:32, Ulf Hansson пишет:
>> +static __maybe_unused int tegra_clock_pm_suspend(struct device *dev)
>> +{
>> +       struct tegra_clk_device *clk_dev = dev_get_drvdata(dev);
>> +
>> +       /*
>> +        * Power management of the clock is entangled with the Tegra PMC
>> +        * GENPD because PMC driver enables/disables clocks for toggling
>> +        * of the PD's on/off state.
>> +        *
>> +        * The PMC GENPD is resumed in NOIRQ phase, before RPM of the clocks
>> +        * becomes available, hence PMC can't use clocks at the early resume
>> +        * phase if RPM is involved. For example when 3d clock is enabled,
>> +        * it may enable the parent PLL clock that needs to be RPM-resumed.
>> +        *
>> +        * Secondly, the PLL clocks may be enabled by the low level suspend
>> +        * code, so we need to assume that PLL is in enabled state during
>> +        * suspend.
>> +        *
>> +        * We will keep PLLs and system clock resumed during suspend time.
>> +        * All PLLs on all SoCs are low power and system clock is always-on,
>> +        * so practically not much is changed here.
>> +        */
>> +
>> +       return clk_prepare(clk_dev->hw->clk);
> I am trying to understand, more exactly, what you intend to achieve
> with the clk_prepare() here. It looks a bit weird, to me. Can you try
> to elaborate a bit more on the use case?

The Tegra GENPD driver enable/disable clocks when domain is turned on.
This can't be done during early system resume, when domains are getting
turned on by the drivers core, because when clock is enabled, it's
getting prepared (RPM-resumed) and this preparation fails because
performance state of the clock goes up and it doesn't work during the
early resume time since I2C, which applies the state to hardware, is
suspended and can't work at that early time.

Secondly, Tegra has arch-specific low level assembly which touches
clocks during last phase of system suspend and in the beginning of
resume. Hence, clocks should stay prepared during suspend just because
technically clock should be prepared before it can be enabled.

> Is this rather about making sure that the clock's corresponding PM
> domain stays powered on during system suspend? In that case, I think
> there may be an alternative option....
> 

This is not about domain staying powered on, this is about keeping the
performance state of the domain high during suspend.
Viresh Kumar Oct. 4, 2021, 9:11 a.m. UTC | #24
On 27-09-21, 01:40, Dmitry Osipenko wrote:
> This series adds runtime PM support to Tegra drivers and enables core
> voltage scaling for Tegra20/30 SoCs, resolving overheating troubles.
> 
> All patches in this series are interdependent and should go via Tegra tree.

So you don't need any OPP changes anymore ? I just came back from
vacation, don't know what you guys discussed in between :)
Viresh Kumar Oct. 4, 2021, 9:11 a.m. UTC | #25
On 27-09-21, 01:40, Dmitry Osipenko wrote:
> Elements of the 'names' array are not changed by the code, constify them
> for consistency.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/opp/core.c     | 6 +++---
>  include/linux/pm_opp.h | 8 ++++----
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 04b4691a8aac..3057beabd370 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -2348,12 +2348,12 @@ static void _opp_detach_genpd(struct opp_table *opp_table)
>   * "required-opps" are added in DT.
>   */
>  struct opp_table *dev_pm_opp_attach_genpd(struct device *dev,
> -		const char **names, struct device ***virt_devs)
> +		const char * const *names, struct device ***virt_devs)
>  {
>  	struct opp_table *opp_table;
>  	struct device *virt_dev;
>  	int index = 0, ret = -EINVAL;
> -	const char **name = names;
> +	const char * const *name = names;
>  
>  	opp_table = _add_opp_table(dev, false);
>  	if (IS_ERR(opp_table))
> @@ -2457,7 +2457,7 @@ static void devm_pm_opp_detach_genpd(void *data)
>   *
>   * Return: 0 on success and errorno otherwise.
>   */
> -int devm_pm_opp_attach_genpd(struct device *dev, const char **names,
> +int devm_pm_opp_attach_genpd(struct device *dev, const char * const *names,
>  			     struct device ***virt_devs)
>  {
>  	struct opp_table *opp_table;
> diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h
> index a95d6fdd20b6..879c138c7b8e 100644
> --- a/include/linux/pm_opp.h
> +++ b/include/linux/pm_opp.h
> @@ -156,9 +156,9 @@ int devm_pm_opp_set_clkname(struct device *dev, const char *name);
>  struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data));
>  void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table);
>  int devm_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data));
> -struct opp_table *dev_pm_opp_attach_genpd(struct device *dev, const char **names, struct device ***virt_devs);
> +struct opp_table *dev_pm_opp_attach_genpd(struct device *dev, const char * const *names, struct device ***virt_devs);
>  void dev_pm_opp_detach_genpd(struct opp_table *opp_table);
> -int devm_pm_opp_attach_genpd(struct device *dev, const char **names, struct device ***virt_devs);
> +int devm_pm_opp_attach_genpd(struct device *dev, const char * const *names, struct device ***virt_devs);
>  struct dev_pm_opp *dev_pm_opp_xlate_required_opp(struct opp_table *src_table, struct opp_table *dst_table, struct dev_pm_opp *src_opp);
>  int dev_pm_opp_xlate_performance_state(struct opp_table *src_table, struct opp_table *dst_table, unsigned int pstate);
>  int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
> @@ -376,7 +376,7 @@ static inline int devm_pm_opp_set_clkname(struct device *dev, const char *name)
>  	return -EOPNOTSUPP;
>  }
>  
> -static inline struct opp_table *dev_pm_opp_attach_genpd(struct device *dev, const char **names, struct device ***virt_devs)
> +static inline struct opp_table *dev_pm_opp_attach_genpd(struct device *dev, const char * const *names, struct device ***virt_devs)
>  {
>  	return ERR_PTR(-EOPNOTSUPP);
>  }
> @@ -384,7 +384,7 @@ static inline struct opp_table *dev_pm_opp_attach_genpd(struct device *dev, cons
>  static inline void dev_pm_opp_detach_genpd(struct opp_table *opp_table) {}
>  
>  static inline int devm_pm_opp_attach_genpd(struct device *dev,
> -					   const char **names,
> +					   const char * const *names,
>  					   struct device ***virt_devs)
>  {
>  	return -EOPNOTSUPP;

Applied. Thanks.
Ulf Hansson Oct. 4, 2021, 11:01 a.m. UTC | #26
On Fri, 1 Oct 2021 at 21:00, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 01.10.2021 17:55, Ulf Hansson пишет:
> > On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>
> >> 01.10.2021 16:39, Ulf Hansson пишет:
> >>> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>>>
> >>>> Add runtime power management and support generic power domains.
> >>>>
> >>>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> >>>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> >>>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> >>>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> >>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> >>>> ---
> >>>>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
> >>>
> >>> [...]
> >>>
> >>>>  static int gr2d_remove(struct platform_device *pdev)
> >>>> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
> >>>>                 return err;
> >>>>         }
> >>>>
> >>>> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
> >>>> +       pm_runtime_disable(&pdev->dev);
> >>>
> >>> There is no guarantee that the ->runtime_suspend() has been invoked
> >>> here, which means that clock may be left prepared/enabled beyond this
> >>> point.
> >>>
> >>> I suggest you call pm_runtime_force_suspend(), instead of
> >>> pm_runtime_disable(), to make sure that gets done.
> >>
> >> The pm_runtime_disable() performs the final synchronization, please see [1].
> >>
> >> [1]
> >> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412
> >
> > pm_runtime_disable() end up calling _pm_runtime_barrier(), which calls
> > cancel_work_sync() if dev->power.request_pending has been set.
> >
> > If the work that was punted to the pm_wq in rpm_idle() has not been
> > started yet, we end up just canceling it. In other words, there are no
> > guarantees it runs to completion.
>
> You're right. Although, in a case of this particular patch, the syncing
> is actually implicitly done by pm_runtime_dont_use_autosuspend().
>
> But for drivers which don't use auto-suspend, there is no sync. This
> looks like a disaster, it's a very common pattern for drivers to
> 'put+disable'.
>
> > Moreover, use space may have bumped the usage count via sysfs for the
> > device (pm_runtime_forbid()) to keep the device runtime resumed.
>
> Right, this is also a disaster in a case of driver removal.
>
> >> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
> >> must have the corresponding 'disable'. Hence there is no problem here.
> >
> > pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
> > should be fine. No?
>
> [adding Rafael]
>
> Rafael, could you please explain how drivers are supposed to properly
> suspend and disable RPM to cut off power and reset state that was
> altered by the driver's resume callback? What we're missing? Is Ulf's
> suggestion acceptable?
>
> The RPM state of a device is getting reset on driver's removal, hence
> all refcounts that were bumped by the rpm-resume callback of the device
> driver will be screwed up if device is kept resumed after removal. I
> just verified that it's true in practice.

Note that, what makes the Tegra drivers a bit special is that they are
always built with CONFIG_PM being set (selected from the "SoC"
Kconfig).

Therefore, pm_runtime_force_suspend() can work for some of these
cases. Using this, would potentially avoid the driver from having to
runtime resume the device in ->remove(), according to the below
generic sequence, which is used in many drivers.

pm_runtime_get_sync()
clk_disable_unprepare() (+ additional things to turn off the device)
pm_runtime_disable()
pm_runtime_put_noidle()

Kind regards
Uffe
Dmitry Osipenko Oct. 4, 2021, 2:52 p.m. UTC | #27
04.10.2021 12:11, Viresh Kumar пишет:
> On 27-09-21, 01:40, Dmitry Osipenko wrote:
>> This series adds runtime PM support to Tegra drivers and enables core
>> voltage scaling for Tegra20/30 SoCs, resolving overheating troubles.
>>
>> All patches in this series are interdependent and should go via Tegra tree.
> 
> So you don't need any OPP changes anymore ? I just came back from
> vacation, don't know what you guys discussed in between :)
> 

We discussed it and decided that we don't need more OPP/domain core
changes. It's already good enough for the starter and making it all
absolutely ideal doesn't worth the effort for now.
Ulf Hansson Oct. 5, 2021, 8:45 a.m. UTC | #28
On Mon, 4 Oct 2021 at 17:57, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 04.10.2021 14:01, Ulf Hansson пишет:
> > On Fri, 1 Oct 2021 at 21:00, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>
> >> 01.10.2021 17:55, Ulf Hansson пишет:
> >>> On Fri, 1 Oct 2021 at 16:29, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>>>
> >>>> 01.10.2021 16:39, Ulf Hansson пишет:
> >>>>> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>>>>>
> >>>>>> Add runtime power management and support generic power domains.
> >>>>>>
> >>>>>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
> >>>>>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
> >>>>>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
> >>>>>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
> >>>>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> >>>>>> ---
> >>>>>>  drivers/gpu/drm/tegra/gr2d.c | 155 +++++++++++++++++++++++++++++++++--
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>>  static int gr2d_remove(struct platform_device *pdev)
> >>>>>> @@ -259,15 +312,101 @@ static int gr2d_remove(struct platform_device *pdev)
> >>>>>>                 return err;
> >>>>>>         }
> >>>>>>
> >>>>>> +       pm_runtime_dont_use_autosuspend(&pdev->dev);
> >>>>>> +       pm_runtime_disable(&pdev->dev);
> >>>>>
> >>>>> There is no guarantee that the ->runtime_suspend() has been invoked
> >>>>> here, which means that clock may be left prepared/enabled beyond this
> >>>>> point.
> >>>>>
> >>>>> I suggest you call pm_runtime_force_suspend(), instead of
> >>>>> pm_runtime_disable(), to make sure that gets done.
> >>>>
> >>>> The pm_runtime_disable() performs the final synchronization, please see [1].
> >>>>
> >>>> [1]
> >>>> https://elixir.bootlin.com/linux/v5.15-rc3/source/drivers/base/power/runtime.c#L1412
> >>>
> >>> pm_runtime_disable() end up calling _pm_runtime_barrier(), which calls
> >>> cancel_work_sync() if dev->power.request_pending has been set.
> >>>
> >>> If the work that was punted to the pm_wq in rpm_idle() has not been
> >>> started yet, we end up just canceling it. In other words, there are no
> >>> guarantees it runs to completion.
> >>
> >> You're right. Although, in a case of this particular patch, the syncing
> >> is actually implicitly done by pm_runtime_dont_use_autosuspend().
> >>
> >> But for drivers which don't use auto-suspend, there is no sync. This
> >> looks like a disaster, it's a very common pattern for drivers to
> >> 'put+disable'.
> >>
> >>> Moreover, use space may have bumped the usage count via sysfs for the
> >>> device (pm_runtime_forbid()) to keep the device runtime resumed.
> >>
> >> Right, this is also a disaster in a case of driver removal.
> >>
> >>>> Calling pm_runtime_force_suspend() isn't correct because each 'enable'
> >>>> must have the corresponding 'disable'. Hence there is no problem here.
> >>>
> >>> pm_runtime_force_suspend() calls pm_runtime_disable(), so I think that
> >>> should be fine. No?
> >>
> >> [adding Rafael]
> >>
> >> Rafael, could you please explain how drivers are supposed to properly
> >> suspend and disable RPM to cut off power and reset state that was
> >> altered by the driver's resume callback? What we're missing? Is Ulf's
> >> suggestion acceptable?
> >>
> >> The RPM state of a device is getting reset on driver's removal, hence
> >> all refcounts that were bumped by the rpm-resume callback of the device
> >> driver will be screwed up if device is kept resumed after removal. I
> >> just verified that it's true in practice.
> >
> > Note that, what makes the Tegra drivers a bit special is that they are
> > always built with CONFIG_PM being set (selected from the "SoC"
> > Kconfig).
> >
> > Therefore, pm_runtime_force_suspend() can work for some of these
> > cases. Using this, would potentially avoid the driver from having to
> > runtime resume the device in ->remove(), according to the below
> > generic sequence, which is used in many drivers.
> >
> > pm_runtime_get_sync()
> > clk_disable_unprepare() (+ additional things to turn off the device)
> > pm_runtime_disable()
> > pm_runtime_put_noidle()
>
> It's not a problem to change this patchset. The problem is that if
> you'll grep mainline for 'pm_runtime_disable', you will find that there
> are a lot of drivers in a potential trouble.

Let's start by fixing this patchset, please - then we can consider
what to do with the other cases separately.

>
> I'm proposing that we should change pm_runtime_disable() to perform the
> syncing with this oneliner:
>
> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
> index ec94049442b9..5c9f28165824 100644
> --- a/drivers/base/power/runtime.c
> +++ b/drivers/base/power/runtime.c
> @@ -1380,6 +1380,8 @@ EXPORT_SYMBOL_GPL(pm_runtime_barrier);
>   */
>  void __pm_runtime_disable(struct device *dev, bool check_resume)
>  {
> +       flush_work(&dev->power.work);
> +

What about the latency this may introduce? I am not sure that is
acceptable here!?

>         spin_lock_irq(&dev->power.lock);
>
>         if (dev->power.disable_depth > 0) {
>
> Objections?
>
> The sysfs rpm-forbid is a separate problem and it's less troublesome
> since it requires root privileges. It's also not something that
> userspace touches casually. For now I don't know what could be done
> about it.

As I said, the common method to address this problem is to run the
following sequence:

pm_runtime_get_sync()
"power off the device"
pm_runtime_disable()
pm_runtime_put_noidle()

This works even if user space, via sysfs, has triggered a call to
pm_runtime_forbid(). Or doesn't it?

If you don't like it, pm_runtime_force_suspend() should work too, at
least for your cases, I believe.

Kind regards
Uffe
Ulf Hansson Oct. 5, 2021, 1:10 p.m. UTC | #29
On Sat, 2 Oct 2021 at 22:44, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 01.10.2021 15:32, Ulf Hansson пишет:
> >> +static __maybe_unused int tegra_clock_pm_suspend(struct device *dev)
> >> +{
> >> +       struct tegra_clk_device *clk_dev = dev_get_drvdata(dev);
> >> +
> >> +       /*
> >> +        * Power management of the clock is entangled with the Tegra PMC
> >> +        * GENPD because PMC driver enables/disables clocks for toggling
> >> +        * of the PD's on/off state.
> >> +        *
> >> +        * The PMC GENPD is resumed in NOIRQ phase, before RPM of the clocks
> >> +        * becomes available, hence PMC can't use clocks at the early resume
> >> +        * phase if RPM is involved. For example when 3d clock is enabled,
> >> +        * it may enable the parent PLL clock that needs to be RPM-resumed.
> >> +        *
> >> +        * Secondly, the PLL clocks may be enabled by the low level suspend
> >> +        * code, so we need to assume that PLL is in enabled state during
> >> +        * suspend.
> >> +        *
> >> +        * We will keep PLLs and system clock resumed during suspend time.
> >> +        * All PLLs on all SoCs are low power and system clock is always-on,
> >> +        * so practically not much is changed here.
> >> +        */
> >> +
> >> +       return clk_prepare(clk_dev->hw->clk);
> > I am trying to understand, more exactly, what you intend to achieve
> > with the clk_prepare() here. It looks a bit weird, to me. Can you try
> > to elaborate a bit more on the use case?
>
> The Tegra GENPD driver enable/disable clocks when domain is turned on.

Okay. I noticed that in tegra_genpd_power_on(). And the same clocks
are enabled/disabled also in tegra_genpd_power_off(), when powering
off the PM domain.

So I guess the problem kind of exists for tegra_genpd_power_off() too?

> This can't be done during early system resume, when domains are getting
> turned on by the drivers core, because when clock is enabled, it's
> getting prepared (RPM-resumed) and this preparation fails because
> performance state of the clock goes up and it doesn't work during the
> early resume time since I2C, which applies the state to hardware, is
> suspended and can't work at that early time.

This sounds complicated and I still don't quite follow all of it, sorry.

So, tegra_genpd_power_on() gets called from genpd_resume_noirq(), when
the first device of the attached devices to genpd gets resumed. And
vice versa for tegra_genpd_power_off() and genpd_suspend_noirq().

Are you saying that trying to enable/disable clocks from
tegra_genpd_power_on|off() in these paths doesn't work, because it
would also require the performance state to be changed, which would
fail because the I2C bus/driver is suspended?

>
> Secondly, Tegra has arch-specific low level assembly which touches
> clocks during last phase of system suspend and in the beginning of
> resume. Hence, clocks should stay prepared during suspend just because
> technically clock should be prepared before it can be enabled.

So the low level code is gating and ungating the clock behind the back
of the clock driver then? Why is that done like that, more exactly?

>
> > Is this rather about making sure that the clock's corresponding PM
> > domain stays powered on during system suspend? In that case, I think
> > there may be an alternative option....
> >
>
> This is not about domain staying powered on, this is about keeping the
> performance state of the domain high during suspend.

Right, so the PM domain managed in tegra_genpd_power_on|off() can
still be powered on/off, as long as the clock remains ungated?

Kind regards
Uffe
Dmitry Osipenko Oct. 5, 2021, 5:16 p.m. UTC | #30
...
>> It's not a problem to change this patchset. The problem is that if
>> you'll grep mainline for 'pm_runtime_disable', you will find that there
>> are a lot of drivers in a potential trouble.
> 
> Let's start by fixing this patchset, please - then we can consider
> what to do with the other cases separately.

Yeah, should be better to discuss it separately.

...
>>  void __pm_runtime_disable(struct device *dev, bool check_resume)
>>  {
>> +       flush_work(&dev->power.work);
>> +
> 
> What about the latency this may introduce? I am not sure that is
> acceptable here!?

I'm not aware about any code which relies on the original 'cancelling'
behaviour, perhaps Rafael should have more insight.

...
>> The sysfs rpm-forbid is a separate problem and it's less troublesome
>> since it requires root privileges. It's also not something that
>> userspace touches casually. For now I don't know what could be done
>> about it.
> 
> As I said, the common method to address this problem is to run the
> following sequence:
> 
> pm_runtime_get_sync()
> "power off the device"
> pm_runtime_disable()
> pm_runtime_put_noidle()
> 
> This works even if user space, via sysfs, has triggered a call to
> pm_runtime_forbid(). Or doesn't it?
> 
> If you don't like it, pm_runtime_force_suspend() should work too, at
> least for your cases, I believe.

I'll update the patches, thank you.
Dmitry Osipenko Oct. 5, 2021, 10:19 p.m. UTC | #31
05.10.2021 16:10, Ulf Hansson пишет:
> On Sat, 2 Oct 2021 at 22:44, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> 01.10.2021 15:32, Ulf Hansson пишет:
>>>> +static __maybe_unused int tegra_clock_pm_suspend(struct device *dev)
>>>> +{
>>>> +       struct tegra_clk_device *clk_dev = dev_get_drvdata(dev);
>>>> +
>>>> +       /*
>>>> +        * Power management of the clock is entangled with the Tegra PMC
>>>> +        * GENPD because PMC driver enables/disables clocks for toggling
>>>> +        * of the PD's on/off state.
>>>> +        *
>>>> +        * The PMC GENPD is resumed in NOIRQ phase, before RPM of the clocks
>>>> +        * becomes available, hence PMC can't use clocks at the early resume
>>>> +        * phase if RPM is involved. For example when 3d clock is enabled,
>>>> +        * it may enable the parent PLL clock that needs to be RPM-resumed.
>>>> +        *
>>>> +        * Secondly, the PLL clocks may be enabled by the low level suspend
>>>> +        * code, so we need to assume that PLL is in enabled state during
>>>> +        * suspend.
>>>> +        *
>>>> +        * We will keep PLLs and system clock resumed during suspend time.
>>>> +        * All PLLs on all SoCs are low power and system clock is always-on,
>>>> +        * so practically not much is changed here.
>>>> +        */
>>>> +
>>>> +       return clk_prepare(clk_dev->hw->clk);
>>> I am trying to understand, more exactly, what you intend to achieve
>>> with the clk_prepare() here. It looks a bit weird, to me. Can you try
>>> to elaborate a bit more on the use case?
>>
>> The Tegra GENPD driver enable/disable clocks when domain is turned on.
> 
> Okay. I noticed that in tegra_genpd_power_on(). And the same clocks
> are enabled/disabled also in tegra_genpd_power_off(), when powering
> off the PM domain.
> 
> So I guess the problem kind of exists for tegra_genpd_power_off() too?

Both OFF/ON are affected by the same problem. If domain was already
turned OFF before genpd_suspend_noirq(), then the OFF problem isn't visible.

I reproduced the OFF problem by removing the clk prepare/unprepare from
the suspend/resume of the clk driver and making some extra changes to
clock tree topology and etc to trigger the problem on Nexus 7.

tegra-pmc 7000e400.pmc: failed to turn off PM domain heg: -13

I happens from genpd_suspend_noirq() -> tegra_genpd_power_off() -> clk
-> GENPD -> I2C -> runtime-pm.

-13 is EACCES, it comes from the runtime PM of I2C device. RPM is
prohibited/disabled during late (NOIRQ) suspend by the drivers core.

>> This can't be done during early system resume, when domains are getting
>> turned on by the drivers core, because when clock is enabled, it's
>> getting prepared (RPM-resumed) and this preparation fails because
>> performance state of the clock goes up and it doesn't work during the
>> early resume time since I2C, which applies the state to hardware, is
>> suspended and can't work at that early time.
> 
> This sounds complicated and I still don't quite follow all of it, sorry.
> 
> So, tegra_genpd_power_on() gets called from genpd_resume_noirq(), when
> the first device of the attached devices to genpd gets resumed. And
> vice versa for tegra_genpd_power_off() and genpd_suspend_noirq().
> 
> Are you saying that trying to enable/disable clocks from
> tegra_genpd_power_on|off() in these paths doesn't work, because it
> would also require the performance state to be changed, which would
> fail because the I2C bus/driver is suspended?

Yes, but it's actually not I2C bus/driver that is suspended, it's
runtime PM that is unavailable during NOIRQ. The I2C driver itself is
suspended after domains are turned OFF and resumed before they are
enabled. It's just runtime PM API that is unavailable. I'm wondering if
this could be changed.

I'm also wondering if we could add some 'was_enabled' flag to GENPDs,
setting it by genpd_suspend_noirq() for the enabled domains, and then
powering-on GENPDs from genpd_resume_noirq() only if they were in the
enabled state during genpd_suspend_noirq() time. It actually puzzled me
for a quite long time why GENPD core enables domains unconditionally
during early resume. This should solve a part of the problem and it
makes suspend/resume a bit safer because there is a smaller chance to
crash hardware during suspend, at least it's easier to debug.

>> Secondly, Tegra has arch-specific low level assembly which touches
>> clocks during last phase of system suspend and in the beginning of
>> resume. Hence, clocks should stay prepared during suspend just because
>> technically clock should be prepared before it can be enabled.
> 
> So the low level code is gating and ungating the clock behind the back
> of the clock driver then? Why is that done like that, more exactly?

I revisited that code again, and it shouldn't touch the clocks.
I changed that code to not toggle the clocks [1] sometime ago, but
forgot about it.

[1] https://git.kernel.org/linus/680ae4452

>>> Is this rather about making sure that the clock's corresponding PM
>>> domain stays powered on during system suspend? In that case, I think
>>> there may be an alternative option....
>>>
>>
>> This is not about domain staying powered on, this is about keeping the
>> performance state of the domain high during suspend.
> 
> Right, so the PM domain managed in tegra_genpd_power_on|off() can
> still be powered on/off, as long as the clock remains ungated?

Not ungated, but prepared.
Dmitry Osipenko Oct. 5, 2021, 10:43 p.m. UTC | #32
06.10.2021 01:19, Dmitry Osipenko пишет:
...
> I reproduced the OFF problem by removing the clk prepare/unprepare from
> the suspend/resume of the clk driver and making some extra changes to
> clock tree topology and etc to trigger the problem on Nexus 7.
> 
> tegra-pmc 7000e400.pmc: failed to turn off PM domain heg: -13
> 
> It happens from genpd_suspend_noirq() -> tegra_genpd_power_off() -> clk
> -> GENPD -> I2C -> runtime-pm.
> 
> -13 is EACCES, it comes from the runtime PM of I2C device. RPM is
> prohibited/disabled during late (NOIRQ) suspend by the drivers core.

My bad, I double-checked and it's not I2C RPM that is failing now, but
the clock's RPM [1], which is also unavailable during NOIRQ.

[1]
https://elixir.free-electrons.com/linux/v5.15-rc4/source/drivers/clk/clk.c#L116

Previously it was I2C RPM that was failing in a similar way, but code
changed a tad since that time.
Dmitry Osipenko Oct. 6, 2021, 2:40 a.m. UTC | #33
06.10.2021 01:43, Dmitry Osipenko пишет:
> 06.10.2021 01:19, Dmitry Osipenko пишет:
> ...
>> I reproduced the OFF problem by removing the clk prepare/unprepare from
>> the suspend/resume of the clk driver and making some extra changes to
>> clock tree topology and etc to trigger the problem on Nexus 7.
>>
>> tegra-pmc 7000e400.pmc: failed to turn off PM domain heg: -13
>>
>> It happens from genpd_suspend_noirq() -> tegra_genpd_power_off() -> clk
>> -> GENPD -> I2C -> runtime-pm.
>>
>> -13 is EACCES, it comes from the runtime PM of I2C device. RPM is
>> prohibited/disabled during late (NOIRQ) suspend by the drivers core.
> 
> My bad, I double-checked and it's not I2C RPM that is failing now, but
> the clock's RPM [1], which is also unavailable during NOIRQ.
> 
> [1]
> https://elixir.free-electrons.com/linux/v5.15-rc4/source/drivers/clk/clk.c#L116
> 
> Previously it was I2C RPM that was failing in a similar way, but code
> changed a tad since that time.
> 

Just in case, I checked that the suspension order isn't somehow the
source of the problem by adding links to device tree in order to always
suspend clocks after the rest of devices and still GENPD gets -EACCESS
from clk_pm_runtime_get().

RPM is disabled by dpm_suspend_late(), which is invoked before
dpm_suspend_noirq() [1]. Hence RPM is unavailable in NOIRQ phase in any
case.

[1]
https://elixir.bootlin.com/linux/v5.15-rc4/source/kernel/power/suspend.c#L399
Ulf Hansson Oct. 6, 2021, 12:43 p.m. UTC | #34
On Wed, 6 Oct 2021 at 00:43, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 06.10.2021 01:19, Dmitry Osipenko пишет:
> ...
> > I reproduced the OFF problem by removing the clk prepare/unprepare from
> > the suspend/resume of the clk driver and making some extra changes to
> > clock tree topology and etc to trigger the problem on Nexus 7.
> >
> > tegra-pmc 7000e400.pmc: failed to turn off PM domain heg: -13
> >
> > It happens from genpd_suspend_noirq() -> tegra_genpd_power_off() -> clk
> > -> GENPD -> I2C -> runtime-pm.
> >
> > -13 is EACCES, it comes from the runtime PM of I2C device. RPM is
> > prohibited/disabled during late (NOIRQ) suspend by the drivers core.
>
> My bad, I double-checked and it's not I2C RPM that is failing now, but
> the clock's RPM [1], which is also unavailable during NOIRQ.

Yes, that sounds reasonable.

You would then need a similar patch for the tegra clock driver as I
suggested for tegra I2C driver. That should solve the problem, I
think.

>
> [1]
> https://elixir.free-electrons.com/linux/v5.15-rc4/source/drivers/clk/clk.c#L116
>
> Previously it was I2C RPM that was failing in a similar way, but code
> changed a tad since that time.

Alright. In any case, as long as the devices gets suspended in the
correct order, I think it should be fine to cook a patch along the
lines of what I suggest for the I2C driver as well.

It should work, I think. Although, maybe you want to avoid runtime
resuming the I2C device, unless it's the device belonging to the PMIC
interface, if there is a way to distinguish that for the driver.

Kind regards
Uffe
Dmitry Osipenko Oct. 6, 2021, 9:14 p.m. UTC | #35
06.10.2021 15:43, Ulf Hansson пишет:
> On Wed, 6 Oct 2021 at 00:43, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> 06.10.2021 01:19, Dmitry Osipenko пишет:
>> ...
>>> I reproduced the OFF problem by removing the clk prepare/unprepare from
>>> the suspend/resume of the clk driver and making some extra changes to
>>> clock tree topology and etc to trigger the problem on Nexus 7.
>>>
>>> tegra-pmc 7000e400.pmc: failed to turn off PM domain heg: -13
>>>
>>> It happens from genpd_suspend_noirq() -> tegra_genpd_power_off() -> clk
>>> -> GENPD -> I2C -> runtime-pm.
>>>
>>> -13 is EACCES, it comes from the runtime PM of I2C device. RPM is
>>> prohibited/disabled during late (NOIRQ) suspend by the drivers core.
>>
>> My bad, I double-checked and it's not I2C RPM that is failing now, but
>> the clock's RPM [1], which is also unavailable during NOIRQ.
> 
> Yes, that sounds reasonable.
> 
> You would then need a similar patch for the tegra clock driver as I
> suggested for tegra I2C driver. That should solve the problem, I
> think.
> 
>>
>> [1]
>> https://elixir.free-electrons.com/linux/v5.15-rc4/source/drivers/clk/clk.c#L116
>>
>> Previously it was I2C RPM that was failing in a similar way, but code
>> changed a tad since that time.
> 
> Alright. In any case, as long as the devices gets suspended in the
> correct order, I think it should be fine to cook a patch along the
> lines of what I suggest for the I2C driver as well.
> 
> It should work, I think. Although, maybe you want to avoid runtime
> resuming the I2C device, unless it's the device belonging to the PMIC
> interface, if there is a way to distinguish that for the driver.

Ulf, thank you very much for the suggestions! I was thinking about this
all once again and concluded that the simplest variant will be to just
remove the suspend ops from the clk driver since neither of PLLs require
high voltage. We now have voltage bumped to a nominal level during
suspend by Tegra's regulator-coupler driver and it's much higher than
voltage needed by PLLs. So the problem I was trying to work around
doesn't really exist anymore.
Dmitry Osipenko Oct. 6, 2021, 9:20 p.m. UTC | #36
06.10.2021 15:38, Ulf Hansson пишет:
> In principle what you ask for, is if we can avoid calling
> __pm_runtime_disable() in __device_suspend_late() (and vice versa in
> device_resume_early()).
> 
> I think the short answer is no, at least from a generic point of view.
> Maybe we can figure out a way to allow this on a per device basis, as
> an opt-in solution. I am not sure what Rafael would think about this,
> let's see.
> 
> Another option to address the problem is already available to use for
> these kinds of cases. This would be to add also a pair of
> ->suspend|resume() callbacks to I2C driver. Along the lines of the
> below.
> 
> diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
> index c883044715f3..589bf872ab25 100644
> --- a/drivers/i2c/busses/i2c-tegra.c
> +++ b/drivers/i2c/busses/i2c-tegra.c
> @@ -1918,6 +1918,7 @@ static int __maybe_unused
> tegra_i2c_resume(struct device *dev)
>  }
> 
>  static const struct dev_pm_ops tegra_i2c_pm = {
> +       SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_put_noidle, pm_runtime_get_sync)
>         SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(tegra_i2c_suspend, tegra_i2c_resume)
>         SET_RUNTIME_PM_OPS(tegra_i2c_runtime_suspend, tegra_i2c_runtime_resume,
>                            NULL)
> 
> In this way, the device would already be runtime resumed, if there is
> call to pm_runtime_get_sync() from the clock framework due to the
> clk_prepare|unprepare() being called. If that also turns out to happen
> *after* runtime PM has been disabled for the device, the call to
> pm_runtime_get_sync() would still succeed (returning 1, see
> rpm_resume()), rather than a negative error code.
> 
> Yes, we may end up runtime resuming the device during system suspend,
> even if it turns out not to be needed. Although, that doesn't seem to
> be the case for the Tegra I2C driver, right?

Tegra I2C will turn off clocks on suspend regardless of RPM state.
Overall, it's a plausible solution, thank you!

As I said in the other reply, I'll simply remove the suspend ops from
clk driver, they are not needed anymore. The problem is gone.
Dmitry Osipenko Oct. 6, 2021, 9:25 p.m. UTC | #37
06.10.2021 15:38, Ulf Hansson пишет:
>> I'm also wondering if we could add some 'was_enabled' flag to GENPDs,
>> setting it by genpd_suspend_noirq() for the enabled domains, and then
>> powering-on GENPDs from genpd_resume_noirq() only if they were in the
>> enabled state during genpd_suspend_noirq() time. It actually puzzled me
>> for a quite long time why GENPD core enables domains unconditionally
>> during early resume. This should solve a part of the problem and it
>> makes suspend/resume a bit safer because there is a smaller chance to
>> crash hardware during suspend, at least it's easier to debug.
> Just because the PM domain was already off at genpd_suspend_noirq(),
> doesn't mean that it can stay powered off at genpd_resume_noirq(). At
> least as is today.
> 
> The main reason why genpd_resume_noirq() powers on the PM domain, is
> because it's not possible for the consumer drivers to rely on runtime
> PM to do it (because runtime PM has been disabled by the PM core).

At least Tegra doesn't need to have domains force-resumed. This should
be a platform-specific behaviour. We may add a new flag for that, I
suppose. I'll try to keep this in mind for a future improvement. Thank
you for the clarification.
Dmitry Osipenko Oct. 6, 2021, 10:01 p.m. UTC | #38
07.10.2021 00:14, Dmitry Osipenko пишет:
> 06.10.2021 15:43, Ulf Hansson пишет:
>> On Wed, 6 Oct 2021 at 00:43, Dmitry Osipenko <digetx@gmail.com> wrote:
>>>
>>> 06.10.2021 01:19, Dmitry Osipenko пишет:
>>> ...
>>>> I reproduced the OFF problem by removing the clk prepare/unprepare from
>>>> the suspend/resume of the clk driver and making some extra changes to
>>>> clock tree topology and etc to trigger the problem on Nexus 7.
>>>>
>>>> tegra-pmc 7000e400.pmc: failed to turn off PM domain heg: -13
>>>>
>>>> It happens from genpd_suspend_noirq() -> tegra_genpd_power_off() -> clk
>>>> -> GENPD -> I2C -> runtime-pm.
>>>>
>>>> -13 is EACCES, it comes from the runtime PM of I2C device. RPM is
>>>> prohibited/disabled during late (NOIRQ) suspend by the drivers core.
>>>
>>> My bad, I double-checked and it's not I2C RPM that is failing now, but
>>> the clock's RPM [1], which is also unavailable during NOIRQ.
>>
>> Yes, that sounds reasonable.
>>
>> You would then need a similar patch for the tegra clock driver as I
>> suggested for tegra I2C driver. That should solve the problem, I
>> think.
>>
>>>
>>> [1]
>>> https://elixir.free-electrons.com/linux/v5.15-rc4/source/drivers/clk/clk.c#L116
>>>
>>> Previously it was I2C RPM that was failing in a similar way, but code
>>> changed a tad since that time.
>>
>> Alright. In any case, as long as the devices gets suspended in the
>> correct order, I think it should be fine to cook a patch along the
>> lines of what I suggest for the I2C driver as well.
>>
>> It should work, I think. Although, maybe you want to avoid runtime
>> resuming the I2C device, unless it's the device belonging to the PMIC
>> interface, if there is a way to distinguish that for the driver.
> 
> Ulf, thank you very much for the suggestions! I was thinking about this
> all once again and concluded that the simplest variant will be to just
> remove the suspend ops from the clk driver since neither of PLLs require
> high voltage. We now have voltage bumped to a nominal level during
> suspend by Tegra's regulator-coupler driver and it's much higher than
> voltage needed by PLLs. So the problem I was trying to work around
> doesn't really exist anymore.

I hurried a bit with the conclusion, keep forgetting that I need to
change the clock tree in order to test it all properly :/ It's not fixed
yet.
Dmitry Osipenko Oct. 6, 2021, 10:03 p.m. UTC | #39
06.10.2021 15:38, Ulf Hansson пишет:
>>> Right, so the PM domain managed in tegra_genpd_power_on|off() can
>>> still be powered on/off, as long as the clock remains ungated?
>> Not ungated, but prepared.
> Okay, thanks for clarifying!
> 
> In summary, it sounds like you should be able to fix this problem in
> the I2C driver as I suggested above. If that works, that seems much
> better.

I'll try this variant, thank you.

> Moreover, it would leave the clocks gated/unprepared when the system
> is fully suspended, which I guess is better from an energy point of
> view?

The clocks are kept gated, it wasn't a problem. The problem was that
clocks were needed to be enabled temporarily. In order to enable a
clock, it needs to be prepared first. When clock is prepared, it resumes
clock's device RPM.

Keeping clocks prepared shouldn't make a noticeable difference from the
energy POV since clocks are gated. It's only voltage that is kept high,
but we need to keep it high during suspend anyways in order to resume
successfully. The hardware is mostly gated during suspend, depending on
suspend mode, so the power consumption difference is negligible. At
least I haven't seen any problems, battery doesn't drain during suspend.
Dmitry Osipenko Oct. 6, 2021, 11:21 p.m. UTC | #40
07.10.2021 01:01, Dmitry Osipenko пишет:
> 07.10.2021 00:14, Dmitry Osipenko пишет:
>> 06.10.2021 15:43, Ulf Hansson пишет:
>>> On Wed, 6 Oct 2021 at 00:43, Dmitry Osipenko <digetx@gmail.com> wrote:
>>>>
>>>> 06.10.2021 01:19, Dmitry Osipenko пишет:
>>>> ...
>>>>> I reproduced the OFF problem by removing the clk prepare/unprepare from
>>>>> the suspend/resume of the clk driver and making some extra changes to
>>>>> clock tree topology and etc to trigger the problem on Nexus 7.
>>>>>
>>>>> tegra-pmc 7000e400.pmc: failed to turn off PM domain heg: -13
>>>>>
>>>>> It happens from genpd_suspend_noirq() -> tegra_genpd_power_off() -> clk
>>>>> -> GENPD -> I2C -> runtime-pm.
>>>>>
>>>>> -13 is EACCES, it comes from the runtime PM of I2C device. RPM is
>>>>> prohibited/disabled during late (NOIRQ) suspend by the drivers core.
>>>>
>>>> My bad, I double-checked and it's not I2C RPM that is failing now, but
>>>> the clock's RPM [1], which is also unavailable during NOIRQ.
>>>
>>> Yes, that sounds reasonable.
>>>
>>> You would then need a similar patch for the tegra clock driver as I
>>> suggested for tegra I2C driver. That should solve the problem, I
>>> think.
>>>
>>>>
>>>> [1]
>>>> https://elixir.free-electrons.com/linux/v5.15-rc4/source/drivers/clk/clk.c#L116
>>>>
>>>> Previously it was I2C RPM that was failing in a similar way, but code
>>>> changed a tad since that time.
>>>
>>> Alright. In any case, as long as the devices gets suspended in the
>>> correct order, I think it should be fine to cook a patch along the
>>> lines of what I suggest for the I2C driver as well.
>>>
>>> It should work, I think. Although, maybe you want to avoid runtime
>>> resuming the I2C device, unless it's the device belonging to the PMIC
>>> interface, if there is a way to distinguish that for the driver.
>>
>> Ulf, thank you very much for the suggestions! I was thinking about this
>> all once again and concluded that the simplest variant will be to just
>> remove the suspend ops from the clk driver since neither of PLLs require
>> high voltage. We now have voltage bumped to a nominal level during
>> suspend by Tegra's regulator-coupler driver and it's much higher than
>> voltage needed by PLLs. So the problem I was trying to work around
>> doesn't really exist anymore.
> 
> I hurried a bit with the conclusion, keep forgetting that I need to
> change the clock tree in order to test it all properly :/ It's not fixed
> yet.
> 

Please let me iterate once again. The problem we currently have is that
clock may be enabled during NOIRQ time. In order to enable clock, it
needs to be prepared. In order to prepare clock, the clock's device
needs to be runtime-resumed. The runtime PM is unavailable at the NOIRQ
time.

To solve this problem we need to prepare clock beforehand.

The clock will stay prepared during suspend, but this is not a problem
since all the clocks we care about don't require high voltage and
voltage is guaranteed to be bumped high during suspend by Tegra's
regulator-coupler driver anyways.

So everything we need to do is to keep clocks prepared. There are two
options how to do that:

[1] this patch which explicitly prepares clocks using clk API.

[2] Use runtime PM API, like this:

static const struct dev_pm_ops tegra_clock_pm = {
	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_resume_and_get, pm_runtime_put)
};

Ulf, are you now okay with the current variant [1] of the patch or you
prefer the second [2] option more?
Ulf Hansson Oct. 7, 2021, 9:18 a.m. UTC | #41
On Thu, 7 Oct 2021 at 01:21, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 07.10.2021 01:01, Dmitry Osipenko пишет:
> > 07.10.2021 00:14, Dmitry Osipenko пишет:
> >> 06.10.2021 15:43, Ulf Hansson пишет:
> >>> On Wed, 6 Oct 2021 at 00:43, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>>>
> >>>> 06.10.2021 01:19, Dmitry Osipenko пишет:
> >>>> ...
> >>>>> I reproduced the OFF problem by removing the clk prepare/unprepare from
> >>>>> the suspend/resume of the clk driver and making some extra changes to
> >>>>> clock tree topology and etc to trigger the problem on Nexus 7.
> >>>>>
> >>>>> tegra-pmc 7000e400.pmc: failed to turn off PM domain heg: -13
> >>>>>
> >>>>> It happens from genpd_suspend_noirq() -> tegra_genpd_power_off() -> clk
> >>>>> -> GENPD -> I2C -> runtime-pm.
> >>>>>
> >>>>> -13 is EACCES, it comes from the runtime PM of I2C device. RPM is
> >>>>> prohibited/disabled during late (NOIRQ) suspend by the drivers core.
> >>>>
> >>>> My bad, I double-checked and it's not I2C RPM that is failing now, but
> >>>> the clock's RPM [1], which is also unavailable during NOIRQ.
> >>>
> >>> Yes, that sounds reasonable.
> >>>
> >>> You would then need a similar patch for the tegra clock driver as I
> >>> suggested for tegra I2C driver. That should solve the problem, I
> >>> think.
> >>>
> >>>>
> >>>> [1]
> >>>> https://elixir.free-electrons.com/linux/v5.15-rc4/source/drivers/clk/clk.c#L116
> >>>>
> >>>> Previously it was I2C RPM that was failing in a similar way, but code
> >>>> changed a tad since that time.
> >>>
> >>> Alright. In any case, as long as the devices gets suspended in the
> >>> correct order, I think it should be fine to cook a patch along the
> >>> lines of what I suggest for the I2C driver as well.
> >>>
> >>> It should work, I think. Although, maybe you want to avoid runtime
> >>> resuming the I2C device, unless it's the device belonging to the PMIC
> >>> interface, if there is a way to distinguish that for the driver.
> >>
> >> Ulf, thank you very much for the suggestions! I was thinking about this
> >> all once again and concluded that the simplest variant will be to just
> >> remove the suspend ops from the clk driver since neither of PLLs require
> >> high voltage. We now have voltage bumped to a nominal level during
> >> suspend by Tegra's regulator-coupler driver and it's much higher than
> >> voltage needed by PLLs. So the problem I was trying to work around
> >> doesn't really exist anymore.
> >
> > I hurried a bit with the conclusion, keep forgetting that I need to
> > change the clock tree in order to test it all properly :/ It's not fixed
> > yet.
> >
>
> Please let me iterate once again. The problem we currently have is that
> clock may be enabled during NOIRQ time. In order to enable clock, it
> needs to be prepared. In order to prepare clock, the clock's device
> needs to be runtime-resumed. The runtime PM is unavailable at the NOIRQ
> time.
>
> To solve this problem we need to prepare clock beforehand.
>
> The clock will stay prepared during suspend, but this is not a problem
> since all the clocks we care about don't require high voltage and
> voltage is guaranteed to be bumped high during suspend by Tegra's
> regulator-coupler driver anyways.
>
> So everything we need to do is to keep clocks prepared. There are two
> options how to do that:
>
> [1] this patch which explicitly prepares clocks using clk API.
>
> [2] Use runtime PM API, like this:
>
> static const struct dev_pm_ops tegra_clock_pm = {
>         SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_resume_and_get, pm_runtime_put)
> };
>
> Ulf, are you now okay with the current variant [1] of the patch or you
> prefer the second [2] option more?

I prefer option [2]. The clock_prepare|unprepare() thingy in option
[1], looks more like an odd workaround to me.

Does that make sense to you as well?

Kind regards
Uffe
Dmitry Osipenko Oct. 7, 2021, 10:36 a.m. UTC | #42
07.10.2021 12:18, Ulf Hansson пишет:
>> Please let me iterate once again. The problem we currently have is that
>> clock may be enabled during NOIRQ time. In order to enable clock, it
>> needs to be prepared. In order to prepare clock, the clock's device
>> needs to be runtime-resumed. The runtime PM is unavailable at the NOIRQ
>> time.
>>
>> To solve this problem we need to prepare clock beforehand.
>>
>> The clock will stay prepared during suspend, but this is not a problem
>> since all the clocks we care about don't require high voltage and
>> voltage is guaranteed to be bumped high during suspend by Tegra's
>> regulator-coupler driver anyways.
>>
>> So everything we need to do is to keep clocks prepared. There are two
>> options how to do that:
>>
>> [1] this patch which explicitly prepares clocks using clk API.
>>
>> [2] Use runtime PM API, like this:
>>
>> static const struct dev_pm_ops tegra_clock_pm = {
>>         SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_resume_and_get, pm_runtime_put)
>> };
>>
>> Ulf, are you now okay with the current variant [1] of the patch or you
>> prefer the second [2] option more?
> I prefer option [2]. The clock_prepare|unprepare() thingy in option
> [1], looks more like an odd workaround to me.
> 
> Does that make sense to you as well?

I don't have a strong preference since both variants give the same
effect. I'll keep testing option [2] and will use it in the next version
if no problem will be found. Thank you!
Dmitry Osipenko Oct. 16, 2021, 3:36 p.m. UTC | #43
01.10.2021 16:27, Ulf Hansson пишет:
> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> Add OPP and SoC core voltage scaling support to the display controller
>> driver. This is required for enabling system-wide DVFS on pre-Tegra186
>> SoCs.
>>
>> Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30
>> Tested-by: Paul Fertser <fercerpav@gmail.com> # PAZ00 T20
>> Tested-by: Nicolas Chauvet <kwizart@gmail.com> # PAZ00 T20 and TK1 T124
>> Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30
>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>> ---
>>  drivers/gpu/drm/tegra/dc.c | 74 ++++++++++++++++++++++++++++++++++++++
>>  drivers/gpu/drm/tegra/dc.h |  2 ++
>>  2 files changed, 76 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
>> index a29d64f87563..d4047a14e2b6 100644
>> --- a/drivers/gpu/drm/tegra/dc.c
>> +++ b/drivers/gpu/drm/tegra/dc.c
>> @@ -11,9 +11,12 @@
>>  #include <linux/interconnect.h>
>>  #include <linux/module.h>
>>  #include <linux/of_device.h>
>> +#include <linux/pm_domain.h>
>> +#include <linux/pm_opp.h>
>>  #include <linux/pm_runtime.h>
>>  #include <linux/reset.h>
>>
>> +#include <soc/tegra/common.h>
>>  #include <soc/tegra/pmc.h>
>>
>>  #include <drm/drm_atomic.h>
>> @@ -1762,6 +1765,47 @@ int tegra_dc_state_setup_clock(struct tegra_dc *dc,
>>         return 0;
>>  }
>>
>> +static void tegra_dc_update_voltage_state(struct tegra_dc *dc,
>> +                                         struct tegra_dc_state *state)
>> +{
>> +       unsigned long rate, pstate;
>> +       struct dev_pm_opp *opp;
>> +       int err;
>> +
>> +       if (!dc->has_opp_table)
>> +               return;
>> +
>> +       /* calculate actual pixel clock rate which depends on internal divider */
>> +       rate = DIV_ROUND_UP(clk_get_rate(dc->clk) * 2, state->div + 2);
>> +
>> +       /* find suitable OPP for the rate */
>> +       opp = dev_pm_opp_find_freq_ceil(dc->dev, &rate);
>> +
>> +       if (opp == ERR_PTR(-ERANGE))
>> +               opp = dev_pm_opp_find_freq_floor(dc->dev, &rate);
>> +
>> +       if (IS_ERR(opp)) {
>> +               dev_err(dc->dev, "failed to find OPP for %luHz: %pe\n",
>> +                       rate, opp);
>> +               return;
>> +       }
>> +
>> +       pstate = dev_pm_opp_get_required_pstate(opp, 0);
>> +       dev_pm_opp_put(opp);
>> +
>> +       /*
>> +        * The minimum core voltage depends on the pixel clock rate (which
>> +        * depends on internal clock divider of the CRTC) and not on the
>> +        * rate of the display controller clock. This is why we're not using
>> +        * dev_pm_opp_set_rate() API and instead controlling the power domain
>> +        * directly.
>> +        */
>> +       err = dev_pm_genpd_set_performance_state(dc->dev, pstate);
>> +       if (err)
>> +               dev_err(dc->dev, "failed to set power domain state to %lu: %d\n",
>> +                       pstate, err);
> 
> Yeah, the above code looks very similar to the code I pointed to in
> patch6. Perhaps we need to discuss with Viresh, whether it makes sense
> to fold in a patch adding an opp helper function after all, to avoid
> the open coding.
> 
> Viresh?

I'll keep it open-coded for now. This code is specific to Tegra because
normally ceil error shouldn't fall back to the floor, but for Tegra it's
expected to happen and it's a normal condition.
Dmitry Osipenko Oct. 17, 2021, 8:38 a.m. UTC | #44
01.10.2021 18:01, Ulf Hansson пишет:
> On Fri, 1 Oct 2021 at 16:35, Dmitry Osipenko <digetx@gmail.com> wrote:
>>
>> 01.10.2021 17:24, Ulf Hansson пишет:
>>> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
>>>>
>>>> The NAND on Tegra belongs to the core power domain and we're going to
>>>> enable GENPD support for the core domain. Now NAND must be resumed using
>>>> runtime PM API in order to initialize the NAND power state. Add runtime PM
>>>> and OPP support to the NAND driver.
>>>>
>>>> Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
>>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
>>>> ---
>>>>  drivers/mtd/nand/raw/tegra_nand.c | 55 ++++++++++++++++++++++++++-----
>>>>  1 file changed, 47 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/mtd/nand/raw/tegra_nand.c b/drivers/mtd/nand/raw/tegra_nand.c
>>>> index 32431bbe69b8..098fcc9cb9df 100644
>>>> --- a/drivers/mtd/nand/raw/tegra_nand.c
>>>> +++ b/drivers/mtd/nand/raw/tegra_nand.c
>>>> @@ -17,8 +17,11 @@
>>>>  #include <linux/mtd/rawnand.h>
>>>>  #include <linux/of.h>
>>>>  #include <linux/platform_device.h>
>>>> +#include <linux/pm_runtime.h>
>>>>  #include <linux/reset.h>
>>>>
>>>> +#include <soc/tegra/common.h>
>>>> +
>>>>  #define COMMAND                                        0x00
>>>>  #define   COMMAND_GO                           BIT(31)
>>>>  #define   COMMAND_CLE                          BIT(30)
>>>> @@ -1151,6 +1154,7 @@ static int tegra_nand_probe(struct platform_device *pdev)
>>>>                 return -ENOMEM;
>>>>
>>>>         ctrl->dev = &pdev->dev;
>>>> +       platform_set_drvdata(pdev, ctrl);
>>>>         nand_controller_init(&ctrl->controller);
>>>>         ctrl->controller.ops = &tegra_nand_controller_ops;
>>>>
>>>> @@ -1166,14 +1170,22 @@ static int tegra_nand_probe(struct platform_device *pdev)
>>>>         if (IS_ERR(ctrl->clk))
>>>>                 return PTR_ERR(ctrl->clk);
>>>>
>>>> -       err = clk_prepare_enable(ctrl->clk);
>>>> +       err = devm_pm_runtime_enable(&pdev->dev);
>>>> +       if (err)
>>>> +               return err;
>>>> +
>>>> +       err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
>>>> +       if (err)
>>>> +               return err;
>>>> +
>>>> +       err = pm_runtime_resume_and_get(&pdev->dev);
>>>>         if (err)
>>>>                 return err;
>>>>
>>>>         err = reset_control_reset(rst);
>>>>         if (err) {
>>>>                 dev_err(ctrl->dev, "Failed to reset HW: %d\n", err);
>>>> -               goto err_disable_clk;
>>>> +               goto err_put_pm;
>>>>         }
>>>>
>>>>         writel_relaxed(HWSTATUS_CMD_DEFAULT, ctrl->regs + HWSTATUS_CMD);
>>>> @@ -1188,21 +1200,19 @@ static int tegra_nand_probe(struct platform_device *pdev)
>>>>                                dev_name(&pdev->dev), ctrl);
>>>>         if (err) {
>>>>                 dev_err(ctrl->dev, "Failed to get IRQ: %d\n", err);
>>>> -               goto err_disable_clk;
>>>> +               goto err_put_pm;
>>>>         }
>>>>
>>>>         writel_relaxed(DMA_MST_CTRL_IS_DONE, ctrl->regs + DMA_MST_CTRL);
>>>>
>>>>         err = tegra_nand_chips_init(ctrl->dev, ctrl);
>>>>         if (err)
>>>> -               goto err_disable_clk;
>>>> -
>>>> -       platform_set_drvdata(pdev, ctrl);
>>>> +               goto err_put_pm;
>>>>
>>>
>>> There is no corresponding call pm_runtime_put() here. Is it
>>> intentional to always leave the device runtime resumed after ->probe()
>>> has succeeded?
>>>
>>> I noticed you included some comments about this for some other
>>> drivers, as those needed more tweaks. Is that also the case for this
>>> driver?
>>
>> Could you please clarify? There is pm_runtime_put() in both probe-error
>> and remove() code paths here.
> 
> I was not considering the error path of ->probe() (or ->remove()), but
> was rather thinking about when ->probe() completes successfully. Then
> you keep the device runtime resumed, because you have called
> pm_runtime_resume_and_get() for it.
> 
> Shouldn't you have a corresponding pm_runtime_put() in ->probe(),
> allowing it to be runtime suspended, until the device is really needed
> later on. No?

This driver doesn't support active power management. I don't have Tegra
hardware that uses NAND storage for testing, so it's up to somebody else
to implement dynamic power management. NAND doesn't require high
voltages, so it's fine to keep the old driver behaviour by keeping
hardware resumed since the probe time.
Ulf Hansson Oct. 19, 2021, 11:40 a.m. UTC | #45
On Sun, 17 Oct 2021 at 10:38, Dmitry Osipenko <digetx@gmail.com> wrote:
>
> 01.10.2021 18:01, Ulf Hansson пишет:
> > On Fri, 1 Oct 2021 at 16:35, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>
> >> 01.10.2021 17:24, Ulf Hansson пишет:
> >>> On Mon, 27 Sept 2021 at 00:42, Dmitry Osipenko <digetx@gmail.com> wrote:
> >>>>
> >>>> The NAND on Tegra belongs to the core power domain and we're going to
> >>>> enable GENPD support for the core domain. Now NAND must be resumed using
> >>>> runtime PM API in order to initialize the NAND power state. Add runtime PM
> >>>> and OPP support to the NAND driver.
> >>>>
> >>>> Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
> >>>> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> >>>> ---
> >>>>  drivers/mtd/nand/raw/tegra_nand.c | 55 ++++++++++++++++++++++++++-----
> >>>>  1 file changed, 47 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/drivers/mtd/nand/raw/tegra_nand.c b/drivers/mtd/nand/raw/tegra_nand.c
> >>>> index 32431bbe69b8..098fcc9cb9df 100644
> >>>> --- a/drivers/mtd/nand/raw/tegra_nand.c
> >>>> +++ b/drivers/mtd/nand/raw/tegra_nand.c
> >>>> @@ -17,8 +17,11 @@
> >>>>  #include <linux/mtd/rawnand.h>
> >>>>  #include <linux/of.h>
> >>>>  #include <linux/platform_device.h>
> >>>> +#include <linux/pm_runtime.h>
> >>>>  #include <linux/reset.h>
> >>>>
> >>>> +#include <soc/tegra/common.h>
> >>>> +
> >>>>  #define COMMAND                                        0x00
> >>>>  #define   COMMAND_GO                           BIT(31)
> >>>>  #define   COMMAND_CLE                          BIT(30)
> >>>> @@ -1151,6 +1154,7 @@ static int tegra_nand_probe(struct platform_device *pdev)
> >>>>                 return -ENOMEM;
> >>>>
> >>>>         ctrl->dev = &pdev->dev;
> >>>> +       platform_set_drvdata(pdev, ctrl);
> >>>>         nand_controller_init(&ctrl->controller);
> >>>>         ctrl->controller.ops = &tegra_nand_controller_ops;
> >>>>
> >>>> @@ -1166,14 +1170,22 @@ static int tegra_nand_probe(struct platform_device *pdev)
> >>>>         if (IS_ERR(ctrl->clk))
> >>>>                 return PTR_ERR(ctrl->clk);
> >>>>
> >>>> -       err = clk_prepare_enable(ctrl->clk);
> >>>> +       err = devm_pm_runtime_enable(&pdev->dev);
> >>>> +       if (err)
> >>>> +               return err;
> >>>> +
> >>>> +       err = devm_tegra_core_dev_init_opp_table_common(&pdev->dev);
> >>>> +       if (err)
> >>>> +               return err;
> >>>> +
> >>>> +       err = pm_runtime_resume_and_get(&pdev->dev);
> >>>>         if (err)
> >>>>                 return err;
> >>>>
> >>>>         err = reset_control_reset(rst);
> >>>>         if (err) {
> >>>>                 dev_err(ctrl->dev, "Failed to reset HW: %d\n", err);
> >>>> -               goto err_disable_clk;
> >>>> +               goto err_put_pm;
> >>>>         }
> >>>>
> >>>>         writel_relaxed(HWSTATUS_CMD_DEFAULT, ctrl->regs + HWSTATUS_CMD);
> >>>> @@ -1188,21 +1200,19 @@ static int tegra_nand_probe(struct platform_device *pdev)
> >>>>                                dev_name(&pdev->dev), ctrl);
> >>>>         if (err) {
> >>>>                 dev_err(ctrl->dev, "Failed to get IRQ: %d\n", err);
> >>>> -               goto err_disable_clk;
> >>>> +               goto err_put_pm;
> >>>>         }
> >>>>
> >>>>         writel_relaxed(DMA_MST_CTRL_IS_DONE, ctrl->regs + DMA_MST_CTRL);
> >>>>
> >>>>         err = tegra_nand_chips_init(ctrl->dev, ctrl);
> >>>>         if (err)
> >>>> -               goto err_disable_clk;
> >>>> -
> >>>> -       platform_set_drvdata(pdev, ctrl);
> >>>> +               goto err_put_pm;
> >>>>
> >>>
> >>> There is no corresponding call pm_runtime_put() here. Is it
> >>> intentional to always leave the device runtime resumed after ->probe()
> >>> has succeeded?
> >>>
> >>> I noticed you included some comments about this for some other
> >>> drivers, as those needed more tweaks. Is that also the case for this
> >>> driver?
> >>
> >> Could you please clarify? There is pm_runtime_put() in both probe-error
> >> and remove() code paths here.
> >
> > I was not considering the error path of ->probe() (or ->remove()), but
> > was rather thinking about when ->probe() completes successfully. Then
> > you keep the device runtime resumed, because you have called
> > pm_runtime_resume_and_get() for it.
> >
> > Shouldn't you have a corresponding pm_runtime_put() in ->probe(),
> > allowing it to be runtime suspended, until the device is really needed
> > later on. No?
>
> This driver doesn't support active power management. I don't have Tegra
> hardware that uses NAND storage for testing, so it's up to somebody else
> to implement dynamic power management. NAND doesn't require high
> voltages, so it's fine to keep the old driver behaviour by keeping
> hardware resumed since the probe time.

Alright, fair enough and thanks for clarifying!

Kind regards
Uffe