Message ID | 20230330094904.2589428-1-cyndis@kapsi.fi |
---|---|
State | Accepted |
Headers | show |
Series | [v2] thermal: tegra-bpmp: Handle offline zones | expand |
On 30/03/2023 12:06, Mikko Perttunen wrote: > On 3/30/23 13:03, Daniel Lezcano wrote: >> On 30/03/2023 11:49, Mikko Perttunen wrote: >>> From: Mikko Perttunen <mperttunen@nvidia.com> >>> >>> Thermal zones located in power domains may not be accessible when >>> the domain is powergated. In this situation, reading the temperature >>> will return -BPMP_EFAULT. When evaluating trips, BPMP will internally >>> use -256C as the temperature for offline zones. >> >>> For smooth operation, for offline zones, return -EAGAIN when reading >>> the temperature and allow registration of zones even if they are >>> offline during probe. >> >> I think it makes more sense to check if the power domain associated >> with the device is powered up and if not return -EPROBE_DEFER. > > The power domains in question are related to computer vision engines > that only get powered on when in use, possibly never if the user doesn't > run a computer vision workload on the system. We still want other > thermal zones to be available. Ok, I see the point. I'm worried about the semantic of the errors returned, the translation from BPMP_EFAULT to EAGAIN and the assumption it is a disabled (may be forever) thermal zone. What does the documentation say for the error msg.rx.ret == -BPMP_EFAULT? >>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> >>> --- >>> v2: >>> * Adjusted commit message. >>> * Patch 2/2 dropped for now since it is more controversial, >>> and this patch is more critical. >>> >>> drivers/thermal/tegra/tegra-bpmp-thermal.c | 9 ++++++++- >>> 1 file changed, 8 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/thermal/tegra/tegra-bpmp-thermal.c >>> b/drivers/thermal/tegra/tegra-bpmp-thermal.c >>> index f5fd4018f72f..4ffc3bb3bf35 100644 >>> --- a/drivers/thermal/tegra/tegra-bpmp-thermal.c >>> +++ b/drivers/thermal/tegra/tegra-bpmp-thermal.c >>> @@ -52,6 +52,8 @@ static int __tegra_bpmp_thermal_get_temp(struct >>> tegra_bpmp_thermal_zone *zone, >>> err = tegra_bpmp_transfer(zone->tegra->bpmp, &msg); >>> if (err) >>> return err; >>> + if (msg.rx.ret == -BPMP_EFAULT) >>> + return -EAGAIN; >>> if (msg.rx.ret) >>> return -EINVAL; >>> @@ -259,7 +261,12 @@ static int tegra_bpmp_thermal_probe(struct >>> platform_device *pdev) >>> zone->tegra = tegra; >>> err = __tegra_bpmp_thermal_get_temp(zone, &temp); >>> - if (err < 0) { >>> + >>> + /* >>> + * Sensors in powergated domains may temporarily fail to be >>> read >>> + * (-EAGAIN), but will become accessible when the domain is >>> powered on. >>> + */ >>> + if (err < 0 && err != -EAGAIN) { >>> devm_kfree(&pdev->dev, zone); >>> continue; >>> } >> >
On 3/30/23 15:36, Daniel Lezcano wrote: > On 30/03/2023 12:06, Mikko Perttunen wrote: >> On 3/30/23 13:03, Daniel Lezcano wrote: >>> On 30/03/2023 11:49, Mikko Perttunen wrote: >>>> From: Mikko Perttunen <mperttunen@nvidia.com> >>>> >>>> Thermal zones located in power domains may not be accessible when >>>> the domain is powergated. In this situation, reading the temperature >>>> will return -BPMP_EFAULT. When evaluating trips, BPMP will internally >>>> use -256C as the temperature for offline zones. >>> >>>> For smooth operation, for offline zones, return -EAGAIN when reading >>>> the temperature and allow registration of zones even if they are >>>> offline during probe. >>> >>> I think it makes more sense to check if the power domain associated >>> with the device is powered up and if not return -EPROBE_DEFER. >> >> The power domains in question are related to computer vision engines >> that only get powered on when in use, possibly never if the user >> doesn't run a computer vision workload on the system. We still want >> other thermal zones to be available. > > Ok, I see the point. > > I'm worried about the semantic of the errors returned, the translation > from BPMP_EFAULT to EAGAIN and the assumption it is a disabled (may be > forever) thermal zone. > > What does the documentation say for the error msg.rx.ret == -BPMP_EFAULT? > The documentation says Value | Description -------------- | ----------------------------------------- 0 | Temperature query succeeded. -#BPMP_EINVAL | Invalid request parameters. -#BPMP_ENOENT | No driver registered for thermal zone. -#BPMP_EFAULT | Problem reading temperature measurement. In practice, what BPMP_EFAULT means here is that the hardware has no indicated temperature for the zone, which really only happens if the power domain is powered off. Mikko > >>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> >>>> --- >>>> v2: >>>> * Adjusted commit message. >>>> * Patch 2/2 dropped for now since it is more controversial, >>>> and this patch is more critical. >>>> >>>> drivers/thermal/tegra/tegra-bpmp-thermal.c | 9 ++++++++- >>>> 1 file changed, 8 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/thermal/tegra/tegra-bpmp-thermal.c >>>> b/drivers/thermal/tegra/tegra-bpmp-thermal.c >>>> index f5fd4018f72f..4ffc3bb3bf35 100644 >>>> --- a/drivers/thermal/tegra/tegra-bpmp-thermal.c >>>> +++ b/drivers/thermal/tegra/tegra-bpmp-thermal.c >>>> @@ -52,6 +52,8 @@ static int __tegra_bpmp_thermal_get_temp(struct >>>> tegra_bpmp_thermal_zone *zone, >>>> err = tegra_bpmp_transfer(zone->tegra->bpmp, &msg); >>>> if (err) >>>> return err; >>>> + if (msg.rx.ret == -BPMP_EFAULT) >>>> + return -EAGAIN; >>>> if (msg.rx.ret) >>>> return -EINVAL; >>>> @@ -259,7 +261,12 @@ static int tegra_bpmp_thermal_probe(struct >>>> platform_device *pdev) >>>> zone->tegra = tegra; >>>> err = __tegra_bpmp_thermal_get_temp(zone, &temp); >>>> - if (err < 0) { >>>> + >>>> + /* >>>> + * Sensors in powergated domains may temporarily fail to be >>>> read >>>> + * (-EAGAIN), but will become accessible when the domain is >>>> powered on. >>>> + */ >>>> + if (err < 0 && err != -EAGAIN) { >>>> devm_kfree(&pdev->dev, zone); >>>> continue; >>>> } >>> >> >
On Thu, Mar 30, 2023 at 12:49:04PM +0300, Mikko Perttunen wrote: > From: Mikko Perttunen <mperttunen@nvidia.com> > > Thermal zones located in power domains may not be accessible when > the domain is powergated. In this situation, reading the temperature > will return -BPMP_EFAULT. When evaluating trips, BPMP will internally > use -256C as the temperature for offline zones. > > For smooth operation, for offline zones, return -EAGAIN when reading > the temperature and allow registration of zones even if they are > offline during probe. > > Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> > --- > v2: > * Adjusted commit message. > * Patch 2/2 dropped for now since it is more controversial, > and this patch is more critical. > > drivers/thermal/tegra/tegra-bpmp-thermal.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) Acked-by: Thierry Reding <treding@nvidia.com>
On 30/03/2023 11:49, Mikko Perttunen wrote: > From: Mikko Perttunen <mperttunen@nvidia.com> > > Thermal zones located in power domains may not be accessible when > the domain is powergated. In this situation, reading the temperature > will return -BPMP_EFAULT. When evaluating trips, BPMP will internally > use -256C as the temperature for offline zones. > > For smooth operation, for offline zones, return -EAGAIN when reading > the temperature and allow registration of zones even if they are > offline during probe. > > Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Applied, thanks
On 3/31/23 00:14, Daniel Lezcano wrote: > On 30/03/2023 11:49, Mikko Perttunen wrote: >> From: Mikko Perttunen <mperttunen@nvidia.com> >> >> Thermal zones located in power domains may not be accessible when >> the domain is powergated. In this situation, reading the temperature >> will return -BPMP_EFAULT. When evaluating trips, BPMP will internally >> use -256C as the temperature for offline zones. >> >> For smooth operation, for offline zones, return -EAGAIN when reading >> the temperature and allow registration of zones even if they are >> offline during probe. >> >> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> > > Applied, thanks > Thank you! Mikko
diff --git a/drivers/thermal/tegra/tegra-bpmp-thermal.c b/drivers/thermal/tegra/tegra-bpmp-thermal.c index f5fd4018f72f..4ffc3bb3bf35 100644 --- a/drivers/thermal/tegra/tegra-bpmp-thermal.c +++ b/drivers/thermal/tegra/tegra-bpmp-thermal.c @@ -52,6 +52,8 @@ static int __tegra_bpmp_thermal_get_temp(struct tegra_bpmp_thermal_zone *zone, err = tegra_bpmp_transfer(zone->tegra->bpmp, &msg); if (err) return err; + if (msg.rx.ret == -BPMP_EFAULT) + return -EAGAIN; if (msg.rx.ret) return -EINVAL; @@ -259,7 +261,12 @@ static int tegra_bpmp_thermal_probe(struct platform_device *pdev) zone->tegra = tegra; err = __tegra_bpmp_thermal_get_temp(zone, &temp); - if (err < 0) { + + /* + * Sensors in powergated domains may temporarily fail to be read + * (-EAGAIN), but will become accessible when the domain is powered on. + */ + if (err < 0 && err != -EAGAIN) { devm_kfree(&pdev->dev, zone); continue; }