Message ID: 20211011165707.138157-1-marcan@marcan.st
Series: Apple SoC CPU P-state switching
On 12-10-21, 01:57, Hector Martin wrote:
> When required-opps is used in CPU OPP tables, there is no parent power
> domain to drive it. Squelch this error, to allow a clock driver to
> handle this directly instead.
>
> Signed-off-by: Hector Martin <marcan@marcan.st>
> ---
>  drivers/opp/core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 04b4691a8aac..89e616721f70 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -873,12 +873,13 @@ static int _set_required_opp(struct device *dev, struct device *pd_dev,
>  		return 0;
>
>  	ret = dev_pm_genpd_set_performance_state(pd_dev, pstate);
> -	if (ret) {
> +	if (ret && ret != -ENODEV) {
>  		dev_err(dev, "Failed to set performance rate of %s: %d (%d)\n",
>  			dev_name(pd_dev), pstate, ret);
> +		return ret;
>  	}
>
> -	return ret;
> +	return 0;
>  }
>
>  /* This is only called for PM domain for now */

I am not sure why you need this, since _set_required_opps() has this check:

	if (unlikely(!required_opp_tables[0]->is_genpd)) {
		dev_err(dev, "required-opps don't belong to a genpd\n");
		return -ENOENT;
	}
On 12/10/2021 12.21, Viresh Kumar wrote:
> I am not sure why you need this, since _set_required_opps() has this check:
>
>     if (unlikely(!required_opp_tables[0]->is_genpd)) {
>         dev_err(dev, "required-opps don't belong to a genpd\n");
>         return -ENOENT;
>     }

The table *is* assigned to a genpd (the memory controller), it's just that that genpd isn't actually a parent of the CPU device. Without the patch you end up with:

[    3.040060] cpu cpu4: Failed to set performance rate of cpu4: 0 (-19)
[    3.042881] cpu cpu4: Failed to set required opps: -19
[    3.045508] cpufreq: __target_index: Failed to change cpu frequency: -19
On 12-10-21, 14:34, Hector Martin wrote:
> The table *is* assigned to a genpd (the memory controller), it's just that
> that genpd isn't actually a parent of the CPU device. Without the patch you
> end up with:
>
> [    3.040060] cpu cpu4: Failed to set performance rate of cpu4: 0 (-19)
> [    3.042881] cpu cpu4: Failed to set required opps: -19
> [    3.045508] cpufreq: __target_index: Failed to change cpu frequency: -19

Hmm, Saravana and Sibi were working on a similar problem earlier and decided to solve this using devfreq instead. Don't remember the exact series which got merged for this, Sibi ?

If this part fails, how do you actually set the performance state of the memory controller's genpd ?
On 12/10/2021 14.51, Viresh Kumar wrote:
> On 12-10-21, 14:34, Hector Martin wrote:
>> The table *is* assigned to a genpd (the memory controller), it's just that
>> that genpd isn't actually a parent of the CPU device. Without the patch you
>> end up with:
>>
>> [    3.040060] cpu cpu4: Failed to set performance rate of cpu4: 0 (-19)
>> [    3.042881] cpu cpu4: Failed to set required opps: -19
>> [    3.045508] cpufreq: __target_index: Failed to change cpu frequency: -19
>
> Hmm, Saravana and Sibi were working on a similar problem earlier and decided to
> solve this using devfreq instead. Don't remember the exact series which got
> merged for this, Sibi ?
>
> If this part fails, how do you actually set the performance state of the memory
> controller's genpd ?

The clock controller has the genpd as an actual power-domain parent, so it does it instead. From patch #7:

> +	if (cluster->has_pd)
> +		dev_pm_genpd_set_performance_state(cluster->dev,
> +			dev_pm_opp_get_required_pstate(opp, 0));
> +

This is arguably not entirely representative of how the hardware works, since technically the cluster switching couldn't care less what the memory controller is doing; it's a soft dependency, states that should be switched together but are not interdependent (in fact, the clock code does this unconditionally after the CPU p-state change, regardless of whether we're shifting up or down; this is, FWIW, the same order macOS uses, and it clearly doesn't matter which way you do it).
On 11/10/2021 18:57, Hector Martin wrote:
> This driver binds to the memory controller hardware in Apple SoCs such
> as the Apple M1, and provides a power domain that downstream devices can
> use to change the performance state of the memory controller.
>
> Signed-off-by: Hector Martin <marcan@marcan.st>
> ---
>  drivers/memory/Kconfig     |   9 +++
>  drivers/memory/Makefile    |   1 +
>  drivers/memory/apple-mcc.c | 130 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 140 insertions(+)
>  create mode 100644 drivers/memory/apple-mcc.c
>
> diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
> index 72c0df129d5c..48ef3d563a1c 100644
> --- a/drivers/memory/Kconfig
> +++ b/drivers/memory/Kconfig
> @@ -30,6 +30,15 @@ config ARM_PL172_MPMC
>  	  If you have an embedded system with an AMBA bus and a PL172
>  	  controller, say Y or M here.
>
> +config APPLE_MCC
> +	tristate "Apple SoC MCC driver"
> +	default y if ARCH_APPLE
> +	select PM_GENERIC_DOMAINS
> +	depends on ARCH_APPLE || COMPILE_TEST
> +	help
> +	  This driver manages performance tuning for the memory controller in
> +	  Apple SoCs, such as the Apple M1.
> +
>  config ATMEL_SDRAMC
>  	bool "Atmel (Multi-port DDR-)SDRAM Controller"
>  	default y if ARCH_AT91
>
> diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
> index bc7663ed1c25..947840cbd2d4 100644
> --- a/drivers/memory/Makefile
> +++ b/drivers/memory/Makefile
> @@ -8,6 +8,7 @@ ifeq ($(CONFIG_DDR),y)
>  obj-$(CONFIG_OF)		+= of_memory.o
>  endif
>  obj-$(CONFIG_ARM_PL172_MPMC)	+= pl172.o
> +obj-$(CONFIG_APPLE_MCC)		+= apple-mcc.o
>  obj-$(CONFIG_ATMEL_SDRAMC)	+= atmel-sdramc.o
>  obj-$(CONFIG_ATMEL_EBI)		+= atmel-ebi.o
>  obj-$(CONFIG_BRCMSTB_DPFE)	+= brcmstb_dpfe.o
>
> diff --git a/drivers/memory/apple-mcc.c b/drivers/memory/apple-mcc.c
> new file mode 100644
> index 000000000000..55959f034b9a
> --- /dev/null
> +++ b/drivers/memory/apple-mcc.c
> @@ -0,0 +1,130 @@
> +// SPDX-License-Identifier: GPL-2.0-only OR MIT
> +/*
> + * Apple SoC MCC memory controller performance control driver
> + *
> + * Copyright The Asahi Linux Contributors

Copyright date?

> + */
> +
> +#include <linux/delay.h>
> +#include <linux/err.h>
> +#include <linux/io.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_domain.h>
> +#include <linux/pm_opp.h>
> +
> +#define APPLE_MCC_PERF_CONFIG1	0xdc4
> +#define APPLE_MCC_PERF_CONFIG2	0xdbc
> +#define APPLE_MCC_CHANNEL(x)	((x) * 0x40000)
> +
> +struct apple_mcc {
> +	struct device *dev;
> +	struct generic_pm_domain genpd;
> +	void __iomem *reg_base;
> +	u32 num_channels;
> +};
> +
> +#define to_apple_mcc(_genpd) container_of(_genpd, struct apple_mcc, genpd)
> +
> +static int apple_mcc_set_performance_state(struct generic_pm_domain *genpd, unsigned int state)
> +{
> +	struct apple_mcc *mcc = to_apple_mcc(genpd);
> +	struct dev_pm_opp *opp;
> +	struct device_node *np;
> +	u32 perf_config[2];
> +	unsigned int i;
> +
> +	dev_dbg(mcc->dev, "switching to perf state %d\n", state);
> +
> +	opp = dev_pm_opp_find_level_exact(&mcc->genpd.dev, state);
> +	if (IS_ERR(opp))
> +		return PTR_ERR(opp);
> +
> +	np = dev_pm_opp_get_of_node(opp);
> +	if (of_property_read_u32_array(np, "apple,memory-perf-config",
> +				       perf_config, ARRAY_SIZE(perf_config))) {
> +		dev_err(mcc->dev, "missing apple,memory-perf-config property");
> +		of_node_put(np);
> +		return -EINVAL;
> +	}
> +	of_node_put(np);
> +
> +	for (i = 0; i < mcc->num_channels; i++) {
> +		writel_relaxed(perf_config[0],
> +			       mcc->reg_base + APPLE_MCC_CHANNEL(i) + APPLE_MCC_PERF_CONFIG1);
> +		writel_relaxed(perf_config[1],
> +			       mcc->reg_base + APPLE_MCC_CHANNEL(i) + APPLE_MCC_PERF_CONFIG2);
> +	}
> +
> +	return 0;
> +}
> +
> +static unsigned int apple_mcc_opp_to_performance_state(struct generic_pm_domain *genpd,
> +						       struct dev_pm_opp *opp)
> +{
> +	return dev_pm_opp_get_level(opp);
> +}
> +
> +static int apple_mcc_probe(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct device_node *node = dev->of_node;

By convention mostly we call the variable "np".

> +	struct apple_mcc *mcc;
> +	int ret;
> +
> +	mcc = devm_kzalloc(dev, sizeof(*mcc), GFP_KERNEL);
> +	if (!mcc)
> +		return -ENOMEM;
> +
> +	mcc->reg_base = devm_platform_ioremap_resource(pdev, 0);
> +	if (IS_ERR(mcc->reg_base))
> +		return PTR_ERR(mcc->reg_base);
> +
> +	if (of_property_read_u32(node, "apple,num-channels", &mcc->num_channels)) {

Don't you have a limit of supported channels? It cannot be any uint32...

> +		dev_err(dev, "missing apple,num-channels property\n");

Use almost everywhere dev_err_probe - less code and you get error msg printed.

> +		return -ENOENT;
> +	}
> +
> +	mcc->dev = dev;
> +	mcc->genpd.name = "apple-mcc-perf";
> +	mcc->genpd.opp_to_performance_state = apple_mcc_opp_to_performance_state;
> +	mcc->genpd.set_performance_state = apple_mcc_set_performance_state;
> +
> +	ret = pm_genpd_init(&mcc->genpd, NULL, false);
> +	if (ret < 0) {
> +		dev_err(dev, "pm_genpd_init failed\n");
> +		return ret;
> +	}
> +
> +	ret = of_genpd_add_provider_simple_noclk(node, &mcc->genpd);
> +	if (ret < 0) {
> +		dev_err(dev, "of_genpd_add_provider_simple failed\n");
> +		return ret;
> +	}
> +
> +	dev_info(dev, "Apple MCC performance driver initialized\n");

Please skip it, or at least make it a dev_dbg, you don't print any valuable information here.

> +
> +	return 0;
> +}
> +
> +static const struct of_device_id apple_mcc_of_match[] = {
> +	{ .compatible = "apple,mcc" },
> +	{}
> +};
> +
> +MODULE_DEVICE_TABLE(of, apple_mcc_of_match);
> +
> +static struct platform_driver apple_mcc_driver = {
> +	.probe = apple_mcc_probe,
> +	.driver = {
> +		.name = "apple-mcc",
> +		.of_match_table = apple_mcc_of_match,
> +	},
> +};

module_platform_driver() goes here.

> +
> +MODULE_AUTHOR("Hector Martin <marcan@marcan.st>");
> +MODULE_DESCRIPTION("MCC memory controller performance tuning driver for Apple SoCs");
> +MODULE_LICENSE("GPL v2");

I think this will be "Dual MIT/GPL", based on your SPDX.

> +
> +module_platform_driver(apple_mcc_driver);

Best regards,
Krzysztof
On 12-10-21, 14:57, Hector Martin wrote:
> On 12/10/2021 14.51, Viresh Kumar wrote:
> > On 12-10-21, 14:34, Hector Martin wrote:
> > > The table *is* assigned to a genpd (the memory controller), it's just that
> > > that genpd isn't actually a parent of the CPU device. Without the patch you
> > > end up with:
> > >
> > > [    3.040060] cpu cpu4: Failed to set performance rate of cpu4: 0 (-19)
> > > [    3.042881] cpu cpu4: Failed to set required opps: -19
> > > [    3.045508] cpufreq: __target_index: Failed to change cpu frequency: -19
> >
> > Hmm, Saravana and Sibi were working on a similar problem earlier and decided to
> > solve this using devfreq instead. Don't remember the exact series which got
> > merged for this, Sibi ?
> >
> > If this part fails, how do you actually set the performance state of the memory
> > controller's genpd ?
>
> The clock controller has the genpd as an actual power-domain parent, so it
> does it instead. From patch #7:
>
> > +	if (cluster->has_pd)
> > +		dev_pm_genpd_set_performance_state(cluster->dev,
> > +			dev_pm_opp_get_required_pstate(opp, 0));
> > +
>
> This is arguably not entirely representative of how the hardware works,
> since technically the cluster switching couldn't care less what the memory
> controller is doing; it's a soft dependency, states that should be switched
> together but are not interdependent (in fact, the clock code does this
> unconditionally after the CPU p-state change, regardless of whether we're
> shifting up or down; this is, FWIW, the same order macOS uses, and it
> clearly doesn't matter which way you do it).

Yeah, I understand what you are doing. But the current patch is incorrect in the sense that it can cause a bug on other platforms. To make this work, you should rather set this genpd as parent of CPU devices (which you are doing anyway since you are updating them with CPU's DVFS). With that the clk driver won't be required to do the magic behind the scenes.
On Tue, 12 Oct 2021 18:26:03 JST, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 12-10-21, 14:57, Hector Martin wrote:
>> This is arguably not entirely representative of how the hardware works,
>> since technically the cluster switching couldn't care less what the memory
>> controller is doing; it's a soft dependency, states that should be switched
>> together but are not interdependent (in fact, the clock code does this
>> unconditionally after the CPU p-state change, regardless of whether we're
>> shifting up or down; this is, FWIW, the same order macOS uses, and it
>> clearly doesn't matter which way you do it).
>
> Yeah, I understand what you are doing. But the current patch is
> incorrect in the sense that it can cause a bug on other platforms. To
> make this work, you should rather set this genpd as parent of CPU
> devices (which you are doing anyway since you are updating them with
> CPU's DVFS). With that the clk driver won't be required to do the magic
> behind the scenes.

That doesn't work, though, because the CPUs aren't normal devices with runtime-pm. That was the first thing I tried :). If you think this *should* be made to work instead then I can try that.
On 12-10-21, 18:31, Hector Martin "marcan" wrote:
> That doesn't work, though, because the CPUs aren't normal devices
> with runtime-pm. That was the first thing I tried :).

What's the exact problem with runtime PM here ?

> If you think this *should* be made to work instead then I can try that.
On 14-10-21, 15:52, Hector Martin wrote:
> The CPU devices aren't attached to their genpd, so the required OPP
> transition fails with the same error.
>
> However, this was easier to fix than I expected. With this patch to
> cpufreq-dt, it all works properly, and I can drop the parent genpd
> from the clock node and related handling. Thoughts?
>
> commit c4f88743374c1f4678ee7f17fb6cae30ded9ed59
> Author: Hector Martin <marcan@marcan.st>
> Date:   Thu Oct 14 15:47:45 2021 +0900
>
>     cpufreq: dt: Attach CPU devices to power domains
>
>     This allows the required-opps mechanism to work for CPU OPP tables,
>     triggering specific OPP levels in a parent power domain.
>
>     Signed-off-by: Hector Martin <marcan@marcan.st>
>
> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
> index 8fcaba541539..5b22846b557d 100644
> --- a/drivers/cpufreq/cpufreq-dt.c
> +++ b/drivers/cpufreq/cpufreq-dt.c
> @@ -16,6 +16,7 @@
>  #include <linux/list.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
> +#include <linux/pm_domain.h>
>  #include <linux/pm_opp.h>
>  #include <linux/platform_device.h>
>  #include <linux/regulator/consumer.h>
> @@ -264,6 +265,16 @@ static int dt_cpufreq_early_init(struct device *dev, int cpu)
>  		goto out;
>  	}
>
> +	/*
> +	 * Attach the CPU device to its genpd domain (if any), to allow OPP
> +	 * dependencies to be satisfied.
> +	 */
> +	ret = genpd_dev_pm_attach(cpu_dev);
> +	if (ret <= 0) {
> +		dev_err(cpu_dev, "Failed to attach CPU device to genpd\n");
> +		goto out;
> +	}
> +

Other platform do this from some other place I think.

Ulf, where should this code be moved ? cpu-clk driver ?
On 12/10/2021 18.19, Krzysztof Kozlowski wrote:
>> +// SPDX-License-Identifier: GPL-2.0-only OR MIT
>> +/*
>> + * Apple SoC MCC memory controller performance control driver
>> + *
>> + * Copyright The Asahi Linux Contributors
>
> Copyright date?

We've gone over this one a few times already; most copyright dates quickly become outdated and meaningless :)

See: https://www.linuxfoundation.org/blog/copyright-notices-in-open-source-software-projects/

>> +static int apple_mcc_probe(struct platform_device *pdev)
>> +{
>> +	struct device *dev = &pdev->dev;
>> +	struct device_node *node = dev->of_node;
>
> By convention mostly we call the variable "np".

Ack, I'll change it for v2.

>> +	mcc->reg_base = devm_platform_ioremap_resource(pdev, 0);
>> +	if (IS_ERR(mcc->reg_base))
>> +		return PTR_ERR(mcc->reg_base);
>> +
>> +	if (of_property_read_u32(node, "apple,num-channels", &mcc->num_channels)) {
>
> Don't you have a limit of supported channels? It cannot be any uint32...

Today, it's max 8. But if come Monday we find out Apple's new chips have 16 channels and otherwise the same register layout, I'd much rather not have to change the driver...

>> +		dev_err(dev, "missing apple,num-channels property\n");
>
> Use almost everywhere dev_err_probe - less code and you get error msg
> printed.

Heh, I didn't know about that one. Thanks!

>> +
>> +	dev_info(dev, "Apple MCC performance driver initialized\n");
>
> Please skip it, or at least make it a dev_dbg, you don't print any
> valuable information here.

Ack, I'll remove this.

>> +static struct platform_driver apple_mcc_driver = {
>> +	.probe = apple_mcc_probe,
>> +	.driver = {
>> +		.name = "apple-mcc",
>> +		.of_match_table = apple_mcc_of_match,
>> +	},
>> +};
>
> module_platform_driver() goes here.

Ack, will fix for v2.

>> +
>> +MODULE_AUTHOR("Hector Martin <marcan@marcan.st>");
>> +MODULE_DESCRIPTION("MCC memory controller performance tuning driver for Apple SoCs");
>> +MODULE_LICENSE("GPL v2");
>
> I think this will be "Dual MIT/GPL", based on your SPDX.

Ah, I didn't realize that was a valid option for MODULE_LICENSE. I guess anything containing "GPL" works with EXPORT_SYMBOL_GPL?

Thanks for the review!
On 14/10/2021 15.56, Viresh Kumar wrote:
>> +	/*
>> +	 * Attach the CPU device to its genpd domain (if any), to allow OPP
>> +	 * dependencies to be satisfied.
>> +	 */
>> +	ret = genpd_dev_pm_attach(cpu_dev);
>> +	if (ret <= 0) {
>> +		dev_err(cpu_dev, "Failed to attach CPU device to genpd\n");
>> +		goto out;
>> +	}
>> +
>
> Other platform do this from some other place I think.
>
> Ulf, where should this code be moved ? cpu-clk driver ?

I see one driver that does this is drivers/clk/qcom/apcs-sdx55.c (via dev_pm_domain_attach). Though it only does it for CPU#0; we need to do it for all CPUs.
On 14-10-21, 16:03, Hector Martin wrote:
> On 14/10/2021 15.56, Viresh Kumar wrote:
> > > +	/*
> > > +	 * Attach the CPU device to its genpd domain (if any), to allow OPP
> > > +	 * dependencies to be satisfied.
> > > +	 */
> > > +	ret = genpd_dev_pm_attach(cpu_dev);
> > > +	if (ret <= 0) {
> > > +		dev_err(cpu_dev, "Failed to attach CPU device to genpd\n");
> > > +		goto out;
> > > +	}
> > > +
> >
> > Other platform do this from some other place I think.
> >
> > Ulf, where should this code be moved ? cpu-clk driver ?
>
> I see one driver that does this is drivers/clk/qcom/apcs-sdx55.c (via
> dev_pm_domain_attach).

That may be a good place since you are already adding it and it is related to CPU clk.

> Though it only does it for CPU#0; we need to do it
> for all CPUs.

Sure.
On 14/10/2021 16.03, Hector Martin wrote:
> On 14/10/2021 15.56, Viresh Kumar wrote:
>>> +	/*
>>> +	 * Attach the CPU device to its genpd domain (if any), to allow OPP
>>> +	 * dependencies to be satisfied.
>>> +	 */
>>> +	ret = genpd_dev_pm_attach(cpu_dev);
>>> +	if (ret <= 0) {
>>> +		dev_err(cpu_dev, "Failed to attach CPU device to genpd\n");
>>> +		goto out;
>>> +	}
>>> +
>>
>> Other platform do this from some other place I think.
>>
>> Ulf, where should this code be moved ? cpu-clk driver ?
>
> I see one driver that does this is drivers/clk/qcom/apcs-sdx55.c (via
> dev_pm_domain_attach). Though it only does it for CPU#0; we need to do
> it for all CPUs.

Looking into this further, I'm not sure I like the idea of doing this in the clocks driver. There might be locking issues, since it gets instantiated twice and yet doesn't really itself know what subset of CPUs it applies to.

There's another driver that does this: drivers/cpuidle/cpuidle-psci-domain.c. That one specifically looks for a power domain called "psci". Perhaps it would make sense to make this generic in cpufreq-dt as per my prior patch, but explicitly request a "cpufreq" domain? That way only devicetrees that opt in to having this handled by cpufreq by naming it that way would get this behavior.
On 14/10/2021 08:59, Hector Martin wrote:
> On 12/10/2021 18.19, Krzysztof Kozlowski wrote:
>>> +// SPDX-License-Identifier: GPL-2.0-only OR MIT
>>> +/*
>>> + * Apple SoC MCC memory controller performance control driver
>>> + *
>>> + * Copyright The Asahi Linux Contributors
>>
>> Copyright date?
>
> We've gone over this one a few times already; most copyright dates
> quickly become outdated and meaningless :)
>
> See:
> https://www.linuxfoundation.org/blog/copyright-notices-in-open-source-software-projects/
>
>>> +static int apple_mcc_probe(struct platform_device *pdev)
>>> +{
>>> +	struct device *dev = &pdev->dev;
>>> +	struct device_node *node = dev->of_node;
>>
>> By convention mostly we call the variable "np".
>
> Ack, I'll change it for v2.
>
>>> +	mcc->reg_base = devm_platform_ioremap_resource(pdev, 0);
>>> +	if (IS_ERR(mcc->reg_base))
>>> +		return PTR_ERR(mcc->reg_base);
>>> +
>>> +	if (of_property_read_u32(node, "apple,num-channels", &mcc->num_channels)) {
>>
>> Don't you have a limit of supported channels? It cannot be any uint32...
>
> Today, it's max 8. But if come Monday we find out Apple's new chips have
> 16 channels and otherwise the same register layout, I'd much rather not
> have to change the driver...

OK, however if the driver ever receives different DT with a different value, it will accept it unconditionally and go via address space. I am just saying that being conservative on received values is safer, but I am fine with skipping this problem. At the end we trust DT that it will always match the kernel, don't we? Oh wait, someone can use DT from other kernel in this one...

>>> +		dev_err(dev, "missing apple,num-channels property\n");
>>
>> Use almost everywhere dev_err_probe - less code and you get error msg
>> printed.
>
> Heh, I didn't know about that one. Thanks!
>
>>> +
>>> +	dev_info(dev, "Apple MCC performance driver initialized\n");
>>
>> Please skip it, or at least make it a dev_dbg, you don't print any
>> valuable information here.
>
> Ack, I'll remove this.
>
>>> +static struct platform_driver apple_mcc_driver = {
>>> +	.probe = apple_mcc_probe,
>>> +	.driver = {
>>> +		.name = "apple-mcc",
>>> +		.of_match_table = apple_mcc_of_match,
>>> +	},
>>> +};
>>
>> module_platform_driver() goes here.
>
> Ack, will fix for v2.
>
>>> +
>>> +MODULE_AUTHOR("Hector Martin <marcan@marcan.st>");
>>> +MODULE_DESCRIPTION("MCC memory controller performance tuning driver for Apple SoCs");
>>> +MODULE_LICENSE("GPL v2");
>>
>> I think this will be "Dual MIT/GPL", based on your SPDX.
>
> Ah, I didn't realize that was a valid option for MODULE_LICENSE. I guess
> anything containing "GPL" works with EXPORT_SYMBOL_GPL?

I don't think exporting symbols is related to how you license your code.

Best regards,
Krzysztof
On 14/10/2021 16.36, Krzysztof Kozlowski wrote:
> On 14/10/2021 08:59, Hector Martin wrote:
>>> Don't you have a limit of supported channels? It cannot be any uint32...
>>
>> Today, it's max 8. But if come Monday we find out Apple's new chips have
>> 16 channels and otherwise the same register layout, I'd much rather not
>> have to change the driver...
>
> OK, however if the driver ever receives different DT with a different
> value, it will accept it unconditionally and go via address space. I am
> just saying that being conservative on received values is safer, but I
> am fine with skipping this problem. At the end we trust DT that it will
> always match the kernel, don't we? Oh wait, someone can use DT from
> other kernel in this one...

DTs using these compatibles should have the same register layout, and should work with this driver; if a new chip comes out that has a different register layout we will change the compatibles (both) and therefore older kernels won't bind at all. If it has the same layout we'll keep the base compatible, `reg` will grow as needed to accommodate the extra channels, and e.g. num-channels=16 will then just work on older kernels with no changes.

Obviously a broken DT with an insane value here would crash the driver, but so would any number of other crazy DT things; however, I don't expect that to ever happen.

There's also the case where we end up with multiple memory controllers at discrete offsets (e.g. rumored multi-die configurations); in that case we'll end up with multiple genpd parents and have to add code to support that, and in the meantime older kernels will just have broken cpufreq on the p-cores. But I think that is ~acceptable as long as the system boots; we don't expect to be able to *fully* support newer SoCs on older kernels with no code changes. What I'm aiming for is just making the system work, hopefully with NVMe and USB and a dumb framebuffer, so that distro installers can run and then users can later install a proper up-to-date kernel with full support for the new SoC.

>> Ah, I didn't realize that was a valid option for MODULE_LICENSE. I guess
>> anything containing "GPL" works with EXPORT_SYMBOL_GPL?
>
> I don't think exporting symbols is related to how you license your code.

It is; only modules with a GPL-compatible MODULE_LICENSE get to use symbols exported via EXPORT_SYMBOL_GPL.

See kernel/module.c for the symbol lookup logic and include/linux/license.h for the logic to check the string (seems like "Dual MIT/GPL" is explicitly whitelisted there).

Of course, this is a futile effort, as ~every time I see a proprietary module in some embedded device, it either falsely declares itself to be GPL, or they have a shim module that re-exports GPL symbols as non-GPL.
On 14/10/2021 09:52, Hector Martin wrote:
> On 14/10/2021 16.36, Krzysztof Kozlowski wrote:

(...)

>>> Ah, I didn't realize that was a valid option for MODULE_LICENSE. I guess
>>> anything containing "GPL" works with EXPORT_SYMBOL_GPL?
>>
>> I don't think exporting symbols is related to how you license your code.
>
> It is; only modules with a GPL-compatible MODULE_LICENSE get to use
> symbols exported via EXPORT_SYMBOL_GPL.

Although there might be such a correlation, it's not a rule. You can have a GPL module exporting symbols without the GPL requirement (EXPORT_SYMBOL). You can have a GPL+MIT module exporting symbols as GPL. Obviously you cannot have a non-GPL module, as we do not accept these and there is no such choice.

So answering your question that "GPL" works with EXPORT_SYMBOL_GPL - everything is GPL, but it works with both EXPORT_SYMBOL and EXPORT_SYMBOL_GPL.

> See kernel/module.c for the symbol lookup logic and
> include/linux/license.h for the logic to check the string (seems like
> "Dual MIT/GPL" is explicitly whitelisted there).

Not related to export symbol. It is used for determining the tainted kernel via other licenses.

> Of course, this is a futile effort, as ~every time I see a proprietary
> module in some embedded device, it either falsely declares itself to be
> GPL, or they have a shim module that re-exports GPL symbols as non-GPL.

This is being removed soon (or already).

Best regards,
Krzysztof
On 14/10/2021 17.04, Krzysztof Kozlowski wrote:
>> It is; only modules with a GPL-compatible MODULE_LICENSE get to use
>> symbols exported via EXPORT_SYMBOL_GPL.
>
> Although there might be such correlation but it's not a rule. You can
> have a GPL module exporting symbols without GPL requirement
> (EXPORT_SYMBOLS). You can have a GPL+MIT module exporting symbols as
> GPL. Obviously you cannot have a non-GPL module, as we do not accept
> these and there is no such choice.

What I mean is that modules can only import GPL symbols if they themselves are GPL compatible. What I didn't know is that "Dual MIT/GPL" is a valid string for MODULE_LICENSE to qualify as such.

>> See kernel/module.c for the symbol lookup logic and
>> include/linux/license.h for the logic to check the string (seems like
>> "Dual MIT/GPL" is explicitly whitelisted there).
>
> Not related to export symbol. It is used for determining the tainted
> kernel via other licenses.

Not just that; that module taint is used as a filter so that non-GPL-compatible modules are technically prevented from resolving EXPORT_SYMBOL_GPL symbols.

>> Of course, this is a futile effort, as ~every time I see a proprietary
>> module in some embedded device, it either falsely declares itself to be
>> GPL, or they have a shim module that re-exports GPL symbols as non-GPL.
>
> This is being removed soon (or already).

? Good luck getting proprietary embedded vendors to start following licenses... :)
On Tue, 12 Oct 2021 at 07:57, Hector Martin <marcan@marcan.st> wrote:
> On 12/10/2021 14.51, Viresh Kumar wrote:
> > On 12-10-21, 14:34, Hector Martin wrote:
> >> The table *is* assigned to a genpd (the memory controller), it's just that
> >> that genpd isn't actually a parent of the CPU device. Without the patch you
> >> end up with:
> >>
> >> [    3.040060] cpu cpu4: Failed to set performance rate of cpu4: 0 (-19)
> >> [    3.042881] cpu cpu4: Failed to set required opps: -19
> >> [    3.045508] cpufreq: __target_index: Failed to change cpu frequency: -19
> >
> > Hmm, Saravana and Sibi were working on a similar problem earlier and decided to
> > solve this using devfreq instead. Don't remember the exact series which got
> > merged for this, Sibi ?
> >
> > If this part fails, how do you actually set the performance state of the memory
> > controller's genpd ?
>
> The clock controller has the genpd as an actual power-domain parent, so
> it does it instead. From patch #7:
>
> > +	if (cluster->has_pd)
> > +		dev_pm_genpd_set_performance_state(cluster->dev,
> > +			dev_pm_opp_get_required_pstate(opp, 0));
> > +
>
> This is arguably not entirely representative of how the hardware works,
> since technically the cluster switching couldn't care less what the
> memory controller is doing; it's a soft dependency, states that should
> be switched together but are not interdependent (in fact, the clock code
> does this unconditionally after the CPU p-state change, regardless of
> whether we're shifting up or down; this is, FWIW, the same order macOS
> uses, and it clearly doesn't matter which way you do it).

Yes, this sounds like you should move away from modeling the memory part as a parent genpd for the CPUs' genpd.

As Viresh pointed out, a devfreq driver seems like a better way to do this. As a matter of fact, there are already devfreq drivers that do this, unless I am mistaken.

It looks like devfreq providers are listening to opp/cpufreq notifiers, as to get an indication of when it could make sense to change a performance state.

In some cases the devfreq provider is also modeled as an interconnect provider, allowing consumers to specify memory bandwidth constraints, which may trigger a new performance state to be set for the memory controller.

In the tegra case, the memory controller is modelled as an interconnect provider and the devfreq node is modelled as an interconnect-consumer of the memory controller. Perhaps this can work for apple SoCs too?

That said, perhaps as an option to move forward, we can try to get the cpufreq pieces solved first. Then as a step on top, add the performance scaling for the memory controller?

> --
> Hector Martin (marcan@marcan.st)
> Public Key: https://mrcn.st/pub

Kind regards
Uffe
On Thu, 14 Oct 2021 at 09:23, Hector Martin <marcan@marcan.st> wrote: > > On 14/10/2021 16.03, Hector Martin wrote: > > On 14/10/2021 15.56, Viresh Kumar wrote: > >>> + /* > >>> + * Attach the CPU device to its genpd domain (if any), to allow OPP > >>> + * dependencies to be satisfied. > >>> + */ > >>> + ret = genpd_dev_pm_attach(cpu_dev); > >>> + if (ret <= 0) { > >>> + dev_err(cpu_dev, "Failed to attach CPU device to genpd\n"); > >>> + goto out; > >>> + } > >>> + > >> > >> Other platform do this from some other place I think. > >> > >> Ulf, where should this code be moved ? cpu-clk driver ? > >> > > > > I see one driver that does this is drivers/clk/qcom/apcs-sdx55.c (via > > dev_pm_domain_attach). Though it only does it for CPU#0; we need to do > > it for all CPUs. > > Looking into this further, I'm not sure I like the idea of doing this in > the clocks driver. There might be locking issues since it gets > instantiated twice and yet doesn't really itself know what subset of > CPUs it applies to. I agree. I suggest you look into using a genpd provider and hook up all CPU's devices to it. I think that is what Viresh also suggested earlier - and this makes most sense to me. As a reference you may have a look at some Qcom platforms that already use this: arch/arm64/boot/dts/qcom/qcs404.dtsi drivers/cpufreq/qcom-cpufreq-nvmem.c: To hook up CPU devices to their PM domains (genpds) - it calls dev_pm_opp_attach_genpd(), which is a kind of wrapper for dev_pm_domain_attach_by_name(). drivers/soc/qcom/cpr.c Registers the genpd provider that is capable of dealing with performance states/OPPs for CPUs. > > There's another driver that does this: > drivers/cpuidle/cpuidle-psci-domain.c. That one specifically looks for a > power domain called "psci". Perhaps it would make sense to make this > generic in cpufreq-dt as per my prior patch, but explicitly request a > "cpufreq" domain? 
> That way only devicetrees that opt in to having this handled by cpufreq
> by naming it that way would get this behavior.

That sounds like an idea that is worth exploring. In this way, the only
thing that needs to be implemented for new cases would be the genpd
provider driver.

BTW, as you will figure out by looking at the above references, for the
qcom case we are using "cpr" as the domain name for cpufreq. Of course,
that doesn't mean we can't use "cpufreq" (or whatever name that makes
sense) going forward for new cases.

>
> --
> Hector Martin (marcan@marcan.st)
> Public Key: https://mrcn.st/pub

Kind regards
Uffe
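[For concreteness, an opt-in binding along the lines being discussed might look like the sketch below. This is purely illustrative, not a proposed ABI: the compatible string, labels, unit addresses, and the provider node are all invented.]

```dts
/* Hypothetical sketch of the opt-in: the CPU names its scaling power
 * domain "cpufreq", so a generic driver (e.g. cpufreq-dt) knows it is
 * allowed to attach it. All names and addresses here are made up. */
cluster0_pd: power-controller@210e20000 {
	compatible = "vendor,cluster-pstate-domain";
	reg = <0x2 0x10e20000 0x0 0x1000>;
	#power-domain-cells = <0>;
};

cpu0: cpu@0 {
	device_type = "cpu";
	reg = <0x0 0x0>;
	operating-points-v2 = <&cluster0_opp>;
	power-domains = <&cluster0_pd>;
	power-domain-names = "cpufreq";
};
```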
On 14/10/2021 18.55, Ulf Hansson wrote:
> Yes, this sounds like you should move away from modeling the memory
> part as a parent genpd for the CPUs' genpd.
>
> As Viresh pointed out, a devfreq driver seems like a better way to do
> this. As a matter of fact, there are already devfreq drivers that do
> this, unless I am mistaken.
>
> It looks like devfreq providers are listening to opp/cpufreq
> notifiers, as to get an indication of when it could make sense to
> change a performance state.
>
> In some cases the devfreq provider is also modeled as an interconnect
> provider, allowing consumers to specify memory bandwidth constraints,
> which may trigger a new performance state to be set for the memory
> controller.
>
> In the tegra case, the memory controller is modelled as an
> interconnect provider and the devfreq node is modelled as an
> interconnect-consumer of the memory controller. Perhaps this can work
> for apple SoCs too?

I was poking around and noticed the OPP core can already integrate with
interconnect requirements, so perhaps the memory controller can be an
interconnect provider, and the CPU nodes can directly reference it as a
consumer? This seems like a more accurate model of what the hardware
does, and I think I saw some devices doing this already.

(only problem is I have no idea of the actual bandwidth numbers involved
here... I'll have to run some benchmarks to make sure this isn't just
completely dummy data)

> That said, perhaps as an option to move forward, we can try to get the
> cpufreq pieces solved first. Then as a step on top, add the
> performance scaling for the memory controller?

Sure; that's a pretty much independent part of this patchset, though I'm
thinking I might as well try some things out for v2 anyway; if it looks
like it'll take longer we can split it out and do just the cpufreq side.
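[The interconnect-consumer modelling being discussed could look roughly like this in DT. Everything here is a placeholder (compatibles, unit addresses, path IDs, and the kBps figure); the sketch only shows where opp-peak-kBps would hook the OPP core into the interconnect framework.]

```dts
/* Hypothetical sketch: memory controller as interconnect provider,
 * CPU as consumer, bandwidth votes attached to CPU OPPs. */
mcc: memory-controller@200000000 {
	compatible = "vendor,t8103-mcc";
	reg = <0x2 0x0 0x0 0x1000>;
	#interconnect-cells = <1>;
};

cpu4: cpu@10100 {
	device_type = "cpu";
	reg = <0x0 0x10100>;
	operating-points-v2 = <&pcluster_opp>;
	/* CPU master -> DRAM slave path; endpoint IDs are made up */
	interconnects = <&mcc 0 &mcc 1>;
};

pcluster_opp: opp-table {
	compatible = "operating-points-v2";
	opp-3036000000 {
		opp-hz = /bits/ 64 <3036000000>;
		/* placeholder value; the OPP core forwards it as an
		 * interconnect bandwidth request */
		opp-peak-kBps = <68000000>;
	};
};
```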
On Thu, 14 Oct 2021 at 13:43, Hector Martin <marcan@marcan.st> wrote:
>
> On 14/10/2021 18.55, Ulf Hansson wrote:
> > Yes, this sounds like you should move away from modeling the memory
> > part as a parent genpd for the CPUs' genpd.
> >
> > As Viresh pointed out, a devfreq driver seems like a better way to do
> > this. As a matter of fact, there are already devfreq drivers that do
> > this, unless I am mistaken.
> >
> > It looks like devfreq providers are listening to opp/cpufreq
> > notifiers, as to get an indication of when it could make sense to
> > change a performance state.
> >
> > In some cases the devfreq provider is also modeled as an interconnect
> > provider, allowing consumers to specify memory bandwidth constraints,
> > which may trigger a new performance state to be set for the memory
> > controller.
> >
> > In the tegra case, the memory controller is modelled as an
> > interconnect provider and the devfreq node is modelled as an
> > interconnect-consumer of the memory controller. Perhaps this can work
> > for apple SoCs too?
>
> I was poking around and noticed the OPP core can already integrate with
> interconnect requirements, so perhaps the memory controller can be an
> interconnect provider, and the CPU nodes can directly reference it as a
> consumer? This seems like a more accurate model of what the hardware
> does, and I think I saw some devices doing this already.

Yeah, that could work too. And, yes, I agree, it may be a better
description of the HW.

> (only problem is I have no idea of the actual bandwidth numbers involved
> here... I'll have to run some benchmarks to make sure this isn't just
> completely dummy data)
>
> > That said, perhaps as an option to move forward, we can try to get the
> > cpufreq pieces solved first. Then as a step on top, add the
> > performance scaling for the memory controller?
>
> Sure; that's a pretty much independent part of this patchset, though I'm
> thinking I might as well try some things out for v2 anyway; if it looks
> like it'll take longer we can split it out and do just the cpufreq side.

In any case, I'll do my best to help with review.

Kind regards
Uffe
On 14/10/2021 21.55, Ulf Hansson wrote:
> On Thu, 14 Oct 2021 at 13:43, Hector Martin <marcan@marcan.st> wrote:
>> I was poking around and noticed the OPP core can already integrate with
>> interconnect requirements, so perhaps the memory controller can be an
>> interconnect provider, and the CPU nodes can directly reference it as a
>> consumer? This seems like a more accurate model of what the hardware
>> does, and I think I saw some devices doing this already.
>
> Yeah, that could work too. And, yes, I agree, it may be a better
> description of the HW.
>
>> (only problem is I have no idea of the actual bandwidth numbers involved
>> here... I'll have to run some benchmarks to make sure this isn't just
>> completely dummy data)

So... I tried getting bandwidth numbers and failed. It seems these
registers don't actually affect peak performance in any measurable way.
I'm also getting almost the same GeekBench scores on macOS with and
without this mechanism enabled, although there is one subtest that seems
to show a measurable difference.

My current guess is this is something more subtle (latencies? idle
timers and such?) than a performance state. If that is the case, do you
have any ideas as to the best way to model it in Linux? Should we even
bother if it mostly has a minimal performance gain for typical workloads?

I'll try to do some latency tests, see if I can make sense of what it's
actually doing.
On Thu, 14 Oct 2021 at 19:02, Hector Martin <marcan@marcan.st> wrote:
>
> On 14/10/2021 21.55, Ulf Hansson wrote:
> > On Thu, 14 Oct 2021 at 13:43, Hector Martin <marcan@marcan.st> wrote:
> >> I was poking around and noticed the OPP core can already integrate with
> >> interconnect requirements, so perhaps the memory controller can be an
> >> interconnect provider, and the CPU nodes can directly reference it as a
> >> consumer? This seems like a more accurate model of what the hardware
> >> does, and I think I saw some devices doing this already.
> >
> > Yeah, that could work too. And, yes, I agree, it may be a better
> > description of the HW.
> >
> >> (only problem is I have no idea of the actual bandwidth numbers involved
> >> here... I'll have to run some benchmarks to make sure this isn't just
> >> completely dummy data)
>
> So... I tried getting bandwidth numbers and failed. It seems these
> registers don't actually affect peak performance in any measurable way.
> I'm also getting almost the same GeekBench scores on macOS with and
> without this mechanism enabled, although there is one subtest that seems
> to show a measurable difference.
>
> My current guess is this is something more subtle (latencies? idle
> timers and such?) than a performance state. If that is the case, do you
> have any ideas as to the best way to model it in Linux? Should we even
> bother if it mostly has a minimal performance gain for typical workloads?

For latency constraints, we have dev_pm_qos. This will make the genpd
governor prevent deeper idle states for the device and its corresponding
PM domain (genpd). But that doesn't sound like a good fit here.

If you are right, it rather sounds like there is some kind of quiescence
mode of the memory controller that can be prevented. But I have no clue,
of course. :-)

> I'll try to do some latency tests, see if I can make sense of what it's
> actually doing.

Kind regards
Uffe
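[For reference, the dev_pm_qos route mentioned above would look roughly like the kernel-side sketch below. This is not runnable as-is and only shows the API shape; the device pointer and the 100 us figure are made up, and this thread has not established that such a constraint is actually the right model here.]

```c
#include <linux/pm_qos.h>

static struct dev_pm_qos_request mc_latency_req;

/* Cap the acceptable resume latency of the memory controller device;
 * the genpd governor then rules out idle states whose exit latency
 * would exceed it. "mc_dev" and the 100 us value are illustrative. */
static int constrain_mc_latency(struct device *mc_dev)
{
	return dev_pm_qos_add_request(mc_dev, &mc_latency_req,
				      DEV_PM_QOS_RESUME_LATENCY, 100);
}
```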
On 15/10/2021 07.07, Stephen Boyd wrote:
> This looks bad from a locking perspective. How is lockdep holding up
> with this driver? We're underneath the prepare lock here and we're
> setting a couple level registers which is all good but now we're calling
> into genpd code and who knows what's going to happen locking wise.

It seems this is all going away, given the other discussion threads point
towards handling this directly via OPP in the cpufreq-dt driver. I'll run
whatever I end up with for v2 through lockdep though, good call!

> I don't actually see anything in here that indicates this is supposed to
> be a clk provider. Is it being modeled as a clk so that it can use
> cpufreq-dt? If it was a clk provider I'd expect it to be looking at
> parent clk rates, and reading hardware to calculate frequencies based on
> dividers and multipliers, etc. None of that is happening here.
>
> Why not write a cpufreq driver, similar to qcom-cpufreq-hw.c, that looks
> through the OPP table and then writes the value into the pstate
> registers? The registers in here look awfully similar to the qcom
> hardware. I don't know what the DESIRED1 and DESIRED2 registers are for
> though. Maybe they're so that one or the other frequency can be used if
> available? Like a min/max?
>
> Either way, writing this as a cpufreq driver avoids the clk framework
> entirely, which is super great for me :) It also avoids locking headaches
> from the clk prepare lock, and it also lets you support lockless cpufreq
> transitions by implementing the fast_switch function. I don't see any
> downsides to the cpufreq driver approach.

I wasn't too sure about this approach; I thought using a clk provider
would end up simplifying things, since I could use the cpufreq-dt
machinery to take care of all the OPP stuff, and a lot of SoCs seemed to
be going that way. But it seems a dedicated cpufreq driver might be a
better approach for this SoC?
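[The cpufreq approach suggested above would put the P-state write in a fast_switch callback, roughly like the non-runnable sketch below. The APPLE_* register names, the apple_cluster struct, and the register layout are invented for illustration only; only the cpufreq core types are real.]

```c
/* Sketch of a qcom-cpufreq-hw-style fast_switch: no clk framework,
 * no sleeping locks, just a register write per transition. */
static unsigned int apple_cpufreq_fast_switch(struct cpufreq_policy *policy,
					      unsigned int target_freq)
{
	struct apple_cluster *cluster = policy->driver_data;
	unsigned int i = 0;

	/* Pick the highest table entry not above target_freq
	 * (assumes an ascending frequency table) */
	while (policy->freq_table[i + 1].frequency != CPUFREQ_TABLE_END &&
	       policy->freq_table[i + 1].frequency <= target_freq)
		i++;

	writeq_relaxed(APPLE_DVFS_SET | i, cluster->reg_base + APPLE_DVFS_CMD);
	return policy->freq_table[i].frequency;
}
```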
There can only be one cpufreq driver instance, while I used two clock
controllers to model the two clusters. So in the cpufreq case, the
driver itself would have to deal with all potential CPU cluster
instances/combinations itself. Not sure how much more code that will be;
hopefully not too much...

I see qcom-cpufreq-hw uses a qcom,freq-domain prop to link CPUs to the
cpufreq domains. cpufreq-dt and vexpress-spc-cpufreq instead use
dev_pm_opp_get_sharing_cpus() to look for shared OPP tables. Is there a
reason not to do it that way and avoid the vendor prop? I guess the prop
is more explicit, while the sharing approach has an implicit order
dependency (i.e. CPUs are always grouped by cluster, and clusters are
listed in /cpus in the same order as in the cpufreq node)...

(Ack on the other comments, but if this becomes a cpufreq driver, most
of it is going to end up rewritten... :))

For the cpufreq case, do you have any suggestions as to how to relate it
to the memory controller configuration tweaks? Ideally this would go
through the OPP tables so it can be customized for future SoCs without
stuff hardcoded in the driver... It seems the configuration affects
power saving behavior / latencies, so it doesn't quite match the
interconnect framework's bandwidth request model.

I'm also not sure how this would affect fast_switch, since going through
those frameworks might imply locks... We might even find ourselves with
a situation in the near future where multiple cpufreq policies can
request memory controller latency reduction independently. I can come up
with a way to do this locklessly using atomics, but I can't imagine that
being workable with higher-level frameworks; it would have to be a
vendor-specific mechanism at that point...