mbox series

[v3,0/5] Apple SoC cpufreq driver

Message ID 20221024043925.25379-1-marcan@marcan.st
Headers show
Series Apple SoC cpufreq driver | expand

Message

Hector Martin Oct. 24, 2022, 4:39 a.m. UTC
Hi folks,

Third time's the charm? Here's v3 of the cpufreq driver for Apple SoCs.
This version takes a page from both v1 and v2, keeping the dedicated
cpufreq style (instead of pretending to be a clock controller) but using
dedicated DT nodes for each cluster, which accurately represents the
hardware. In particular, this makes supporting t6002 (M1 Ultra) a lot
more reasonable on the DT side.

This version also switches to the standard performance-domains binding,
so we don't need any more vendor-specific properties. In order to
support this, I had to make the performance-domains parsing code more
generic. This required a minor change to the only consumer
(mediatek-cpufreq-hw).

The Linux driver probes based on platform compatible, and then attempts
to locate the cluster nodes by following the performance-domains links
from CPU nodes (this will then fail for any incompatible nodes, e.g. if
a future SoC needs a new compatible and can't fall back). This approach
was suggested by robh as the right way to handle the impedance mismatch
between the hardware, which has separate controllers per cluster, and
the Linux model where there can only be one CPUFreq driver instance.

Functionality-wise, there are no significant changes from v2. The only
notable difference is support for t8112 (M2). This works largely the
same as the other SoCs, but they ran out of bits in the current PState
register, so that needs a SoC-specific quirk. Since that register is
not used by macOS (it was discovered experimentally) and is not critical
for functionality (it just allows accurately reporting the current
frequency to userspace, given boost clock limitations), I've decided to
only use it when a SoC-specific compatible is present. The default
fallback code will simply report the requested frequency as actual.
I expect this will work for future SoCs.

As usual, MAINTAINERS and DT changes are split. I expect patches #2~#4
to go through the cpufreq tree, and we'll take care of #1 and #5 via
the asahi-soc tree.

Hector Martin (5):
  MAINTAINERS: Add entries for Apple SoC cpufreq driver
  dt-bindings: cpufreq: apple,soc-cpufreq: Add binding for Apple SoC
    cpufreq
  cpufreq: Generalize of_perf_domain_get_sharing_cpumask phandle format
  cpufreq: apple-soc: Add new driver to control Apple SoC CPU P-states
  arm64: dts: apple: Add CPU topology & cpufreq nodes for t8103

 .../cpufreq/apple,cluster-cpufreq.yaml        | 119 ++++++
 MAINTAINERS                                   |   2 +
 arch/arm64/boot/dts/apple/t8103.dtsi          | 206 +++++++++-
 drivers/cpufreq/Kconfig.arm                   |   9 +
 drivers/cpufreq/Makefile                      |   1 +
 drivers/cpufreq/apple-soc-cpufreq.c           | 352 ++++++++++++++++++
 drivers/cpufreq/cpufreq-dt-platdev.c          |   2 +
 drivers/cpufreq/mediatek-cpufreq-hw.c         |  14 +-
 include/linux/cpufreq.h                       |  28 +-
 9 files changed, 706 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/cpufreq/apple,cluster-cpufreq.yaml
 create mode 100644 drivers/cpufreq/apple-soc-cpufreq.c

Comments

Marc Zyngier Oct. 24, 2022, 8:27 a.m. UTC | #1
On Mon, 24 Oct 2022 05:39:24 +0100,
Hector Martin <marcan@marcan.st> wrote:
> 
> This driver implements CPU frequency scaling for Apple Silicon SoCs,
> including M1 (t8103), M1 Max/Pro/Ultra (t600x), and M2 (t8112).
> 
> Each CPU cluster has its own register set, and frequency management is
> fully automated by the hardware; the driver only has to write one
> register. There is boost frequency support, but the hardware will only
> allow their use if only a subset of cores in a cluster are in
> non-deep-idle. Since we don't support deep idle yet, these frequencies
> are not achievable, but the driver supports them. They will remain
> disabled in the device tree until deep idle is implemented, to avoid
> confusing users.
> 
> This driver does not yet implement the memory controller performance
> state tuning that usually accompanies higher CPU p-states. This will be
> done in a future patch.
> 
> Signed-off-by: Hector Martin <marcan@marcan.st>
> ---
>  drivers/cpufreq/Kconfig.arm          |   9 +
>  drivers/cpufreq/Makefile             |   1 +
>  drivers/cpufreq/apple-soc-cpufreq.c  | 352 +++++++++++++++++++++++++++
>  drivers/cpufreq/cpufreq-dt-platdev.c |   2 +
>  4 files changed, 364 insertions(+)
>  create mode 100644 drivers/cpufreq/apple-soc-cpufreq.c
>

[...]

> +static struct freq_attr *apple_soc_cpufreq_hw_attr[] = {
> +	&cpufreq_freq_attr_scaling_available_freqs,
> +	NULL,
> +	NULL,

nit: extra NULL?

	M.
Marc Zyngier Oct. 24, 2022, 8:28 a.m. UTC | #2
On Mon, 24 Oct 2022 05:39:20 +0100,
Hector Martin <marcan@marcan.st> wrote:
> 
> Hi folks,
> 
> Third time's the charm? Here's v3 of the cpufreq driver for Apple SoCs.
> This version takes a page from both v1 and v2, keeping the dedicated
> cpufreq style (instead of pretending to be a clock controller) but using
> dedicated DT nodes for each cluster, which accurately represents the
> hardware. In particular, this makes supporting t6002 (M1 Ultra) a lot
> more reasonable on the DT side.
> 
> This version also switches to the standard performance-domains binding,
> so we don't need any more vendor-specific properties. In order to
> support this, I had to make the performance-domains parsing code more
> generic. This required a minor change to the only consumer
> (mediatek-cpufreq-hw).
> 
> The Linux driver probes based on platform compatible, and then attempts
> to locate the cluster nodes by following the performance-domains links
> from CPU nodes (this will then fail for any incompatible nodes, e.g. if
> a future SoC needs a new compatible and can't fall back). This approach
> was suggested by robh as the right way to handle the impedance mismatch
> between the hardware, which has separate controllers per cluster, and
> the Linux model where there can only be one CPUFreq driver instance.
> 
> Functionality-wise, there are no significant changes from v2. The only
> notable difference is support for t8112 (M2). This works largely the
> same as the other SoCs, but they ran out of bits in the current PState
> register, so that needs a SoC-specific quirk. Since that register is
> not used by macOS (it was discovered experimentally) and is not critical
> for functionality (it just allows accurately reporting the current
> frequency to userspace, given boost clock limitations), I've decided to
> only use it when a SoC-specific compatible is present. The default
> fallback code will simply report the requested frequency as actual.
> I expect this will work for future SoCs.
> 
> As usual, MAINTAINERS and DT changes are split. I expect patches #2~#4
> to go through the cpufreq tree, and we'll take care of #1 and #5 via
> the asahi-soc tree.
> 
> Hector Martin (5):
>   MAINTAINERS: Add entries for Apple SoC cpufreq driver
>   dt-bindings: cpufreq: apple,soc-cpufreq: Add binding for Apple SoC
>     cpufreq
>   cpufreq: Generalize of_perf_domain_get_sharing_cpumask phandle format
>   cpufreq: apple-soc: Add new driver to control Apple SoC CPU P-states
>   arm64: dts: apple: Add CPU topology & cpufreq nodes for t8103
> 
>  .../cpufreq/apple,cluster-cpufreq.yaml        | 119 ++++++
>  MAINTAINERS                                   |   2 +
>  arch/arm64/boot/dts/apple/t8103.dtsi          | 206 +++++++++-
>  drivers/cpufreq/Kconfig.arm                   |   9 +
>  drivers/cpufreq/Makefile                      |   1 +
>  drivers/cpufreq/apple-soc-cpufreq.c           | 352 ++++++++++++++++++
>  drivers/cpufreq/cpufreq-dt-platdev.c          |   2 +
>  drivers/cpufreq/mediatek-cpufreq-hw.c         |  14 +-
>  include/linux/cpufreq.h                       |  28 +-
>  9 files changed, 706 insertions(+), 27 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/cpufreq/apple,cluster-cpufreq.yaml
>  create mode 100644 drivers/cpufreq/apple-soc-cpufreq.c

FWIW, and for the whole series:

Acked-by: Marc Zyngier <maz@kernel.org>

	M.
Krzysztof Kozlowski Oct. 25, 2022, 4:02 p.m. UTC | #3
On 24/10/2022 00:39, Hector Martin wrote:
> Add the missing CPU topology/capacity information and the cpufreq nodes,
> so we can have CPU frequency scaling and the scheduler has the
> information it needs to make the correct decisions.
> 

Thank you for your patch. There is something to discuss/improve.

> +
>  	timer {
>  		compatible = "arm,armv8-timer";
>  		interrupt-parent = <&aic>;
> @@ -124,6 +298,18 @@ soc {
>  		ranges;
>  		nonposted-mmio;
>  
> +		cpufreq_e: cpufreq@210e20000 {

Node name: performance-controller

> +			compatible = "apple,t8103-cluster-cpufreq", "apple,cluster-cpufreq";
> +			reg = <0x2 0x10e20000 0 0x1000>;
> +			#performance-domain-cells = <0>;
> +		};
> +
> +		cpufreq_p: cpufreq@211e20000 {

Ditto

> +			compatible = "apple,t8103-cluster-cpufreq", "apple,cluster-cpufreq";
> +			reg = <0x2 0x11e20000 0 0x1000>;
> +			#performance-domain-cells = <0>;
> +		};
> +
Krzysztof
Ulf Hansson Nov. 1, 2022, 3:16 p.m. UTC | #4
On Mon, 24 Oct 2022 at 06:40, Hector Martin <marcan@marcan.st> wrote:
>
> This driver implements CPU frequency scaling for Apple Silicon SoCs,
> including M1 (t8103), M1 Max/Pro/Ultra (t600x), and M2 (t8112).
>
> Each CPU cluster has its own register set, and frequency management is
> fully automated by the hardware; the driver only has to write one
> register. There is boost frequency support, but the hardware will only
> allow their use if only a subset of cores in a cluster are in
> non-deep-idle. Since we don't support deep idle yet, these frequencies
> are not achievable, but the driver supports them. They will remain
> disabled in the device tree until deep idle is implemented, to avoid
> confusing users.

Out of curiosity, may I ask if this implies the need of a
synchronization mechanism on the Linux side? Or is the boost frequency
dynamically managed solely by HW/FW?

>
> This driver does not yet implement the memory controller performance
> state tuning that usually accompanies higher CPU p-states. This will be
> done in a future patch.
>
> Signed-off-by: Hector Martin <marcan@marcan.st>

Other than the question above, I intend to help to review the series
in more detail soon.

Kind regards
Uffe

> ---
>  drivers/cpufreq/Kconfig.arm          |   9 +
>  drivers/cpufreq/Makefile             |   1 +
>  drivers/cpufreq/apple-soc-cpufreq.c  | 352 +++++++++++++++++++++++++++
>  drivers/cpufreq/cpufreq-dt-platdev.c |   2 +
>  4 files changed, 364 insertions(+)
>  create mode 100644 drivers/cpufreq/apple-soc-cpufreq.c
>
> diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
> index 82e5de1f6f8c..29969f84008a 100644
> --- a/drivers/cpufreq/Kconfig.arm
> +++ b/drivers/cpufreq/Kconfig.arm
> @@ -41,6 +41,15 @@ config ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM
>           To compile this driver as a module, choose M here: the
>           module will be called sun50i-cpufreq-nvmem.
>
> +config ARM_APPLE_SOC_CPUFREQ
> +       tristate "Apple Silicon SoC CPUFreq support"
> +       depends on ARCH_APPLE || (COMPILE_TEST && 64BIT)
> +       select PM_OPP
> +       default ARCH_APPLE
> +       help
> +         This adds the CPUFreq driver for Apple Silicon machines
> +         (e.g. Apple M1).
> +
>  config ARM_ARMADA_37XX_CPUFREQ
>         tristate "Armada 37xx CPUFreq support"
>         depends on ARCH_MVEBU && CPUFREQ_DT
> diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
> index 49b98c62c5af..32a7029e25ed 100644
> --- a/drivers/cpufreq/Makefile
> +++ b/drivers/cpufreq/Makefile
> @@ -52,6 +52,7 @@ obj-$(CONFIG_X86_AMD_FREQ_SENSITIVITY)        += amd_freq_sensitivity.o
>
>  ##################################################################################
>  # ARM SoC drivers
> +obj-$(CONFIG_ARM_APPLE_SOC_CPUFREQ)    += apple-soc-cpufreq.o
>  obj-$(CONFIG_ARM_ARMADA_37XX_CPUFREQ)  += armada-37xx-cpufreq.o
>  obj-$(CONFIG_ARM_ARMADA_8K_CPUFREQ)    += armada-8k-cpufreq.o
>  obj-$(CONFIG_ARM_BRCMSTB_AVS_CPUFREQ)  += brcmstb-avs-cpufreq.o
> diff --git a/drivers/cpufreq/apple-soc-cpufreq.c b/drivers/cpufreq/apple-soc-cpufreq.c
> new file mode 100644
> index 000000000000..12c4b490edb8
> --- /dev/null
> +++ b/drivers/cpufreq/apple-soc-cpufreq.c
> @@ -0,0 +1,352 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Apple SoC CPU cluster performance state driver
> + *
> + * Copyright The Asahi Linux Contributors
> + *
> + * Based on scpi-cpufreq.c
> + */
> +
> +#include <linux/bitfield.h>
> +#include <linux/bitops.h>
> +#include <linux/cpu.h>
> +#include <linux/cpufreq.h>
> +#include <linux/cpumask.h>
> +#include <linux/delay.h>
> +#include <linux/err.h>
> +#include <linux/io.h>
> +#include <linux/iopoll.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_address.h>
> +#include <linux/pm_opp.h>
> +#include <linux/slab.h>
> +
> +#define APPLE_DVFS_CMD                 0x20
> +#define APPLE_DVFS_CMD_BUSY            BIT(31)
> +#define APPLE_DVFS_CMD_SET             BIT(25)
> +#define APPLE_DVFS_CMD_PS2             GENMASK(16, 12)
> +#define APPLE_DVFS_CMD_PS1             GENMASK(4, 0)
> +
> +/* Same timebase as CPU counter (24MHz) */
> +#define APPLE_DVFS_LAST_CHG_TIME       0x38
> +
> +/*
> + * Apple ran out of bits and had to shift this in T8112...
> + */
> +#define APPLE_DVFS_STATUS                      0x50
> +#define APPLE_DVFS_STATUS_CUR_PS_T8103         GENMASK(7, 4)
> +#define APPLE_DVFS_STATUS_CUR_PS_SHIFT_T8103   4
> +#define APPLE_DVFS_STATUS_TGT_PS_T8103         GENMASK(3, 0)
> +#define APPLE_DVFS_STATUS_CUR_PS_T8112         GENMASK(9, 5)
> +#define APPLE_DVFS_STATUS_CUR_PS_SHIFT_T8112   5
> +#define APPLE_DVFS_STATUS_TGT_PS_T8112         GENMASK(4, 0)
> +
> +/*
> + * Div is +1, base clock is 12MHz on existing SoCs.
> + * For documentation purposes. We use the OPP table to
> + * get the frequency.
> + */
> +#define APPLE_DVFS_PLL_STATUS          0xc0
> +#define APPLE_DVFS_PLL_FACTOR          0xc8
> +#define APPLE_DVFS_PLL_FACTOR_MULT     GENMASK(31, 16)
> +#define APPLE_DVFS_PLL_FACTOR_DIV      GENMASK(15, 0)
> +
> +#define APPLE_DVFS_TRANSITION_TIMEOUT 100
> +
> +struct apple_soc_cpufreq_info {
> +       u64 max_pstate;
> +       u64 cur_pstate_mask;
> +       u64 cur_pstate_shift;
> +};
> +
> +struct apple_cpu_priv {
> +       struct device *cpu_dev;
> +       void __iomem *reg_base;
> +       const struct apple_soc_cpufreq_info *info;
> +};
> +
> +static struct cpufreq_driver apple_soc_cpufreq_driver;
> +
> +const struct apple_soc_cpufreq_info soc_t8103_info = {
> +       .max_pstate = 15,
> +       .cur_pstate_mask = APPLE_DVFS_STATUS_CUR_PS_T8103,
> +       .cur_pstate_shift = APPLE_DVFS_STATUS_CUR_PS_SHIFT_T8103,
> +};
> +
> +const struct apple_soc_cpufreq_info soc_t8112_info = {
> +       .max_pstate = 31,
> +       .cur_pstate_mask = APPLE_DVFS_STATUS_CUR_PS_T8112,
> +       .cur_pstate_shift = APPLE_DVFS_STATUS_CUR_PS_SHIFT_T8112,
> +};
> +
> +const struct apple_soc_cpufreq_info soc_default_info = {
> +       .max_pstate = 15,
> +       .cur_pstate_mask = 0, /* fallback */
> +};
> +
> +static const struct of_device_id apple_soc_cpufreq_of_match[] = {
> +       {
> +               .compatible = "apple,t8103-cluster-cpufreq",
> +               .data = &soc_t8103_info,
> +       },
> +       {
> +               .compatible = "apple,t8112-cluster-cpufreq",
> +               .data = &soc_t8112_info,
> +       },
> +       {
> +               .compatible = "apple,cluster-cpufreq",
> +               .data = &soc_default_info,
> +       },
> +       {}
> +};
> +
> +static unsigned int apple_soc_cpufreq_get_rate(unsigned int cpu)
> +{
> +       struct cpufreq_policy *policy = cpufreq_cpu_get_raw(cpu);
> +       struct apple_cpu_priv *priv = policy->driver_data;
> +       unsigned int pstate;
> +       unsigned int i;
> +
> +       if (priv->info->cur_pstate_mask) {
> +               u64 reg = readq_relaxed(priv->reg_base + APPLE_DVFS_STATUS);
> +
> +               pstate = (reg & priv->info->cur_pstate_mask) >>  priv->info->cur_pstate_shift;
> +       } else {
> +               /*
> +                * For the fallback case we might not know the layout of DVFS_STATUS,
> +                * so just use the command register value (which ignores boost limitations).
> +                */
> +               u64 reg = readq_relaxed(priv->reg_base + APPLE_DVFS_CMD);
> +
> +               pstate = FIELD_GET(APPLE_DVFS_CMD_PS1, reg);
> +       }
> +
> +       for (i = 0; policy->freq_table[i].frequency != CPUFREQ_TABLE_END; i++)
> +               if (policy->freq_table[i].driver_data == pstate)
> +                       return policy->freq_table[i].frequency;
> +
> +       dev_err(priv->cpu_dev, "could not find frequency for pstate %d\n",
> +               pstate);
> +       return 0;
> +}
> +
> +static int apple_soc_cpufreq_set_target(struct cpufreq_policy *policy,
> +                                       unsigned int index)
> +{
> +       struct apple_cpu_priv *priv = policy->driver_data;
> +       unsigned int pstate = policy->freq_table[index].driver_data;
> +       u64 reg;
> +
> +       /* Fallback for newer SoCs */
> +       if (index > priv->info->max_pstate)
> +               index = priv->info->max_pstate;
> +
> +       if (readq_poll_timeout_atomic(priv->reg_base + APPLE_DVFS_CMD, reg,
> +                                     !(reg & APPLE_DVFS_CMD_BUSY), 2,
> +                                     APPLE_DVFS_TRANSITION_TIMEOUT)) {
> +               return -EIO;
> +       }
> +
> +       reg &= ~(APPLE_DVFS_CMD_PS1 | APPLE_DVFS_CMD_PS2);
> +       reg |= FIELD_PREP(APPLE_DVFS_CMD_PS1, pstate);
> +       reg |= FIELD_PREP(APPLE_DVFS_CMD_PS2, pstate);
> +       reg |= APPLE_DVFS_CMD_SET;
> +
> +       writeq_relaxed(reg, priv->reg_base + APPLE_DVFS_CMD);
> +
> +       return 0;
> +}
> +
> +static unsigned int apple_soc_cpufreq_fast_switch(struct cpufreq_policy *policy,
> +                                                 unsigned int target_freq)
> +{
> +       if (apple_soc_cpufreq_set_target(policy, policy->cached_resolved_idx) < 0)
> +               return 0;
> +
> +       return policy->freq_table[policy->cached_resolved_idx].frequency;
> +}
> +
> +static int apple_soc_cpufreq_find_cluster(struct cpufreq_policy *policy,
> +                                         void __iomem **reg_base,
> +                                         const struct apple_soc_cpufreq_info **info)
> +{
> +       struct of_phandle_args args;
> +       const struct of_device_id *match;
> +       int ret = 0;
> +
> +       ret = of_perf_domain_get_sharing_cpumask(policy->cpu, "performance-domains",
> +                                                "#performance-domain-cells",
> +                                                policy->cpus, &args);
> +       if (ret < 0)
> +               return ret;
> +
> +       match = of_match_node(apple_soc_cpufreq_of_match, args.np);
> +       of_node_put(args.np);
> +       if (!match)
> +               return -ENODEV;
> +
> +       *info = match->data;
> +
> +       *reg_base = of_iomap(args.np, 0);
> +       if (IS_ERR(*reg_base))
> +               return PTR_ERR(*reg_base);
> +
> +       return 0;
> +}
> +
> +static struct freq_attr *apple_soc_cpufreq_hw_attr[] = {
> +       &cpufreq_freq_attr_scaling_available_freqs,
> +       NULL,
> +       NULL,
> +};
> +
> +static int apple_soc_cpufreq_init(struct cpufreq_policy *policy)
> +{
> +       int ret, i;
> +       unsigned int transition_latency;
> +       void __iomem *reg_base;
> +       struct device *cpu_dev;
> +       struct apple_cpu_priv *priv;
> +       const struct apple_soc_cpufreq_info *info;
> +       struct cpufreq_frequency_table *freq_table;
> +
> +       cpu_dev = get_cpu_device(policy->cpu);
> +       if (!cpu_dev) {
> +               pr_err("failed to get cpu%d device\n", policy->cpu);
> +               return -ENODEV;
> +       }
> +
> +       ret = dev_pm_opp_of_add_table(cpu_dev);
> +       if (ret < 0) {
> +               dev_err(cpu_dev, "%s: failed to add OPP table: %d\n", __func__, ret);
> +               return ret;
> +       }
> +
> +       ret = apple_soc_cpufreq_find_cluster(policy, &reg_base, &info);
> +       if (ret) {
> +               dev_err(cpu_dev, "%s: failed to get cluster info: %d\n", __func__, ret);
> +               return ret;
> +       }
> +
> +       ret = dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);
> +       if (ret) {
> +               dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n", __func__, ret);
> +               goto out_iounmap;
> +       }
> +
> +       ret = dev_pm_opp_get_opp_count(cpu_dev);
> +       if (ret <= 0) {
> +               dev_dbg(cpu_dev, "OPP table is not ready, deferring probe\n");
> +               ret = -EPROBE_DEFER;
> +               goto out_free_opp;
> +       }
> +
> +       priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +       if (!priv) {
> +               ret = -ENOMEM;
> +               goto out_free_opp;
> +       }
> +
> +       ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table);
> +       if (ret) {
> +               dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
> +               goto out_free_priv;
> +       }
> +
> +       /* Get OPP levels (p-state indexes) and stash them in driver_data */
> +       for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
> +               unsigned long rate = freq_table[i].frequency * 1000;
> +               struct dev_pm_opp *opp = dev_pm_opp_find_freq_floor(cpu_dev, &rate);
> +
> +               if (IS_ERR(opp)) {
> +                       ret = PTR_ERR(opp);
> +                       goto out_free_cpufreq_table;
> +               }
> +               freq_table[i].driver_data = dev_pm_opp_get_level(opp);
> +               dev_pm_opp_put(opp);
> +       }
> +
> +       priv->cpu_dev = cpu_dev;
> +       priv->reg_base = reg_base;
> +       priv->info = info;
> +       policy->driver_data = priv;
> +       policy->freq_table = freq_table;
> +
> +       transition_latency = dev_pm_opp_get_max_transition_latency(cpu_dev);
> +       if (!transition_latency)
> +               transition_latency = CPUFREQ_ETERNAL;
> +
> +       policy->cpuinfo.transition_latency = transition_latency;
> +       policy->dvfs_possible_from_any_cpu = true;
> +       policy->fast_switch_possible = true;
> +
> +       if (policy_has_boost_freq(policy)) {
> +               ret = cpufreq_enable_boost_support();
> +               if (ret) {
> +                       dev_warn(cpu_dev, "failed to enable boost: %d\n", ret);
> +               } else {
> +                       apple_soc_cpufreq_hw_attr[1] = &cpufreq_freq_attr_scaling_boost_freqs;
> +                       apple_soc_cpufreq_driver.boost_enabled = true;
> +               }
> +       }
> +
> +       return 0;
> +
> +out_free_cpufreq_table:
> +       dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table);
> +out_free_priv:
> +       kfree(priv);
> +out_free_opp:
> +       dev_pm_opp_remove_all_dynamic(cpu_dev);
> +out_iounmap:
> +       iounmap(reg_base);
> +       return ret;
> +}
> +
> +static int apple_soc_cpufreq_exit(struct cpufreq_policy *policy)
> +{
> +       struct apple_cpu_priv *priv = policy->driver_data;
> +
> +       dev_pm_opp_free_cpufreq_table(priv->cpu_dev, &policy->freq_table);
> +       dev_pm_opp_remove_all_dynamic(priv->cpu_dev);
> +       iounmap(priv->reg_base);
> +       kfree(priv);
> +
> +       return 0;
> +}
> +
> +static struct cpufreq_driver apple_soc_cpufreq_driver = {
> +       .name           = "apple-cpufreq",
> +       .flags          = CPUFREQ_HAVE_GOVERNOR_PER_POLICY |
> +                         CPUFREQ_NEED_INITIAL_FREQ_CHECK | CPUFREQ_IS_COOLING_DEV,
> +       .verify         = cpufreq_generic_frequency_table_verify,
> +       .attr           = cpufreq_generic_attr,
> +       .get            = apple_soc_cpufreq_get_rate,
> +       .init           = apple_soc_cpufreq_init,
> +       .exit           = apple_soc_cpufreq_exit,
> +       .target_index   = apple_soc_cpufreq_set_target,
> +       .fast_switch    = apple_soc_cpufreq_fast_switch,
> +       .register_em    = cpufreq_register_em_with_opp,
> +       .attr           = apple_soc_cpufreq_hw_attr,
> +};
> +
> +static int __init apple_soc_cpufreq_module_init(void)
> +{
> +       if (!of_machine_is_compatible("apple,arm-platform"))
> +               return -ENODEV;
> +
> +       return cpufreq_register_driver(&apple_soc_cpufreq_driver);
> +}
> +module_init(apple_soc_cpufreq_module_init);
> +
> +static void __exit apple_soc_cpufreq_module_exit(void)
> +{
> +       cpufreq_unregister_driver(&apple_soc_cpufreq_driver);
> +}
> +module_exit(apple_soc_cpufreq_module_exit);
> +
> +MODULE_DEVICE_TABLE(of, apple_soc_cpufreq_of_match);
> +MODULE_AUTHOR("Hector Martin <marcan@marcan.st>");
> +MODULE_DESCRIPTION("Apple SoC CPU cluster DVFS driver");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c
> index 6ac3800db450..a108b9796770 100644
> --- a/drivers/cpufreq/cpufreq-dt-platdev.c
> +++ b/drivers/cpufreq/cpufreq-dt-platdev.c
> @@ -103,6 +103,8 @@ static const struct of_device_id allowlist[] __initconst = {
>  static const struct of_device_id blocklist[] __initconst = {
>         { .compatible = "allwinner,sun50i-h6", },
>
> +       { .compatible = "apple,arm-platform", },
> +
>         { .compatible = "arm,vexpress", },
>
>         { .compatible = "calxeda,highbank", },
> --
> 2.35.1
>
Hector Martin Nov. 1, 2022, 6:17 p.m. UTC | #5
On 02/11/2022 00.16, Ulf Hansson wrote:
> On Mon, 24 Oct 2022 at 06:40, Hector Martin <marcan@marcan.st> wrote:
>>
>> This driver implements CPU frequency scaling for Apple Silicon SoCs,
>> including M1 (t8103), M1 Max/Pro/Ultra (t600x), and M2 (t8112).
>>
>> Each CPU cluster has its own register set, and frequency management is
>> fully automated by the hardware; the driver only has to write one
>> register. There is boost frequency support, but the hardware will only
>> allow their use if only a subset of cores in a cluster are in
>> non-deep-idle. Since we don't support deep idle yet, these frequencies
>> are not achievable, but the driver supports them. They will remain
>> disabled in the device tree until deep idle is implemented, to avoid
>> confusing users.
> 
> Out of curiosity, may I ask if this implies the need of a
> synchronization mechanism on the Linux side? Or is the boost frequency
> dynamically managed solely by HW/FW?

It's managed by hardware - Linux gets to request whatever frequency it
wants, and the hardware will limit it to what is achievable given the
current idle states within the cluster (and it will change automatically
with them). So if Linux asks for 3.2 GHz but there are no deep idle
cores in the cluster, you get 3.0. If there's one deep idle core, you
get 3.1 (I think). Three, 3.2. So this driver doesn't have to do
anything (and will report the correct current-frequency as long as the
per-SoC compatible is matched; without that this feature is disabled and
it just reports the requested frequency). We could enable the boost
states today just fine, it's just that they would never actually be
reached by the hardware.

- Hector
Viresh Kumar Nov. 2, 2022, 5:37 a.m. UTC | #6
On 24-10-22, 13:39, Hector Martin wrote:
> of_perf_domain_get_sharing_cpumask currently assumes a 1-argument
> phandle format, and directly returns the argument. Generalize this to
> return the full of_phandle_args, so it can be used by drivers which use
> other phandle styles (e.g. separate nodes). This also requires changing
> the CPU sharing match to compare the full args structure.
> 
> Also, make sure to of_node_put(args.np) (the original code was leaking a
> reference).
> 
> Signed-off-by: Hector Martin <marcan@marcan.st>
> ---
>  drivers/cpufreq/mediatek-cpufreq-hw.c | 14 +++++++++-----
>  include/linux/cpufreq.h               | 28 +++++++++++++++------------
>  2 files changed, 25 insertions(+), 17 deletions(-)

Applied. Thanks.
Viresh Kumar Nov. 2, 2022, 6:18 a.m. UTC | #7
On 24-10-22, 13:39, Hector Martin wrote:
> diff --git a/drivers/cpufreq/apple-soc-cpufreq.c b/drivers/cpufreq/apple-soc-cpufreq.c
> +struct apple_soc_cpufreq_info {
> +	u64 max_pstate;
> +	u64 cur_pstate_mask;
> +	u64 cur_pstate_shift;
> +};
> +
> +struct apple_cpu_priv {
> +	struct device *cpu_dev;
> +	void __iomem *reg_base;
> +	const struct apple_soc_cpufreq_info *info;
> +};
> +
> +static struct cpufreq_driver apple_soc_cpufreq_driver;
> +
> +const struct apple_soc_cpufreq_info soc_t8103_info = {

static ? For other instances too.

> +	.max_pstate = 15,
> +	.cur_pstate_mask = APPLE_DVFS_STATUS_CUR_PS_T8103,
> +	.cur_pstate_shift = APPLE_DVFS_STATUS_CUR_PS_SHIFT_T8103,
> +};
> +
> +const struct apple_soc_cpufreq_info soc_t8112_info = {
> +	.max_pstate = 31,
> +	.cur_pstate_mask = APPLE_DVFS_STATUS_CUR_PS_T8112,
> +	.cur_pstate_shift = APPLE_DVFS_STATUS_CUR_PS_SHIFT_T8112,
> +};
> +
> +const struct apple_soc_cpufreq_info soc_default_info = {
> +	.max_pstate = 15,
> +	.cur_pstate_mask = 0, /* fallback */
> +};
> +
> +static const struct of_device_id apple_soc_cpufreq_of_match[] = {
> +	{
> +		.compatible = "apple,t8103-cluster-cpufreq",
> +		.data = &soc_t8103_info,
> +	},
> +	{

Isn't the preferred way for this is "}, {" instead ?

I couldn't find this in Coding Guidelines, but somehow remember that
to be the preferred format.

> +		.compatible = "apple,t8112-cluster-cpufreq",
> +		.data = &soc_t8112_info,
> +	},
> +	{
> +		.compatible = "apple,cluster-cpufreq",
> +		.data = &soc_default_info,
> +	},
> +	{}
> +};
> +
> +static unsigned int apple_soc_cpufreq_get_rate(unsigned int cpu)
> +{
> +	struct cpufreq_policy *policy = cpufreq_cpu_get_raw(cpu);
> +	struct apple_cpu_priv *priv = policy->driver_data;
> +	unsigned int pstate;
> +	unsigned int i;

Merge these two ?

> +
> +	if (priv->info->cur_pstate_mask) {
> +		u64 reg = readq_relaxed(priv->reg_base + APPLE_DVFS_STATUS);
> +
> +		pstate = (reg & priv->info->cur_pstate_mask) >>  priv->info->cur_pstate_shift;
> +	} else {
> +		/*
> +		 * For the fallback case we might not know the layout of DVFS_STATUS,
> +		 * so just use the command register value (which ignores boost limitations).
> +		 */
> +		u64 reg = readq_relaxed(priv->reg_base + APPLE_DVFS_CMD);
> +
> +		pstate = FIELD_GET(APPLE_DVFS_CMD_PS1, reg);
> +	}
> +
> +	for (i = 0; policy->freq_table[i].frequency != CPUFREQ_TABLE_END; i++)

You may want to use, cpufreq_for_each_valid_entry(), or some other
generic iterator here.

> +		if (policy->freq_table[i].driver_data == pstate)
> +			return policy->freq_table[i].frequency;
> +
> +	dev_err(priv->cpu_dev, "could not find frequency for pstate %d\n",
> +		pstate);
> +	return 0;
> +}
> +
> +static int apple_soc_cpufreq_set_target(struct cpufreq_policy *policy,
> +					unsigned int index)
> +{
> +	struct apple_cpu_priv *priv = policy->driver_data;
> +	unsigned int pstate = policy->freq_table[index].driver_data;
> +	u64 reg;
> +
> +	/* Fallback for newer SoCs */
> +	if (index > priv->info->max_pstate)
> +		index = priv->info->max_pstate;
> +
> +	if (readq_poll_timeout_atomic(priv->reg_base + APPLE_DVFS_CMD, reg,
> +				      !(reg & APPLE_DVFS_CMD_BUSY), 2,
> +				      APPLE_DVFS_TRANSITION_TIMEOUT)) {
> +		return -EIO;
> +	}
> +
> +	reg &= ~(APPLE_DVFS_CMD_PS1 | APPLE_DVFS_CMD_PS2);
> +	reg |= FIELD_PREP(APPLE_DVFS_CMD_PS1, pstate);
> +	reg |= FIELD_PREP(APPLE_DVFS_CMD_PS2, pstate);
> +	reg |= APPLE_DVFS_CMD_SET;
> +
> +	writeq_relaxed(reg, priv->reg_base + APPLE_DVFS_CMD);
> +
> +	return 0;
> +}
> +
> +static unsigned int apple_soc_cpufreq_fast_switch(struct cpufreq_policy *policy,
> +						  unsigned int target_freq)
> +{
> +	if (apple_soc_cpufreq_set_target(policy, policy->cached_resolved_idx) < 0)
> +		return 0;
> +
> +	return policy->freq_table[policy->cached_resolved_idx].frequency;
> +}
> +
> +static int apple_soc_cpufreq_find_cluster(struct cpufreq_policy *policy,
> +					  void __iomem **reg_base,
> +					  const struct apple_soc_cpufreq_info **info)
> +{
> +	struct of_phandle_args args;
> +	const struct of_device_id *match;
> +	int ret = 0;
> +
> +	ret = of_perf_domain_get_sharing_cpumask(policy->cpu, "performance-domains",
> +						 "#performance-domain-cells",
> +						 policy->cpus, &args);
> +	if (ret < 0)
> +		return ret;
> +
> +	match = of_match_node(apple_soc_cpufreq_of_match, args.np);
> +	of_node_put(args.np);
> +	if (!match)
> +		return -ENODEV;
> +
> +	*info = match->data;
> +
> +	*reg_base = of_iomap(args.np, 0);
> +	if (IS_ERR(*reg_base))
> +		return PTR_ERR(*reg_base);
> +
> +	return 0;
> +}
> +
> +static struct freq_attr *apple_soc_cpufreq_hw_attr[] = {
> +	&cpufreq_freq_attr_scaling_available_freqs,
> +	NULL,
> +	NULL,
> +};
> +
> +static int apple_soc_cpufreq_init(struct cpufreq_policy *policy)
> +{
> +	int ret, i;
> +	unsigned int transition_latency;
> +	void __iomem *reg_base;
> +	struct device *cpu_dev;
> +	struct apple_cpu_priv *priv;
> +	const struct apple_soc_cpufreq_info *info;
> +	struct cpufreq_frequency_table *freq_table;
> +
> +	cpu_dev = get_cpu_device(policy->cpu);
> +	if (!cpu_dev) {
> +		pr_err("failed to get cpu%d device\n", policy->cpu);
> +		return -ENODEV;
> +	}
> +
> +	ret = dev_pm_opp_of_add_table(cpu_dev);
> +	if (ret < 0) {
> +		dev_err(cpu_dev, "%s: failed to add OPP table: %d\n", __func__, ret);
> +		return ret;
> +	}
> +
> +	ret = apple_soc_cpufreq_find_cluster(policy, &reg_base, &info);
> +	if (ret) {
> +		dev_err(cpu_dev, "%s: failed to get cluster info: %d\n", __func__, ret);
> +		return ret;
> +	}
> +
> +	ret = dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);

Why do you need this ? The OPP core should be able to find this
information by itself in your case AFAIU. The OPP core will refer
"operating-points-v2 = <&pcluster_opp>" and find that the cores are
related.

> +	if (ret) {
> +		dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n", __func__, ret);
> +		goto out_iounmap;
> +	}
> +
> +	ret = dev_pm_opp_get_opp_count(cpu_dev);
> +	if (ret <= 0) {
> +		dev_dbg(cpu_dev, "OPP table is not ready, deferring probe\n");

Why would this happen in your case ?

> +		ret = -EPROBE_DEFER;
> +		goto out_free_opp;
> +	}
> +
> +	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +	if (!priv) {
> +		ret = -ENOMEM;
> +		goto out_free_opp;
> +	}
> +
> +	ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table);
> +	if (ret) {
> +		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
> +		goto out_free_priv;
> +	}
> +
> +	/* Get OPP levels (p-state indexes) and stash them in driver_data */
> +	for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
> +		unsigned long rate = freq_table[i].frequency * 1000;
> +		struct dev_pm_opp *opp = dev_pm_opp_find_freq_floor(cpu_dev, &rate);

Shouldn't you use dev_pm_opp_find_freq_exact() here ?
Hector Martin Nov. 9, 2022, 12:13 p.m. UTC | #8
On 24/10/2022 17.27, Marc Zyngier wrote:
> On Mon, 24 Oct 2022 05:39:24 +0100,
> Hector Martin <marcan@marcan.st> wrote:
>>
>> This driver implements CPU frequency scaling for Apple Silicon SoCs,
>> including M1 (t8103), M1 Max/Pro/Ultra (t600x), and M2 (t8112).
>>
>> Each CPU cluster has its own register set, and frequency management is
>> fully automated by the hardware; the driver only has to write one
>> register. There is boost frequency support, but the hardware will only
>> allow their use if only a subset of cores in a cluster are in
>> non-deep-idle. Since we don't support deep idle yet, these frequencies
>> are not achievable, but the driver supports them. They will remain
>> disabled in the device tree until deep idle is implemented, to avoid
>> confusing users.
>>
>> This driver does not yet implement the memory controller performance
>> state tuning that usually accompanies higher CPU p-states. This will be
>> done in a future patch.
>>
>> Signed-off-by: Hector Martin <marcan@marcan.st>
>> ---
>>  drivers/cpufreq/Kconfig.arm          |   9 +
>>  drivers/cpufreq/Makefile             |   1 +
>>  drivers/cpufreq/apple-soc-cpufreq.c  | 352 +++++++++++++++++++++++++++
>>  drivers/cpufreq/cpufreq-dt-platdev.c |   2 +
>>  4 files changed, 364 insertions(+)
>>  create mode 100644 drivers/cpufreq/apple-soc-cpufreq.c
>>
> 
> [...]
> 
>> +static struct freq_attr *apple_soc_cpufreq_hw_attr[] = {
>> +	&cpufreq_freq_attr_scaling_available_freqs,
>> +	NULL,
>> +	NULL,
> 
> nit: extra NULL?

That slot gets filled in later if boost is enabled, hence the need for
an extra terminating NULL in that case.

- Hector
Hector Martin Nov. 9, 2022, 12:36 p.m. UTC | #9
On 02/11/2022 15.18, Viresh Kumar wrote:
> On 24-10-22, 13:39, Hector Martin wrote:
>> +const struct apple_soc_cpufreq_info soc_t8103_info = {
> 
> static ? For other instances too.

Ack, fixed.

>> +static const struct of_device_id apple_soc_cpufreq_of_match[] = {
>> +	{
>> +		.compatible = "apple,t8103-cluster-cpufreq",
>> +		.data = &soc_t8103_info,
>> +	},
>> +	{
> 
> Isn't the preferred way for this is "}, {" instead ?
> 
> I couldn't find this in Coding Guidelines, but somehow remember that
> to be the preferred format.

I did an informal search and the two-line form seems to be more common,
though both are in widespread use. I can change it if you want, though
it seems kind of a wash.

>> +static unsigned int apple_soc_cpufreq_get_rate(unsigned int cpu)
>> +{
>> +	struct cpufreq_policy *policy = cpufreq_cpu_get_raw(cpu);
>> +	struct apple_cpu_priv *priv = policy->driver_data;
>> +	unsigned int pstate;
>> +	unsigned int i;
> 
> Merge these two ?

Done.

> 
>> +
>> +	if (priv->info->cur_pstate_mask) {
>> +		u64 reg = readq_relaxed(priv->reg_base + APPLE_DVFS_STATUS);
>> +
>> +		pstate = (reg & priv->info->cur_pstate_mask) >>  priv->info->cur_pstate_shift;
>> +	} else {
>> +		/*
>> +		 * For the fallback case we might not know the layout of DVFS_STATUS,
>> +		 * so just use the command register value (which ignores boost limitations).
>> +		 */
>> +		u64 reg = readq_relaxed(priv->reg_base + APPLE_DVFS_CMD);
>> +
>> +		pstate = FIELD_GET(APPLE_DVFS_CMD_PS1, reg);
>> +	}
>> +
>> +	for (i = 0; policy->freq_table[i].frequency != CPUFREQ_TABLE_END; i++)
> 
> You may want to use, cpufreq_for_each_valid_entry(), or some other
> generic iterator here.

Done.

>> +	ret = dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);
> 
> Why do you need this ? The OPP core should be able to find this
> information by itself in your case AFAIU. The OPP core will refer
> "operating-points-v2 = <&pcluster_opp>" and find that the cores are
> related.

We have multiple clusters sharing an OPP table (e.g. the M1 Ultra has 2
e-cluster and 4 p-clusters, and duplicating OPP tables seems very
silly), so this is necessary to tell it about the subset of cores
sharing a table that are actually one domain.

> 
>> +	if (ret) {
>> +		dev_err(cpu_dev, "%s: failed to mark OPPs as shared: %d\n", __func__, ret);
>> +		goto out_iounmap;
>> +	}
>> +
>> +	ret = dev_pm_opp_get_opp_count(cpu_dev);
>> +	if (ret <= 0) {
>> +		dev_dbg(cpu_dev, "OPP table is not ready, deferring probe\n");
> 
> Why would this happen in your case ?

Good question. This came from scpi-cpufreq.c. It sounds like it doesn't
make any sense here; the error path should just error out, not defer.
I'll change it to that.

>> +		ret = -EPROBE_DEFER;
>> +		goto out_free_opp;
>> +	}
>> +
>> +	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>> +	if (!priv) {
>> +		ret = -ENOMEM;
>> +		goto out_free_opp;
>> +	}
>> +
>> +	ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table);
>> +	if (ret) {
>> +		dev_err(cpu_dev, "failed to init cpufreq table: %d\n", ret);
>> +		goto out_free_priv;
>> +	}
>> +
>> +	/* Get OPP levels (p-state indexes) and stash them in driver_data */
>> +	for (i = 0; freq_table[i].frequency != CPUFREQ_TABLE_END; i++) {
>> +		unsigned long rate = freq_table[i].frequency * 1000;
>> +		struct dev_pm_opp *opp = dev_pm_opp_find_freq_floor(cpu_dev, &rate);
> 
> Shouldn't you use dev_pm_opp_find_freq_exact() here ?

Actually, it would seem the correct thing to do is
dev_pm_opp_find_freq_ceil, or otherwise use _floor and add 999.
dev_pm_opp_init_cpufreq_table() truncates down to kHz, so the real
frequency will always be between `rate` and `rate + 999` here. This
makes it work with frequencies that aren't a multiple of 1 kHz (we don't
have any of those but it seems broken not to support it).

- Hector
Marc Zyngier Nov. 9, 2022, 2:20 p.m. UTC | #10
On Wed, 09 Nov 2022 12:13:33 +0000,
Hector Martin <marcan@marcan.st> wrote:
> 
> On 24/10/2022 17.27, Marc Zyngier wrote:
> > On Mon, 24 Oct 2022 05:39:24 +0100,
> > Hector Martin <marcan@marcan.st> wrote:
> >>
> >> This driver implements CPU frequency scaling for Apple Silicon SoCs,
> >> including M1 (t8103), M1 Max/Pro/Ultra (t600x), and M2 (t8112).
> >>
> >> Each CPU cluster has its own register set, and frequency management is
> >> fully automated by the hardware; the driver only has to write one
> >> register. There is boost frequency support, but the hardware will only
> >> allow their use if only a subset of cores in a cluster are in
> >> non-deep-idle. Since we don't support deep idle yet, these frequencies
> >> are not achievable, but the driver supports them. They will remain
> >> disabled in the device tree until deep idle is implemented, to avoid
> >> confusing users.
> >>
> >> This driver does not yet implement the memory controller performance
> >> state tuning that usually accompanies higher CPU p-states. This will be
> >> done in a future patch.
> >>
> >> Signed-off-by: Hector Martin <marcan@marcan.st>
> >> ---
> >>  drivers/cpufreq/Kconfig.arm          |   9 +
> >>  drivers/cpufreq/Makefile             |   1 +
> >>  drivers/cpufreq/apple-soc-cpufreq.c  | 352 +++++++++++++++++++++++++++
> >>  drivers/cpufreq/cpufreq-dt-platdev.c |   2 +
> >>  4 files changed, 364 insertions(+)
> >>  create mode 100644 drivers/cpufreq/apple-soc-cpufreq.c
> >>
> > 
> > [...]
> > 
> >> +static struct freq_attr *apple_soc_cpufreq_hw_attr[] = {
> >> +	&cpufreq_freq_attr_scaling_available_freqs,
> >> +	NULL,
> >> +	NULL,
> > 
> > nit: extra NULL?
> 
> That slot gets filled in later if boost is enabled, hence the need for
> an extra terminating NULL in that case.

Right. Consider placing a comment next to the first NULL so that
someone else doesn't consider it useless and accidentally removes
it...

	M.
Hector Martin Nov. 9, 2022, 3:39 p.m. UTC | #11
On 09/11/2022 23.20, Marc Zyngier wrote:
> On Wed, 09 Nov 2022 12:13:33 +0000,
> Hector Martin <marcan@marcan.st> wrote:
>>
>> On 24/10/2022 17.27, Marc Zyngier wrote:
>>> On Mon, 24 Oct 2022 05:39:24 +0100,
>>> Hector Martin <marcan@marcan.st> wrote:
>>>>
>>>> This driver implements CPU frequency scaling for Apple Silicon SoCs,
>>>> including M1 (t8103), M1 Max/Pro/Ultra (t600x), and M2 (t8112).
>>>>
>>>> Each CPU cluster has its own register set, and frequency management is
>>>> fully automated by the hardware; the driver only has to write one
>>>> register. There is boost frequency support, but the hardware will only
>>>> allow their use if only a subset of cores in a cluster are in
>>>> non-deep-idle. Since we don't support deep idle yet, these frequencies
>>>> are not achievable, but the driver supports them. They will remain
>>>> disabled in the device tree until deep idle is implemented, to avoid
>>>> confusing users.
>>>>
>>>> This driver does not yet implement the memory controller performance
>>>> state tuning that usually accompanies higher CPU p-states. This will be
>>>> done in a future patch.
>>>>
>>>> Signed-off-by: Hector Martin <marcan@marcan.st>
>>>> ---
>>>>  drivers/cpufreq/Kconfig.arm          |   9 +
>>>>  drivers/cpufreq/Makefile             |   1 +
>>>>  drivers/cpufreq/apple-soc-cpufreq.c  | 352 +++++++++++++++++++++++++++
>>>>  drivers/cpufreq/cpufreq-dt-platdev.c |   2 +
>>>>  4 files changed, 364 insertions(+)
>>>>  create mode 100644 drivers/cpufreq/apple-soc-cpufreq.c
>>>>
>>>
>>> [...]
>>>
>>>> +static struct freq_attr *apple_soc_cpufreq_hw_attr[] = {
>>>> +	&cpufreq_freq_attr_scaling_available_freqs,
>>>> +	NULL,
>>>> +	NULL,
>>>
>>> nit: extra NULL?
>>
>> That slot gets filled in later if boost is enabled, hence the need for
>> an extra terminating NULL in that case.
> 
> Right. Consider placing a comment next to the first NULL so that
> someone else doesn't consider it useless and accidentally removes
> it...
> 

Good point, done :)

- Hector
Viresh Kumar Nov. 14, 2022, 6:51 a.m. UTC | #12
On 09-11-22, 21:36, Hector Martin wrote:
> On 02/11/2022 15.18, Viresh Kumar wrote:
> >> +	ret = dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);
> > 
> > Why do you need this ? The OPP core should be able to find this
> > information by itself in your case AFAIU. The OPP core will refer
> > "operating-points-v2 = <&pcluster_opp>" and find that the cores are
> > related.
> 
> We have multiple clusters sharing an OPP table (e.g. the M1 Ultra has 2
> e-cluster and 4 p-clusters, and duplicating OPP tables seems very
> silly), so this is necessary to tell it about the subset of cores
> sharing a table that are actually one domain.

The cluster sharing information is already part of the OPP tables, "opp-shared"
property. Platforms like scpi needed this because they didn't have the OPP table
in DT and so no way to find out the relation of the CPUs.

See how drivers/cpufreq/mediatek-cpufreq.c has done this.
dev_pm_opp_of_get_sharing_cpus() followed by dev_pm_opp_of_cpumask_add_table().
Hector Martin Nov. 14, 2022, 6:57 a.m. UTC | #13
On 14/11/2022 15.51, Viresh Kumar wrote:
> On 09-11-22, 21:36, Hector Martin wrote:
>> On 02/11/2022 15.18, Viresh Kumar wrote:
>>>> +	ret = dev_pm_opp_set_sharing_cpus(cpu_dev, policy->cpus);
>>>
>>> Why do you need this ? The OPP core should be able to find this
>>> information by itself in your case AFAIU. The OPP core will refer
>>> "operating-points-v2 = <&pcluster_opp>" and find that the cores are
>>> related.
>>
>> We have multiple clusters sharing an OPP table (e.g. the M1 Ultra has 2
>> e-cluster and 4 p-clusters, and duplicating OPP tables seems very
>> silly), so this is necessary to tell it about the subset of cores
>> sharing a table that are actually one domain.
> 
> The cluster sharing information is already part of the OPP tables, "opp-shared"
> property. Platforms like scpi needed this because they didn't have the OPP table
> in DT and so no way to find out the relation of the CPUs.
> 
> See how drivers/cpufreq/mediatek-cpufreq.c has done this.
> dev_pm_opp_of_get_sharing_cpus() followed by dev_pm_opp_of_cpumask_add_table().

I don't think you understood me. We have multiple identical clusters.
All those clusters share an OPP table but are *not* the same cpufreq
domain. So we can have 8 CPUs which are two 4-CPU cluster using one OPP
table.

There is no way to express this relationship with OPP tables without
duplicating the tables themselves.

- Hector
Viresh Kumar Nov. 14, 2022, 7:03 a.m. UTC | #14
On 14-11-22, 15:57, Hector Martin wrote:
> I don't think you understood me. We have multiple identical clusters.
> All those clusters share an OPP table but are *not* the same cpufreq
> domain. So we can have 8 CPUs which are two 4-CPU cluster using one OPP
> table.

I looked at the patch 5/5. It shows two clusters, e and p, with four CPUs in
each cluster. And as per the OPP table all the CPUs in cluster e are part of
same frequency domain, i.e. switch rate together. Same with all CPUs of cluster
p.

And I also see both the clusters have separate OPP tables.

> There is no way to express this relationship with OPP tables without
> duplicating the tables themselves.

Can you show how the DT looks in this case ? I am still not clear on what the
scenario is here :(
Hector Martin Nov. 14, 2022, 11:06 a.m. UTC | #15
For the benefit of everyone else: I screwed up a reply from mobile and
ended up with a brief exchange offline, but here's the useful part of
the context. Viresh also pointed out dropping opp-shared if we don't use
that mechanism, so I'll do that for v4.

On 14/11/2022 16.03, Viresh Kumar wrote:
> On 14-11-22, 15:57, Hector Martin wrote:
>> There is no way to express this relationship with OPP tables without
>> duplicating the tables themselves.
> 
> Can you show how the DT looks in this case ? I am still not clear on what the
> scenario is here :(

Here's the most complicated one we have so far: 20 cores and 6 clusters
sharing 2 OPP tables (mind the include and define fun - this is the best
way we could come up with to express two SoC dies glued together into
one MCM).

https://github.com/AsahiLinux/linux/blob/asahi-wip/arch/arm64/boot/dts/apple/t6002.dtsi

- Hector
Ulf Hansson Nov. 14, 2022, 7:49 p.m. UTC | #16
On Tue, 1 Nov 2022 at 19:17, Hector Martin <marcan@marcan.st> wrote:
>
> On 02/11/2022 00.16, Ulf Hansson wrote:
> > On Mon, 24 Oct 2022 at 06:40, Hector Martin <marcan@marcan.st> wrote:
> >>
> >> This driver implements CPU frequency scaling for Apple Silicon SoCs,
> >> including M1 (t8103), M1 Max/Pro/Ultra (t600x), and M2 (t8112).
> >>
> >> Each CPU cluster has its own register set, and frequency management is
> >> fully automated by the hardware; the driver only has to write one
> >> register. There is boost frequency support, but the hardware will only
> >> allow their use if only a subset of cores in a cluster are in
> >> non-deep-idle. Since we don't support deep idle yet, these frequencies
> >> are not achievable, but the driver supports them. They will remain
> >> disabled in the device tree until deep idle is implemented, to avoid
> >> confusing users.
> >
> > Out of curiosity, may I ask if this implies the need of a
> > synchronization mechanism on the Linux side? Or is the boost frequency
> > dynamically managed solely by HW/FW?
>
> It's managed by hardware - Linux gets to request whatever frequency it
> wants, and the hardware will limit it to what is achievable given the
> current idle states within the cluster (and it will change automatically
> with them). So if Linux asks for 3.2 GHz but there are no deep idle
> cores in the cluster, you get 3.0. If there's one deep idle core, you
> get 3.1 (I think). Three, 3.2. So this driver doesn't have to do
> anything (and will report the correct current-frequency as long as the
> per-SoC compatible is matched; without that this feature is disabled and
> it just reports the requested frequency). We could enable the boost
> states today just fine, it's just that they would never actually be
> reached by the hardware.

Thanks for sharing these details. It's always nice to know a bit more
about how the HW works!

From the reviewing point of view, I don't have more to add at this point!

Kind regards
Uffe