Message ID: 20211209043249.65474-1-marcan@marcan.st
Series: irqchip/apple-aic: Add support for AICv2
On Thu, 09 Dec 2021 04:32:45 +0000, Hector Martin <marcan@marcan.st> wrote: > > The newer AICv2 present in t600x SoCs does not have legacy IPI support > at all. Since t8103 also supports Fast IPIs, implement support for this > first. The legacy IPI code is left as a fallback, so it can be > potentially used by older SoCs in the future. > > The vIPI code is shared; only the IPI firing/acking bits change for Fast > IPIs. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/irqchip/irq-apple-aic.c | 112 ++++++++++++++++++++++++++++---- > 1 file changed, 98 insertions(+), 14 deletions(-) > > diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c > index 3759dc36cc8f..1aa63580cae4 100644 > --- a/drivers/irqchip/irq-apple-aic.c > +++ b/drivers/irqchip/irq-apple-aic.c > @@ -24,7 +24,7 @@ > * - Default "this CPU" register view and explicit per-CPU views > * > * In addition, this driver also handles FIQs, as these are routed to the same > - * IRQ vector. These are used for Fast IPIs (TODO), the ARMv8 timer IRQs, and > + * IRQ vector. These are used for Fast IPIs, the ARMv8 timer IRQs, and > * performance counters (TODO). > * > * Implementation notes: > @@ -106,7 +106,6 @@ > > /* > * IMP-DEF sysregs that control FIQ sources > - * Note: sysreg-based IPIs are not supported yet. > */ > > /* Core PMC control register */ > @@ -155,6 +154,10 @@ > #define SYS_IMP_APL_UPMSR_EL1 sys_reg(3, 7, 15, 6, 4) > #define UPMSR_IACT BIT(0) > > +/* MPIDR fields */ > +#define MPIDR_CPU GENMASK(7, 0) > +#define MPIDR_CLUSTER GENMASK(15, 8) This should be defined in terms of MPIDR_AFFINITY_LEVEL() and co. 
> + > #define AIC_NR_FIQ 4 > #define AIC_NR_SWIPI 32 > > @@ -173,12 +176,42 @@ > #define AIC_TMR_EL02_PHYS AIC_TMR_GUEST_PHYS > #define AIC_TMR_EL02_VIRT AIC_TMR_GUEST_VIRT > > +struct aic_info { > + int version; > + > + /* Features */ > + bool fast_ipi; > +}; > + > +static const struct aic_info aic1_info = { > + .version = 1, > +}; > + > +static const struct aic_info aic1_fipi_info = { > + .version = 1, > + > + .fast_ipi = true, Do you anticipate multiple feature flags like this? If so, maybe we should consider biting the bullet and making this an unsigned long populated with discrete flags. Not something we need to decide now though. > +}; > + > +static const struct of_device_id aic_info_match[] = { > + { > + .compatible = "apple,t8103-aic", > + .data = &aic1_fipi_info, > + }, > + { > + .compatible = "apple,aic", > + .data = &aic1_info, > + }, > + {} > +}; > + > struct aic_irq_chip { > void __iomem *base; > struct irq_domain *hw_domain; > struct irq_domain *ipi_domain; > int nr_hw; > - int ipi_hwirq; > + > + struct aic_info info; > }; > > static DEFINE_PER_CPU(uint32_t, aic_fiq_unmasked); > @@ -387,8 +420,12 @@ static void __exception_irq_entry aic_handle_fiq(struct pt_regs *regs) > */ > > if (read_sysreg_s(SYS_IMP_APL_IPI_SR_EL1) & IPI_SR_PENDING) { > - pr_err_ratelimited("Fast IPI fired. Acking.\n"); > - write_sysreg_s(IPI_SR_PENDING, SYS_IMP_APL_IPI_SR_EL1); > + if (aic_irqc->info.fast_ipi) { On the other hand, this is likely to hit on the fast path. Given that we know at probe time whether we support SR-based IPIs, we can turn this into a static key and save a few fetches on every IPI. It applies everywhere you look at this flag at runtime. > + aic_handle_ipi(regs); > + } else { > + pr_err_ratelimited("Fast IPI fired. 
Acking.\n"); > + write_sysreg_s(IPI_SR_PENDING, SYS_IMP_APL_IPI_SR_EL1); > + } > } > > if (TIMER_FIRING(read_sysreg(cntp_ctl_el0))) > @@ -564,6 +601,21 @@ static const struct irq_domain_ops aic_irq_domain_ops = { > * IPI irqchip > */ > > +static void aic_ipi_send_fast(int cpu) > +{ > + u64 mpidr = cpu_logical_map(cpu); > + u64 my_mpidr = cpu_logical_map(smp_processor_id()); This is the equivalent of reading MPIDR_EL1. My gut feeling is that it is a bit faster to access the sysreg than a percpu lookup, a function call and another memory access. > + u64 cluster = FIELD_GET(MPIDR_CLUSTER, mpidr); See my earlier comment. This really should be: u64 cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1); rather than inventing a new decoding scheme. > + u64 idx = FIELD_GET(MPIDR_CPU, mpidr); > + > + if (FIELD_GET(MPIDR_CLUSTER, my_mpidr) == cluster) > + write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx), > + SYS_IMP_APL_IPI_RR_LOCAL_EL1); > + else > + write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx) | FIELD_PREP(IPI_RR_CLUSTER, cluster), > + SYS_IMP_APL_IPI_RR_GLOBAL_EL1); Don't you need an ISB, either here or in the two callers? At the moment, I don't see what will force the execution of these writes, and they could be arbitrarily delayed. > +} > + > static void aic_ipi_mask(struct irq_data *d) > { > u32 irq_bit = BIT(irqd_to_hwirq(d)); > @@ -589,8 +641,12 @@ static void aic_ipi_unmask(struct irq_data *d) > * If a pending vIPI was unmasked, raise a HW IPI to ourselves. > * No barriers needed here since this is a self-IPI. > */ > - if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit) > - aic_ic_write(ic, AIC_IPI_SEND, AIC_IPI_SEND_CPU(smp_processor_id())); > + if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit) { > + if (ic->info.fast_ipi) > + aic_ipi_send_fast(smp_processor_id()); nit: if this is common enough, maybe having an aic_ipi_send_self_fast could be better. Needs evaluation though. 
> + else > + aic_ic_write(ic, AIC_IPI_SEND, AIC_IPI_SEND_CPU(smp_processor_id())); > + } > } > > static void aic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask) > @@ -618,8 +674,12 @@ static void aic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask) > smp_mb__after_atomic(); > > if (!(pending & irq_bit) && > - (atomic_read(per_cpu_ptr(&aic_vipi_enable, cpu)) & irq_bit)) > - send |= AIC_IPI_SEND_CPU(cpu); > + (atomic_read(per_cpu_ptr(&aic_vipi_enable, cpu)) & irq_bit)) { > + if (ic->info.fast_ipi) > + aic_ipi_send_fast(cpu); > + else > + send |= AIC_IPI_SEND_CPU(cpu); > + } > } > > /* > @@ -651,8 +711,16 @@ static void aic_handle_ipi(struct pt_regs *regs) > /* > * Ack the IPI. We need to order this after the AIC event read, but > * that is enforced by normal MMIO ordering guarantees. > + * > + * For the Fast IPI case, this needs to be ordered before the vIPI > + * handling below, so we need to isb(); > */ > - aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_OTHER); > + if (aic_irqc->info.fast_ipi) { > + write_sysreg_s(IPI_SR_PENDING, SYS_IMP_APL_IPI_SR_EL1); > + isb(); > + } else { > + aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_OTHER); > + } > > /* > * The mask read does not need to be ordered. Only we can change > @@ -680,7 +748,8 @@ static void aic_handle_ipi(struct pt_regs *regs) > * No ordering needed here; at worst this just changes the timing of > * when the next IPI will be delivered. > */ > - aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > + if (!aic_irqc->info.fast_ipi) > + aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > } > > static int aic_ipi_alloc(struct irq_domain *d, unsigned int virq, > @@ -779,8 +848,12 @@ static int aic_init_cpu(unsigned int cpu) > * by AIC during processing). We manage masks at the vIPI level. 
> */ > aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_SELF | AIC_IPI_OTHER); > - aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF); > - aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > + if (!aic_irqc->info.fast_ipi) { > + aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF); > + aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > + } else { > + aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF | AIC_IPI_OTHER); > + } > > /* Initialize the local mask state */ > __this_cpu_write(aic_fiq_unmasked, 0); > @@ -800,6 +873,7 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > void __iomem *regs; > u32 info; > struct aic_irq_chip *irqc; > + const struct of_device_id *match; > > regs = of_iomap(node, 0); > if (WARN_ON(!regs)) > @@ -809,9 +883,16 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > if (!irqc) > return -ENOMEM; > > - aic_irqc = irqc; > irqc->base = regs; > > + match = of_match_node(aic_info_match, node); > + if (!match) > + return -ENODEV; > + > + irqc->info = *(struct aic_info *)match->data; Why the copy? All the data is const, and isn't going away. > + > + aic_irqc = irqc; > + > info = aic_ic_read(irqc, AIC_INFO); > irqc->nr_hw = FIELD_GET(AIC_INFO_NR_HW, info); > > @@ -846,6 +927,9 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > if (!is_kernel_in_hyp_mode()) > pr_info("Kernel running in EL1, mapping interrupts"); > > + if (irqc->info.fast_ipi) > + pr_info("Using Fast IPIs"); > + > cpuhp_setup_state(CPUHP_AP_IRQ_APPLE_AIC_STARTING, > "irqchip/apple-aic/ipi:starting", > aic_init_cpu, NULL); Thanks, M.
On Thu, 09 Dec 2021 04:32:46 +0000, Hector Martin <marcan@marcan.st> wrote: > > This allows us to directly use the hardware event number as the hwirq > number. Since IRQ events have bit 16 set (type=1), FIQs now move to > starting at hwirq number 0. > > This will become more important once multi-die support is introduced in > a later commit. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/irqchip/irq-apple-aic.c | 67 ++++++++++++++++++--------------- > 1 file changed, 36 insertions(+), 31 deletions(-) > > diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c > index 1aa63580cae4..572d1af175fc 100644 > --- a/drivers/irqchip/irq-apple-aic.c > +++ b/drivers/irqchip/irq-apple-aic.c > @@ -66,7 +66,7 @@ > */ > > #define AIC_INFO 0x0004 > -#define AIC_INFO_NR_HW GENMASK(15, 0) > +#define AIC_INFO_NR_IRQ GENMASK(15, 0) > > #define AIC_CONFIG 0x0010 > > @@ -75,6 +75,7 @@ > #define AIC_EVENT_TYPE GENMASK(31, 16) > #define AIC_EVENT_NUM GENMASK(15, 0) > > +#define AIC_EVENT_TYPE_FIQ 0 /* Software use */ What does 'SW use' mean? Are you using the fact that the event register never returns 0 in the top bits? 
> #define AIC_EVENT_TYPE_HW 1 > #define AIC_EVENT_TYPE_IPI 4 > #define AIC_EVENT_IPI_OTHER 1 > @@ -158,6 +159,8 @@ > #define MPIDR_CPU GENMASK(7, 0) > #define MPIDR_CLUSTER GENMASK(15, 8) > > +#define AIC_FIQ_HWIRQ(x) (FIELD_PREP(AIC_EVENT_TYPE, AIC_EVENT_TYPE_FIQ) | \ > + FIELD_PREP(AIC_EVENT_NUM, x)) > #define AIC_NR_FIQ 4 > #define AIC_NR_SWIPI 32 > > @@ -209,7 +212,7 @@ struct aic_irq_chip { > void __iomem *base; > struct irq_domain *hw_domain; > struct irq_domain *ipi_domain; > - int nr_hw; > + int nr_irq; > > struct aic_info info; > }; > @@ -239,18 +242,22 @@ static void aic_ic_write(struct aic_irq_chip *ic, u32 reg, u32 val) > > static void aic_irq_mask(struct irq_data *d) > { > + irq_hw_number_t hwirq = irqd_to_hwirq(d); > struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > > - aic_ic_write(ic, AIC_MASK_SET + MASK_REG(irqd_to_hwirq(d)), > - MASK_BIT(irqd_to_hwirq(d))); > + u32 irq = FIELD_GET(AIC_EVENT_NUM, hwirq); This expression is used quite a few times, and could use a helper clarifying its purpose (converting the event/hwirq to an index?). 'irq' is a bit of a misnomer too, but I struggle to find another name... 
> + > + aic_ic_write(ic, AIC_MASK_SET + MASK_REG(irq), MASK_BIT(irq)); > } > > static void aic_irq_unmask(struct irq_data *d) > { > + irq_hw_number_t hwirq = irqd_to_hwirq(d); > struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > > - aic_ic_write(ic, AIC_MASK_CLR + MASK_REG(d->hwirq), > - MASK_BIT(irqd_to_hwirq(d))); > + u32 irq = FIELD_GET(AIC_EVENT_NUM, hwirq); > + > + aic_ic_write(ic, AIC_MASK_CLR + MASK_REG(irq), MASK_BIT(irq)); > } > > static void aic_irq_eoi(struct irq_data *d) > @@ -278,7 +285,7 @@ static void __exception_irq_entry aic_handle_irq(struct pt_regs *regs) > irq = FIELD_GET(AIC_EVENT_NUM, event); > > if (type == AIC_EVENT_TYPE_HW) > - generic_handle_domain_irq(aic_irqc->hw_domain, irq); > + generic_handle_domain_irq(aic_irqc->hw_domain, event); > else if (type == AIC_EVENT_TYPE_IPI && irq == 1) > aic_handle_ipi(regs); > else if (event != 0) > @@ -310,7 +317,7 @@ static int aic_irq_set_affinity(struct irq_data *d, > else > cpu = cpumask_any_and(mask_val, cpu_online_mask); > > - aic_ic_write(ic, AIC_TARGET_CPU + hwirq * 4, BIT(cpu)); > + aic_ic_write(ic, AIC_TARGET_CPU + FIELD_GET(AIC_EVENT_NUM, hwirq) * 4, BIT(cpu)); > irq_data_update_effective_affinity(d, cpumask_of(cpu)); > > return IRQ_SET_MASK_OK; > @@ -340,9 +347,7 @@ static struct irq_chip aic_chip = { > > static unsigned long aic_fiq_get_idx(struct irq_data *d) > { > - struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > - > - return irqd_to_hwirq(d) - ic->nr_hw; > + return FIELD_GET(AIC_EVENT_NUM, irqd_to_hwirq(d)); > } > > static void aic_fiq_set_mask(struct irq_data *d) > @@ -430,11 +435,11 @@ static void __exception_irq_entry aic_handle_fiq(struct pt_regs *regs) > > if (TIMER_FIRING(read_sysreg(cntp_ctl_el0))) > generic_handle_domain_irq(aic_irqc->hw_domain, > - aic_irqc->nr_hw + AIC_TMR_EL0_PHYS); > + AIC_FIQ_HWIRQ(AIC_TMR_EL0_PHYS)); > > if (TIMER_FIRING(read_sysreg(cntv_ctl_el0))) > generic_handle_domain_irq(aic_irqc->hw_domain, > - aic_irqc->nr_hw + AIC_TMR_EL0_VIRT); > 
+ AIC_FIQ_HWIRQ(AIC_TMR_EL0_VIRT)); > > if (is_kernel_in_hyp_mode()) { > uint64_t enabled = read_sysreg_s(SYS_IMP_APL_VM_TMR_FIQ_ENA_EL2); > @@ -442,12 +447,12 @@ static void __exception_irq_entry aic_handle_fiq(struct pt_regs *regs) > if ((enabled & VM_TMR_FIQ_ENABLE_P) && > TIMER_FIRING(read_sysreg_s(SYS_CNTP_CTL_EL02))) > generic_handle_domain_irq(aic_irqc->hw_domain, > - aic_irqc->nr_hw + AIC_TMR_EL02_PHYS); > + AIC_FIQ_HWIRQ(AIC_TMR_EL02_PHYS)); > > if ((enabled & VM_TMR_FIQ_ENABLE_V) && > TIMER_FIRING(read_sysreg_s(SYS_CNTV_CTL_EL02))) > generic_handle_domain_irq(aic_irqc->hw_domain, > - aic_irqc->nr_hw + AIC_TMR_EL02_VIRT); > + AIC_FIQ_HWIRQ(AIC_TMR_EL02_VIRT)); > } > > if ((read_sysreg_s(SYS_IMP_APL_PMCR0_EL1) & (PMCR0_IMODE | PMCR0_IACT)) == > @@ -492,13 +497,13 @@ static struct irq_chip fiq_chip = { > static int aic_irq_domain_map(struct irq_domain *id, unsigned int irq, > irq_hw_number_t hw) > { > - struct aic_irq_chip *ic = id->host_data; > + u32 type = FIELD_GET(AIC_EVENT_TYPE, hw); > > - if (hw < ic->nr_hw) { > + if (type == AIC_EVENT_TYPE_HW) { > irq_domain_set_info(id, irq, hw, &aic_chip, id->host_data, > handle_fasteoi_irq, NULL, NULL); > irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(irq))); > - } else { > + } else if (type == AIC_EVENT_TYPE_FIQ) { Do we need to check for FIQ? This should be the case by construction, right? If there is a risk that it isn't the case, then we probably need a default case (and the whole thing would be better written as a switch() statement). 
> irq_set_percpu_devid(irq); > irq_domain_set_info(id, irq, hw, &fiq_chip, id->host_data, > handle_percpu_devid_irq, NULL, NULL); > @@ -519,14 +524,15 @@ static int aic_irq_domain_translate(struct irq_domain *id, > > switch (fwspec->param[0]) { > case AIC_IRQ: > - if (fwspec->param[1] >= ic->nr_hw) > + if (fwspec->param[1] >= ic->nr_irq) > return -EINVAL; > - *hwirq = fwspec->param[1]; > + *hwirq = (FIELD_PREP(AIC_EVENT_TYPE, AIC_EVENT_TYPE_HW) | > + FIELD_PREP(AIC_EVENT_NUM, fwspec->param[1])); > break; > case AIC_FIQ: > if (fwspec->param[1] >= AIC_NR_FIQ) > return -EINVAL; > - *hwirq = ic->nr_hw + fwspec->param[1]; > + *hwirq = AIC_FIQ_HWIRQ(fwspec->param[1]); > > /* > * In EL1 the non-redirected registers are the guest's, > @@ -535,10 +541,10 @@ static int aic_irq_domain_translate(struct irq_domain *id, > if (!is_kernel_in_hyp_mode()) { > switch (fwspec->param[1]) { > case AIC_TMR_GUEST_PHYS: > - *hwirq = ic->nr_hw + AIC_TMR_EL0_PHYS; > + *hwirq = AIC_FIQ_HWIRQ(AIC_TMR_EL0_PHYS); > break; > case AIC_TMR_GUEST_VIRT: > - *hwirq = ic->nr_hw + AIC_TMR_EL0_VIRT; > + *hwirq = AIC_FIQ_HWIRQ(AIC_TMR_EL0_VIRT); > break; > case AIC_TMR_HV_PHYS: > case AIC_TMR_HV_VIRT: > @@ -894,11 +900,10 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > aic_irqc = irqc; > > info = aic_ic_read(irqc, AIC_INFO); > - irqc->nr_hw = FIELD_GET(AIC_INFO_NR_HW, info); > + irqc->nr_irq = FIELD_GET(AIC_INFO_NR_IRQ, info); > > - irqc->hw_domain = irq_domain_create_linear(of_node_to_fwnode(node), > - irqc->nr_hw + AIC_NR_FIQ, > - &aic_irq_domain_ops, irqc); > + irqc->hw_domain = irq_domain_create_tree(of_node_to_fwnode(node), > + &aic_irq_domain_ops, irqc); > if (WARN_ON(!irqc->hw_domain)) { > iounmap(irqc->base); > kfree(irqc); > @@ -917,11 +922,11 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > set_handle_irq(aic_handle_irq); > set_handle_fiq(aic_handle_fiq); > > - for (i = 0; i < BITS_TO_U32(irqc->nr_hw); i++) > + for (i = 
0; i < BITS_TO_U32(irqc->nr_irq); i++) > aic_ic_write(irqc, AIC_MASK_SET + i * 4, U32_MAX); > - for (i = 0; i < BITS_TO_U32(irqc->nr_hw); i++) > + for (i = 0; i < BITS_TO_U32(irqc->nr_irq); i++) > aic_ic_write(irqc, AIC_SW_CLR + i * 4, U32_MAX); > - for (i = 0; i < irqc->nr_hw; i++) > + for (i = 0; i < irqc->nr_irq; i++) > aic_ic_write(irqc, AIC_TARGET_CPU + i * 4, 1); > > if (!is_kernel_in_hyp_mode()) > @@ -937,7 +942,7 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > vgic_set_kvm_info(&vgic_info); > > pr_info("Initialized with %d IRQs, %d FIQs, %d vIPIs\n", > - irqc->nr_hw, AIC_NR_FIQ, AIC_NR_SWIPI); > + irqc->nr_irq, AIC_NR_FIQ, AIC_NR_SWIPI); > > return 0; > } Thanks, M.
On Thu, 09 Dec 2021 04:32:47 +0000, Hector Martin <marcan@marcan.st> wrote: > > This allows us to support AIC variants with different numbers of IRQs > based on capability registers. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/irqchip/irq-apple-aic.c | 73 +++++++++++++++++++++++++-------- > 1 file changed, 56 insertions(+), 17 deletions(-) > > diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c > index 572d1af175fc..d03caed51d56 100644 > --- a/drivers/irqchip/irq-apple-aic.c > +++ b/drivers/irqchip/irq-apple-aic.c > @@ -312,12 +326,15 @@ static int aic_irq_set_affinity(struct irq_data *d, > struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > int cpu; > > + if (!ic->info.target_cpu) > + return -EINVAL; Can this even happen? And if it did, this should scream loudly, shouldn't it? M.
On Thu, 09 Dec 2021 04:32:49 +0000, Hector Martin <marcan@marcan.st> wrote: > > Introduce support for the new AICv2 hardware block in t6000/t6001 SoCs. > > It seems these blocks are missing the information required to compute > the event register offset in the capability registers, so we specify > that in the DT. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/irqchip/irq-apple-aic.c | 146 ++++++++++++++++++++++++++++---- > 1 file changed, 128 insertions(+), 18 deletions(-) > > diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c > index 46b7750548a0..226d5232dd14 100644 > --- a/drivers/irqchip/irq-apple-aic.c > +++ b/drivers/irqchip/irq-apple-aic.c > @@ -101,6 +101,57 @@ > > #define AIC_MAX_IRQ 0x400 > > +/* > + * AIC v2 registers (MMIO) > + */ > + > +#define AIC2_VERSION 0x0000 > +#define AIC2_VERSION_VER GENMASK(7, 0) > + > +#define AIC2_INFO1 0x0004 > +#define AIC2_INFO1_NR_IRQ GENMASK(15, 0) > +#define AIC2_INFO1_LAST_DIE GENMASK(27, 24) > + > +#define AIC2_INFO2 0x0008 > + > +#define AIC2_INFO3 0x000c > +#define AIC2_INFO3_MAX_IRQ GENMASK(15, 0) > +#define AIC2_INFO3_MAX_DIE GENMASK(27, 24) > + > +#define AIC2_RESET 0x0010 > +#define AIC2_RESET_RESET BIT(0) > + > +#define AIC2_CONFIG 0x0014 > +#define AIC2_CONFIG_ENABLE BIT(0) > +#define AIC2_CONFIG_PREFER_PCPU BIT(28) > + > +#define AIC2_TIMEOUT 0x0028 > +#define AIC2_CLUSTER_PRIO 0x0030 > +#define AIC2_DELAY_GROUPS 0x0100 > + > +#define AIC2_IRQ_CFG 0x2000 > + > +/* > + * AIC2 registers are laid out like this, starting at AIC2_IRQ_CFG: > + * > + * Repeat for each die: > + * IRQ_CFG: u32 * MAX_IRQS > + * SW_SET: u32 * (MAX_IRQS / 32) > + * SW_CLR: u32 * (MAX_IRQS / 32) > + * MASK_SET: u32 * (MAX_IRQS / 32) > + * MASK_CLR: u32 * (MAX_IRQS / 32) > + * HW_STATE: u32 * (MAX_IRQS / 32) > + * > + * This is followed by a set of event registers, each 16K page aligned. > + * The first one is the AP event register we will use. 
Unfortunately, > + * the actual implemented die count is not specified anywhere in the > + * capability registers, so we have to explcitly specify the event explicitly > + * register offset in the device tree to remain forward-compatible. Do the current machines actually have more than a single die? > + */ > + > +#define AIC2_IRQ_CFG_TARGET GENMASK(3, 0) > +#define AIC2_IRQ_CFG_DELAY_IDX GENMASK(7, 5) > + > #define MASK_REG(x) (4 * ((x) >> 5)) > #define MASK_BIT(x) BIT((x) & GENMASK(4, 0)) > > @@ -187,6 +238,7 @@ struct aic_info { > /* Register offsets */ > u32 event; > u32 target_cpu; > + u32 irq_cfg; > u32 sw_set; > u32 sw_clr; > u32 mask_set; > @@ -214,6 +266,14 @@ static const struct aic_info aic1_fipi_info = { > .fast_ipi = true, > }; > > +static const struct aic_info aic2_info = { > + .version = 2, > + > + .irq_cfg = AIC2_IRQ_CFG, > + > + .fast_ipi = true, > +}; > + > static const struct of_device_id aic_info_match[] = { > { > .compatible = "apple,t8103-aic", > @@ -223,6 +283,10 @@ static const struct of_device_id aic_info_match[] = { > .compatible = "apple,aic", > .data = &aic1_info, > }, > + { > + .compatible = "apple,aic2", > + .data = &aic2_info, > + }, > {} > }; > > @@ -368,6 +432,14 @@ static struct irq_chip aic_chip = { > .irq_set_type = aic_irq_set_type, > }; > > +static struct irq_chip aic2_chip = { > + .name = "AIC2", > + .irq_mask = aic_irq_mask, > + .irq_unmask = aic_irq_unmask, > + .irq_eoi = aic_irq_eoi, > + .irq_set_type = aic_irq_set_type, > +}; How is the affinity managed if you don't have a callback? A number of things are bound to break if you don't have one. And a description of how an interrupt gets routed wouldn't go amiss! 
> + > /* > * FIQ irqchip > */ > @@ -524,10 +596,15 @@ static struct irq_chip fiq_chip = { > static int aic_irq_domain_map(struct irq_domain *id, unsigned int irq, > irq_hw_number_t hw) > { > + struct aic_irq_chip *ic = id->host_data; > u32 type = FIELD_GET(AIC_EVENT_TYPE, hw); > + struct irq_chip *chip = &aic_chip; > + > + if (ic->info.version == 2) > + chip = &aic2_chip; > > if (type == AIC_EVENT_TYPE_HW) { > - irq_domain_set_info(id, irq, hw, &aic_chip, id->host_data, > + irq_domain_set_info(id, irq, hw, chip, id->host_data, > handle_fasteoi_irq, NULL, NULL); > irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(irq))); > } else if (type == AIC_EVENT_TYPE_FIQ) { > @@ -882,23 +959,25 @@ static int aic_init_cpu(unsigned int cpu) > /* Commit all of the above */ > isb(); > > - /* > - * Make sure the kernel's idea of logical CPU order is the same as AIC's > - * If we ever end up with a mismatch here, we will have to introduce > - * a mapping table similar to what other irqchip drivers do. > - */ > - WARN_ON(aic_ic_read(aic_irqc, AIC_WHOAMI) != smp_processor_id()); > + if (aic_irqc->info.version == 1) { > + /* > + * Make sure the kernel's idea of logical CPU order is the same as AIC's > + * If we ever end up with a mismatch here, we will have to introduce > + * a mapping table similar to what other irqchip drivers do. > + */ > + WARN_ON(aic_ic_read(aic_irqc, AIC_WHOAMI) != smp_processor_id()); > > - /* > - * Always keep IPIs unmasked at the hardware level (except auto-masking > - * by AIC during processing). We manage masks at the vIPI level. 
> - */ > - aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_SELF | AIC_IPI_OTHER); > - if (!aic_irqc->info.fast_ipi) { > - aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF); > - aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > - } else { > - aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF | AIC_IPI_OTHER); > + /* > + * Always keep IPIs unmasked at the hardware level (except auto-masking > + * by AIC during processing). We manage masks at the vIPI level. > + */ > + aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_SELF | AIC_IPI_OTHER); > + if (!aic_irqc->info.fast_ipi) { > + aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF); > + aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > + } else { > + aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF | AIC_IPI_OTHER); > + } Why is this specific to v1 and not affecting v2? I'm sure there is a good reason, but documenting these differences would certainly help reviewing (which version implement which registers, for example). Thanks, M.
On 12/12/2021 21.21, Marc Zyngier wrote: >> +/* MPIDR fields */ >> +#define MPIDR_CPU GENMASK(7, 0) >> +#define MPIDR_CLUSTER GENMASK(15, 8) > > This should be defined in terms of MPIDR_AFFINITY_LEVEL() and co. Yeah, I found out about that macro from your PMU driver... :) >> +static const struct aic_info aic1_fipi_info = { >> + .version = 1, >> + >> + .fast_ipi = true, > > Do you anticipate multiple feature flags like this? If so, maybe we > should consider biting the bullet and making this an unsigned long > populated with discrete flags. > > Not something we need to decide now though. Probably not, but who knows! It's easy to change it later, though. >> if (read_sysreg_s(SYS_IMP_APL_IPI_SR_EL1) & IPI_SR_PENDING) { >> - pr_err_ratelimited("Fast IPI fired. Acking.\n"); >> - write_sysreg_s(IPI_SR_PENDING, SYS_IMP_APL_IPI_SR_EL1); >> + if (aic_irqc->info.fast_ipi) { > > On the other hand, this is likely to hit on the fast path. Given that > we know at probe time whether we support SR-based IPIs, we can turn > this into a static key and save a few fetches on every IPI. It applies > everywhere you look at this flag at runtime. Good point, I'll see about refactoring this to use static keys. >> +static void aic_ipi_send_fast(int cpu) >> +{ >> + u64 mpidr = cpu_logical_map(cpu); >> + u64 my_mpidr = cpu_logical_map(smp_processor_id()); > > This is the equivalent of reading MPIDR_EL1. My gut feeling is that it > is a bit faster to access the sysreg than a percpu lookup, a function > call and another memory access. Yeah, I saw other IRQ drivers doing this, but I wasn't sure it made sense over just reading MPIDR_EL1... I'll switch to that. 
>> + u64 idx = FIELD_GET(MPIDR_CPU, mpidr); >> + >> + if (FIELD_GET(MPIDR_CLUSTER, my_mpidr) == cluster) >> + write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx), >> + SYS_IMP_APL_IPI_RR_LOCAL_EL1); >> + else >> + write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx) | FIELD_PREP(IPI_RR_CLUSTER, cluster), >> + SYS_IMP_APL_IPI_RR_GLOBAL_EL1); > > Don't you need an ISB, either here or in the two callers? At the > moment, I don't see what will force the execution of these writes, and > they could be arbitrarily delayed. Is there any requirement for timeliness sending IPIs? They're going to another CPU after all, they could be arbitrarily delayed because it has FIQs masked. >> - if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit) >> - aic_ic_write(ic, AIC_IPI_SEND, AIC_IPI_SEND_CPU(smp_processor_id())); >> + if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit) { >> + if (ic->info.fast_ipi) >> + aic_ipi_send_fast(smp_processor_id()); > > nit: if this is common enough, maybe having an aic_ipi_send_self_fast > could be better. Needs evaluation though. I'll do some printing to see how common self-IPIs are when running common workloads, let's see. If it's common enough it's easy enough to add. >> + irqc->info = *(struct aic_info *)match->data; > > Why the copy? All the data is const, and isn't going away. ... for now, but later patches then start computing register offsets and putting them into this structure :)
On 12/12/2021 23.37, Marc Zyngier wrote: >> @@ -75,6 +75,7 @@ >> #define AIC_EVENT_TYPE GENMASK(31, 16) >> #define AIC_EVENT_NUM GENMASK(15, 0) >> >> +#define AIC_EVENT_TYPE_FIQ 0 /* Software use */ > > What does 'SW use' mean? Are you using the fact that the event > register never returns 0 in the top bits? Yeah. Since we're switching to tree IRQs we can use the raw hardware event as the hwirq, and save some cycles munging fields, but we still need a place for FIQ hwirq numbers to live. Zero here means "no IRQ/spurious" to the hardware, so it's a convenient place to stick FIQs. Note that the top-level IRQ handler function does check this field, so even if newer hardware starts doing something silly here (which I highly doubt...) it would never end up forwarded to the IRQ domain code without further code changes. >> - aic_ic_write(ic, AIC_MASK_SET + MASK_REG(irqd_to_hwirq(d)), >> - MASK_BIT(irqd_to_hwirq(d))); >> + u32 irq = FIELD_GET(AIC_EVENT_NUM, hwirq); > > This expression is used quite a few times, and could use a helper > clarifying its purpose (converting the event/hwirq to an index?). > 'irq' is a bit of a misnomer too, but I struggle to find another > name... Ack, I'll add a helper. It's extracting the IRQ number field from the hwirq number. This is relatively trivial at this point in the patch set (where the only other field is the constant type = 1), but it makes more sense once I add the die field later on. 
>> @@ -492,13 +497,13 @@ static struct irq_chip fiq_chip = { >> static int aic_irq_domain_map(struct irq_domain *id, unsigned int irq, >> irq_hw_number_t hw) >> { >> - struct aic_irq_chip *ic = id->host_data; >> + u32 type = FIELD_GET(AIC_EVENT_TYPE, hw); >> >> - if (hw < ic->nr_hw) { >> + if (type == AIC_EVENT_TYPE_HW) { >> irq_domain_set_info(id, irq, hw, &aic_chip, id->host_data, >> handle_fasteoi_irq, NULL, NULL); >> irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(irq))); >> - } else { >> + } else if (type == AIC_EVENT_TYPE_FIQ) { > > Do we need to check for FIQ? This should be the case by construction, > right? If there is a risk that it isn't the case, then we probably > need a default case (and the whole thing would be better written as a > switch() statement). Yes, it should be the case by construction; this can just be an else. I'll change it.
On 13/12/2021 03.26, Marc Zyngier wrote: > On Thu, 09 Dec 2021 04:32:47 +0000, > Hector Martin <marcan@marcan.st> wrote: >> >> This allows us to support AIC variants with different numbers of IRQs >> based on capability registers. >> >> Signed-off-by: Hector Martin <marcan@marcan.st> >> --- >> drivers/irqchip/irq-apple-aic.c | 73 +++++++++++++++++++++++++-------- >> 1 file changed, 56 insertions(+), 17 deletions(-) >> >> diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c >> index 572d1af175fc..d03caed51d56 100644 >> --- a/drivers/irqchip/irq-apple-aic.c >> +++ b/drivers/irqchip/irq-apple-aic.c >> @@ -312,12 +326,15 @@ static int aic_irq_set_affinity(struct irq_data *d, >> struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); >> int cpu; >> >> + if (!ic->info.target_cpu) >> + return -EINVAL; > > Can this even happen? And if it did, this should scream loudly, > shouldn't it? Yeah, it can't happen, so this really should be a BUG_ON. This is mostly there in case somehow we end up with confusion between AIC versions and register offsets later, since AIC2 does not use this field but also shouldn't be setting up an irqchip that calls this function.
On 14/12/2021 01.10, Marc Zyngier wrote: >> switch (fwspec->param[0]) { >> case AIC_IRQ: >> - if (fwspec->param[1] >= ic->nr_irq) >> + if (die >= ic->nr_die) >> + return -EINVAL; >> + if (args[0] >= ic->nr_irq) >> return -EINVAL; >> - *hwirq = (FIELD_PREP(AIC_EVENT_TYPE, AIC_EVENT_TYPE_HW) | >> - FIELD_PREP(AIC_EVENT_NUM, fwspec->param[1])); >> + *hwirq = AIC_IRQ_HWIRQ(die, args[0]); >> break; > > A side issue with this is that it breaks MSIs, due to the way we > construct the intspec (I did hit that when upgrading the M1 intspec to > 4 cells for the PMU). I have the following hack locally: > > diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c > index b090924b41fe..f7b4a67b13cf 100644 > --- a/drivers/pci/controller/pcie-apple.c > +++ b/drivers/pci/controller/pcie-apple.c > @@ -218,7 +218,7 @@ static int apple_msi_domain_alloc(struct irq_domain *domain, unsigned int virq, > if (hwirq < 0) > return -ENOSPC; > > - fwspec.param[1] += hwirq; > + fwspec.param[1 + (fwspec.param_count == 4)] += hwirq; > > ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, &fwspec); > if (ret) > Heh, I never noticed; probably because I guess the SD card reader on the machine I've been testing this on doesn't use MSIs, and I haven't tried WiFi yet. Perhaps (fwspec.param_count - 2)? It's probably a safer long-term assumption that the last two cells will always be leaf IRQ number and type.
On 13/12/2021 03.47, Marc Zyngier wrote:
>> + * This is followed by a set of event registers, each 16K page aligned.
>> + * The first one is the AP event register we will use. Unfortunately,
>> + * the actual implemented die count is not specified anywhere in the
>> + * capability registers, so we have to explcitly specify the event
>
> explicitly

Thanks, fixed!

>
>> + * register offset in the device tree to remain forward-compatible.
>
> Do the current machines actually have more than a single die?

Not the current ones, but there are loud rumors everywhere of multi-die
products... might as well try to support them ahead of time. The current
machines *do* have two register sets, implying support for 2-die
configurations, and although no IRQs are ever asserted from hardware,
SW_GEN mode works and you can trigger die-ID 1 events.

The interpretation of the capability registers comes from what the macOS
driver does (that's the only part I looked at it for, since it's kind of
hard to divine with only a single data point from the hardware). Their
driver is definitely designed for multi die machines already. The
register layout I worked out by probing the hardware; it was blatantly
obvious that there was a second set of IRQ mask arrays after the first,
that macOS didn't use (yet)...

>> +static struct irq_chip aic2_chip = {
>> +	.name = "AIC2",
>> +	.irq_mask = aic_irq_mask,
>> +	.irq_unmask = aic_irq_unmask,
>> +	.irq_eoi = aic_irq_eoi,
>> +	.irq_set_type = aic_irq_set_type,
>> +};
>
> How is the affinity managed if you don't have a callback? A number of
> things are bound to break if you don't have one. And a description of
> how an interrupt gets routed wouldn't go amiss!

It isn't... we don't know all the details yet, but it seems to be Some
Kind Of Magic™.

There definitely is no way of individually mapping IRQs to specific
CPUs; there just aren't enough implemented register bits to allow that.

What we do have is a per-IRQ config consisting of:

- Target CPU, 4 bits. This seems to be for pointing IRQs at coprocessors
(there's actually an init dance to map a few IRQs to specific
coprocessors; m1n1 takes care of that right now*). Only 0 sends IRQs to
the AP here, so this is not useful to us.

- IRQ config group, 3 bits. This selects one of 8 IRQ config registers.
These do indirectly control how the IRQ is delivered; at least they have
some kind of delay value (coalescing?) and I suspect may do some kind of
priority control, though the details of that aren't clear yet. I don't
recall seeing macOS do anything interesting with these groups, I think
it always uses group 0.

Then each CPU has an IMP-DEF sysreg that allows it to opt-in or opt-out
of receiving IRQs (!). It actually has two bits, so there may be some
kind of priority/subset control here too. By default all other CPUs are
opted out, which isn't great... so m1n1 initializes it to opt in all
CPUs to IRQ delivery.

The actual delivery flow here seems to be something like AIC/something
picks a CPU (using some kind of heuristic/CPU state? I noticed WFI seems
to have an effect here) for initial delivery, and if the IRQ isn't acked
in a timely manner, it punts and broadcasts the IRQ to all CPUs. The IRQ
ack register is shared by all CPUs; I don't know if there is some kind
of per-CPU difference in what it can return, but I haven't observed that
yet, so I guess whatever CPU gets the IRQ gets to handle anything that
is pending.

There are also some extra features; e.g. there is definitely a set of
registers for measuring IRQ latency (tells you how long it took from IRQ
assertion to the CPU acking it). There's also some kind of global
control over which CPU *cluster* is tried first for delivery (defaults
to e-cluster, but you can change it to either p-cluster). We don't use
those right now.

So there is definitely room for further research here, but the current
state of affairs is the driver doesn't do affinity at all, and IRQs are
handled by "some" CPU. In practice, I see a decent (but not completely
uniform) spread of which CPU handles any given IRQ. I assume it's
something like it prefers a busy CPU, to avoid waking up a core just to
handle an IRQ.

* I don't know how masks are supposed to be managed for those IRQs used
by copros; I guess we'll find out when we get there and notice something
is broken if we don't unmask them... but I guess given the IRQ handling
flow here, that copro should be doing the masking/unmasking itself.

>> +	aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_SELF | AIC_IPI_OTHER);
>> +	if (!aic_irqc->info.fast_ipi) {
>> +		aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF);
>> +		aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER);
>> +	} else {
>> +		aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF | AIC_IPI_OTHER);
>> +	}
>
> Why is this specific to v1 and not affecting v2? I'm sure there is a
> good reason, but documenting these differences would certainly help
> reviewing (which version implement which registers, for example).

Only v1 has legacy IPIs (which is why we had to implement Fast IPIs for
this one). AIC_IPI_* registers are for AICv1 specifically; other than
the event register fields which are the same (but not the register
offset itself), and the general concept of the mask/sw_gen/hw_status
register arrays, there aren't really any shared registers between AICv1
and AICv2.

I'll add a comment to clarify this.
On Sat, 18 Dec 2021 05:31:28 +0000,
Hector Martin <marcan@marcan.st> wrote:
>
> >> +		u64 idx = FIELD_GET(MPIDR_CPU, mpidr);
> >> +
> >> +		if (FIELD_GET(MPIDR_CLUSTER, my_mpidr) == cluster)
> >> +			write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx),
> >> +				       SYS_IMP_APL_IPI_RR_LOCAL_EL1);
> >> +		else
> >> +			write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx) | FIELD_PREP(IPI_RR_CLUSTER, cluster),
> >> +				       SYS_IMP_APL_IPI_RR_GLOBAL_EL1);
> >
> > Don't you need an ISB, either here or in the two callers? At the
> > moment, I don't see what will force the execution of these writes, and
> > they could be arbitrarily delayed.
>
> Is there any requirement for timeliness sending IPIs? They're going to
> another CPU after all, they could be arbitrarily delayed because it has
> FIQs masked.

They absolutely could, but this has a potential impact on the scheduling
if you delay it (the vast majority of these IPIs are to indicate to the
remote CPU that it needs to go and schedule something else). So there is
an incentive for making it happen ASAP.

Thanks,

	M.
On Sat, 18 Dec 2021 05:39:04 +0000,
Hector Martin <marcan@marcan.st> wrote:
>
> On 14/12/2021 01.10, Marc Zyngier wrote:
> >>  	switch (fwspec->param[0]) {
> >>  	case AIC_IRQ:
> >> -		if (fwspec->param[1] >= ic->nr_irq)
> >> +		if (die >= ic->nr_die)
> >> +			return -EINVAL;
> >> +		if (args[0] >= ic->nr_irq)
> >>  			return -EINVAL;
> >> -		*hwirq = (FIELD_PREP(AIC_EVENT_TYPE, AIC_EVENT_TYPE_HW) |
> >> -			  FIELD_PREP(AIC_EVENT_NUM, fwspec->param[1]));
> >> +		*hwirq = AIC_IRQ_HWIRQ(die, args[0]);
> >>  		break;
> >
> > A side issue with this is that it breaks MSIs, due to the way we
> > construct the intspec (I did hit that when upgrading the M1 intspec to
> > 4 cells for the PMU). I have the following hack locally:
> >
> > diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c
> > index b090924b41fe..f7b4a67b13cf 100644
> > --- a/drivers/pci/controller/pcie-apple.c
> > +++ b/drivers/pci/controller/pcie-apple.c
> > @@ -218,7 +218,7 @@ static int apple_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> >  	if (hwirq < 0)
> >  		return -ENOSPC;
> >
> > -	fwspec.param[1] += hwirq;
> > +	fwspec.param[1 + (fwspec.param_count == 4)] += hwirq;
> >
> >  	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, &fwspec);
> >  	if (ret)
> >
>
> Heh, I never noticed; probably because I guess the SD card reader on the
> machine I've been testing this on doesn't use MSIs, and I haven't tried
> WiFi yet.
>
> Perhaps (fwspec.param_count - 2)? It's probably a safer long-term
> assumption that the last two cells will always be leaf IRQ number and
> type.

Yup, that'd work as well, as long as we make this assumption explicit.

	M.
On Sat, 18 Dec 2021 06:02:24 +0000,
Hector Martin <marcan@marcan.st> wrote:
>
> On 13/12/2021 03.47, Marc Zyngier wrote:
> >> + * This is followed by a set of event registers, each 16K page aligned.
> >> + * The first one is the AP event register we will use. Unfortunately,
> >> + * the actual implemented die count is not specified anywhere in the
> >> + * capability registers, so we have to explcitly specify the event
> >
> > explicitly
>
> Thanks, fixed!
>
> >
> >> + * register offset in the device tree to remain forward-compatible.
> >
> > Do the current machines actually have more than a single die?
>
> Not the current ones, but there are loud rumors everywhere of multi-die
> products... might as well try to support them ahead of time. The current
> machines *do* have two register sets, implying support for 2-die
> configurations, and although no IRQs are ever asserted from hardware,
> SW_GEN mode works and you can trigger die-ID 1 events.
>
> The interpretation of the capability registers comes from what the macOS
> driver does (that's the only part I looked at it for, since it's kind of
> hard to divine with only a single data point from the hardware). Their
> driver is definitely designed for multi die machines already. The
> register layout I worked out by probing the hardware; it was blatantly
> obvious that there was a second set of IRQ mask arrays after the first,
> that macOS didn't use (yet)...
>
> >> +static struct irq_chip aic2_chip = {
> >> +	.name = "AIC2",
> >> +	.irq_mask = aic_irq_mask,
> >> +	.irq_unmask = aic_irq_unmask,
> >> +	.irq_eoi = aic_irq_eoi,
> >> +	.irq_set_type = aic_irq_set_type,
> >> +};
> >
> > How is the affinity managed if you don't have a callback? A number of
> > things are bound to break if you don't have one. And a description of
> > how an interrupt gets routed wouldn't go amiss!
>
> It isn't... we don't know all the details yet, but it seems to be Some
> Kind Of Magic™.
>
> There definitely is no way of individually mapping IRQs to specific
> CPUs; there just aren't enough implemented register bits to allow that.
>
> What we do have is a per-IRQ config consisting of:
>
> - Target CPU, 4 bits. This seems to be for pointing IRQs at coprocessors
> (there's actually an init dance to map a few IRQs to specific
> coprocessors; m1n1 takes care of that right now*). Only 0 sends IRQs to
> the AP here, so this is not useful to us.
>
> - IRQ config group, 3 bits. This selects one of 8 IRQ config registers.
> These do indirectly control how the IRQ is delivered; at least they have
> some kind of delay value (coalescing?) and I suspect may do some kind of
> priority control, though the details of that aren't clear yet. I don't
> recall seeing macOS do anything interesting with these groups, I think
> it always uses group 0.
>
> Then each CPU has an IMP-DEF sysreg that allows it to opt-in or opt-out
> of receiving IRQs (!). It actually has two bits, so there may be some
> kind of priority/subset control here too. By default all other CPUs are
> opted out, which isn't great... so m1n1 initializes it to opt in all
> CPUs to IRQ delivery.
>
> The actual delivery flow here seems to be something like AIC/something
> picks a CPU (using some kind of heuristic/CPU state? I noticed WFI seems
> to have an effect here) for initial delivery, and if the IRQ isn't acked
> in a timely manner, it punts and broadcasts the IRQ to all CPUs. The IRQ
> ack register is shared by all CPUs; I don't know if there is some kind
> of per-CPU difference in what it can return, but I haven't observed that
> yet, so I guess whatever CPU gets the IRQ gets to handle anything that
> is pending.
>
> There are also some extra features; e.g. there is definitely a set of
> registers for measuring IRQ latency (tells you how long it took from IRQ
> assertion to the CPU acking it). There's also some kind of global
> control over which CPU *cluster* is tried first for delivery (defaults
> to e-cluster, but you can change it to either p-cluster). We don't use
> those right now.
>
> So there is definitely room for further research here, but the current
> state of affairs is the driver doesn't do affinity at all, and IRQs are
> handled by "some" CPU. In practice, I see a decent (but not completely
> uniform) spread of which CPU handles any given IRQ. I assume it's
> something like it prefers a busy CPU, to avoid waking up a core just to
> handle an IRQ.

The main issue with such magic is that a number of things will break in
a spectacular way for a bunch of drivers. We have a whole class of
(mostly PCI) devices that have per-queue interrupts, each one bound to
a CPU core. The drivers fully expect the interrupt for a given queue to
fire on a given CPU, and *only* this one as they would, for example,
use per-CPU data to get the context of the queue.

With an automatic spread of interrupts, this totally breaks. Probably
because the core will refuse to use managed interrupts due to the lack
of affinity setting callback. And even if you provide a dummy one, it
is the endpoint drivers that will explode.

The only way I can imagine to make this work is to force these
interrupts to be threaded so that the thread can run on a different CPU
than the one the interrupt has been taken on. Performance-wise, this is
likely to be a pig.

I guess we will have to find ways to live with this in the long run,
and maybe teach the core code of these weird behaviours.

Thanks,

	M.