Message ID: 20211209043249.65474-1-marcan@marcan.st
Series: irqchip/apple-aic: Add support for AICv2
On Thu, 09 Dec 2021 04:32:45 +0000, Hector Martin <marcan@marcan.st> wrote: > > The newer AICv2 present in t600x SoCs does not have legacy IPI support > at all. Since t8103 also supports Fast IPIs, implement support for this > first. The legacy IPI code is left as a fallback, so it can be > potentially used by older SoCs in the future. > > The vIPI code is shared; only the IPI firing/acking bits change for Fast > IPIs. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/irqchip/irq-apple-aic.c | 112 ++++++++++++++++++++++++++++---- > 1 file changed, 98 insertions(+), 14 deletions(-) > > diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c > index 3759dc36cc8f..1aa63580cae4 100644 > --- a/drivers/irqchip/irq-apple-aic.c > +++ b/drivers/irqchip/irq-apple-aic.c > @@ -24,7 +24,7 @@ > * - Default "this CPU" register view and explicit per-CPU views > * > * In addition, this driver also handles FIQs, as these are routed to the same > - * IRQ vector. These are used for Fast IPIs (TODO), the ARMv8 timer IRQs, and > + * IRQ vector. These are used for Fast IPIs, the ARMv8 timer IRQs, and > * performance counters (TODO). > * > * Implementation notes: > @@ -106,7 +106,6 @@ > > /* > * IMP-DEF sysregs that control FIQ sources > - * Note: sysreg-based IPIs are not supported yet. > */ > > /* Core PMC control register */ > @@ -155,6 +154,10 @@ > #define SYS_IMP_APL_UPMSR_EL1 sys_reg(3, 7, 15, 6, 4) > #define UPMSR_IACT BIT(0) > > +/* MPIDR fields */ > +#define MPIDR_CPU GENMASK(7, 0) > +#define MPIDR_CLUSTER GENMASK(15, 8) This should be defined in terms of MPIDR_AFFINITY_LEVEL() and co. 
> + > #define AIC_NR_FIQ 4 > #define AIC_NR_SWIPI 32 > > @@ -173,12 +176,42 @@ > #define AIC_TMR_EL02_PHYS AIC_TMR_GUEST_PHYS > #define AIC_TMR_EL02_VIRT AIC_TMR_GUEST_VIRT > > +struct aic_info { > + int version; > + > + /* Features */ > + bool fast_ipi; > +}; > + > +static const struct aic_info aic1_info = { > + .version = 1, > +}; > + > +static const struct aic_info aic1_fipi_info = { > + .version = 1, > + > + .fast_ipi = true, Do you anticipate multiple feature flags like this? If so, maybe we should consider biting the bullet and making this an unsigned long populated with discrete flags. Not something we need to decide now though. > +}; > + > +static const struct of_device_id aic_info_match[] = { > + { > + .compatible = "apple,t8103-aic", > + .data = &aic1_fipi_info, > + }, > + { > + .compatible = "apple,aic", > + .data = &aic1_info, > + }, > + {} > +}; > + > struct aic_irq_chip { > void __iomem *base; > struct irq_domain *hw_domain; > struct irq_domain *ipi_domain; > int nr_hw; > - int ipi_hwirq; > + > + struct aic_info info; > }; > > static DEFINE_PER_CPU(uint32_t, aic_fiq_unmasked); > @@ -387,8 +420,12 @@ static void __exception_irq_entry aic_handle_fiq(struct pt_regs *regs) > */ > > if (read_sysreg_s(SYS_IMP_APL_IPI_SR_EL1) & IPI_SR_PENDING) { > - pr_err_ratelimited("Fast IPI fired. Acking.\n"); > - write_sysreg_s(IPI_SR_PENDING, SYS_IMP_APL_IPI_SR_EL1); > + if (aic_irqc->info.fast_ipi) { On the other hand, this is likely to hit on the fast path. Given that we know at probe time whether we support SR-based IPIs, we can turn this into a static key and save a few fetches on every IPI. It applies everywhere you look at this flag at runtime. > + aic_handle_ipi(regs); > + } else { > + pr_err_ratelimited("Fast IPI fired. 
Acking.\n"); > + write_sysreg_s(IPI_SR_PENDING, SYS_IMP_APL_IPI_SR_EL1); > + } > } > > if (TIMER_FIRING(read_sysreg(cntp_ctl_el0))) > @@ -564,6 +601,21 @@ static const struct irq_domain_ops aic_irq_domain_ops = { > * IPI irqchip > */ > > +static void aic_ipi_send_fast(int cpu) > +{ > + u64 mpidr = cpu_logical_map(cpu); > + u64 my_mpidr = cpu_logical_map(smp_processor_id()); This is the equivalent of reading MPIDR_EL1. My gut feeling is that it is a bit faster to access the sysreg than a percpu lookup, a function call and another memory access. > + u64 cluster = FIELD_GET(MPIDR_CLUSTER, mpidr); See my earlier comment. This really should be: u64 cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1); rather than inventing a new decoding scheme. > + u64 idx = FIELD_GET(MPIDR_CPU, mpidr); > + > + if (FIELD_GET(MPIDR_CLUSTER, my_mpidr) == cluster) > + write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx), > + SYS_IMP_APL_IPI_RR_LOCAL_EL1); > + else > + write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx) | FIELD_PREP(IPI_RR_CLUSTER, cluster), > + SYS_IMP_APL_IPI_RR_GLOBAL_EL1); Don't you need an ISB, either here or in the two callers? At the moment, I don't see what will force the execution of these writes, and they could be arbitrarily delayed. > +} > + > static void aic_ipi_mask(struct irq_data *d) > { > u32 irq_bit = BIT(irqd_to_hwirq(d)); > @@ -589,8 +641,12 @@ static void aic_ipi_unmask(struct irq_data *d) > * If a pending vIPI was unmasked, raise a HW IPI to ourselves. > * No barriers needed here since this is a self-IPI. > */ > - if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit) > - aic_ic_write(ic, AIC_IPI_SEND, AIC_IPI_SEND_CPU(smp_processor_id())); > + if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit) { > + if (ic->info.fast_ipi) > + aic_ipi_send_fast(smp_processor_id()); nit: if this is common enough, maybe having an aic_ipi_send_self_fast could be better. Needs evaluation though. 
> + else > + aic_ic_write(ic, AIC_IPI_SEND, AIC_IPI_SEND_CPU(smp_processor_id())); > + } > } > > static void aic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask) > @@ -618,8 +674,12 @@ static void aic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask) > smp_mb__after_atomic(); > > if (!(pending & irq_bit) && > - (atomic_read(per_cpu_ptr(&aic_vipi_enable, cpu)) & irq_bit)) > - send |= AIC_IPI_SEND_CPU(cpu); > + (atomic_read(per_cpu_ptr(&aic_vipi_enable, cpu)) & irq_bit)) { > + if (ic->info.fast_ipi) > + aic_ipi_send_fast(cpu); > + else > + send |= AIC_IPI_SEND_CPU(cpu); > + } > } > > /* > @@ -651,8 +711,16 @@ static void aic_handle_ipi(struct pt_regs *regs) > /* > * Ack the IPI. We need to order this after the AIC event read, but > * that is enforced by normal MMIO ordering guarantees. > + * > + * For the Fast IPI case, this needs to be ordered before the vIPI > + * handling below, so we need to isb(); > */ > - aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_OTHER); > + if (aic_irqc->info.fast_ipi) { > + write_sysreg_s(IPI_SR_PENDING, SYS_IMP_APL_IPI_SR_EL1); > + isb(); > + } else { > + aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_OTHER); > + } > > /* > * The mask read does not need to be ordered. Only we can change > @@ -680,7 +748,8 @@ static void aic_handle_ipi(struct pt_regs *regs) > * No ordering needed here; at worst this just changes the timing of > * when the next IPI will be delivered. > */ > - aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > + if (!aic_irqc->info.fast_ipi) > + aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > } > > static int aic_ipi_alloc(struct irq_domain *d, unsigned int virq, > @@ -779,8 +848,12 @@ static int aic_init_cpu(unsigned int cpu) > * by AIC during processing). We manage masks at the vIPI level. 
> */ > aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_SELF | AIC_IPI_OTHER); > - aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF); > - aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > + if (!aic_irqc->info.fast_ipi) { > + aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF); > + aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > + } else { > + aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF | AIC_IPI_OTHER); > + } > > /* Initialize the local mask state */ > __this_cpu_write(aic_fiq_unmasked, 0); > @@ -800,6 +873,7 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > void __iomem *regs; > u32 info; > struct aic_irq_chip *irqc; > + const struct of_device_id *match; > > regs = of_iomap(node, 0); > if (WARN_ON(!regs)) > @@ -809,9 +883,16 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > if (!irqc) > return -ENOMEM; > > - aic_irqc = irqc; > irqc->base = regs; > > + match = of_match_node(aic_info_match, node); > + if (!match) > + return -ENODEV; > + > + irqc->info = *(struct aic_info *)match->data; Why the copy? All the data is const, and isn't going away. > + > + aic_irqc = irqc; > + > info = aic_ic_read(irqc, AIC_INFO); > irqc->nr_hw = FIELD_GET(AIC_INFO_NR_HW, info); > > @@ -846,6 +927,9 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > if (!is_kernel_in_hyp_mode()) > pr_info("Kernel running in EL1, mapping interrupts"); > > + if (irqc->info.fast_ipi) > + pr_info("Using Fast IPIs"); > + > cpuhp_setup_state(CPUHP_AP_IRQ_APPLE_AIC_STARTING, > "irqchip/apple-aic/ipi:starting", > aic_init_cpu, NULL); Thanks, M.
On Thu, 09 Dec 2021 04:32:46 +0000, Hector Martin <marcan@marcan.st> wrote: > > This allows us to directly use the hardware event number as the hwirq > number. Since IRQ events have bit 16 set (type=1), FIQs now move to > starting at hwirq number 0. > > This will become more important once multi-die support is introduced in > a later commit. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/irqchip/irq-apple-aic.c | 67 ++++++++++++++++++--------------- > 1 file changed, 36 insertions(+), 31 deletions(-) > > diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c > index 1aa63580cae4..572d1af175fc 100644 > --- a/drivers/irqchip/irq-apple-aic.c > +++ b/drivers/irqchip/irq-apple-aic.c > @@ -66,7 +66,7 @@ > */ > > #define AIC_INFO 0x0004 > -#define AIC_INFO_NR_HW GENMASK(15, 0) > +#define AIC_INFO_NR_IRQ GENMASK(15, 0) > > #define AIC_CONFIG 0x0010 > > @@ -75,6 +75,7 @@ > #define AIC_EVENT_TYPE GENMASK(31, 16) > #define AIC_EVENT_NUM GENMASK(15, 0) > > +#define AIC_EVENT_TYPE_FIQ 0 /* Software use */ What does 'SW use' mean? Are you using the fact that the event register never returns 0 in the top bits? 
> #define AIC_EVENT_TYPE_HW 1 > #define AIC_EVENT_TYPE_IPI 4 > #define AIC_EVENT_IPI_OTHER 1 > @@ -158,6 +159,8 @@ > #define MPIDR_CPU GENMASK(7, 0) > #define MPIDR_CLUSTER GENMASK(15, 8) > > +#define AIC_FIQ_HWIRQ(x) (FIELD_PREP(AIC_EVENT_TYPE, AIC_EVENT_TYPE_FIQ) | \ > + FIELD_PREP(AIC_EVENT_NUM, x)) > #define AIC_NR_FIQ 4 > #define AIC_NR_SWIPI 32 > > @@ -209,7 +212,7 @@ struct aic_irq_chip { > void __iomem *base; > struct irq_domain *hw_domain; > struct irq_domain *ipi_domain; > - int nr_hw; > + int nr_irq; > > struct aic_info info; > }; > @@ -239,18 +242,22 @@ static void aic_ic_write(struct aic_irq_chip *ic, u32 reg, u32 val) > > static void aic_irq_mask(struct irq_data *d) > { > + irq_hw_number_t hwirq = irqd_to_hwirq(d); > struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > > - aic_ic_write(ic, AIC_MASK_SET + MASK_REG(irqd_to_hwirq(d)), > - MASK_BIT(irqd_to_hwirq(d))); > + u32 irq = FIELD_GET(AIC_EVENT_NUM, hwirq); This expression is used quite a few times, and could use a helper clarifying its purpose (converting the event/hwirq to an index?). 'irq' is a bit of a misnomer too, but I struggle to find another name... 
> + > + aic_ic_write(ic, AIC_MASK_SET + MASK_REG(irq), MASK_BIT(irq)); > } > > static void aic_irq_unmask(struct irq_data *d) > { > + irq_hw_number_t hwirq = irqd_to_hwirq(d); > struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > > - aic_ic_write(ic, AIC_MASK_CLR + MASK_REG(d->hwirq), > - MASK_BIT(irqd_to_hwirq(d))); > + u32 irq = FIELD_GET(AIC_EVENT_NUM, hwirq); > + > + aic_ic_write(ic, AIC_MASK_CLR + MASK_REG(irq), MASK_BIT(irq)); > } > > static void aic_irq_eoi(struct irq_data *d) > @@ -278,7 +285,7 @@ static void __exception_irq_entry aic_handle_irq(struct pt_regs *regs) > irq = FIELD_GET(AIC_EVENT_NUM, event); > > if (type == AIC_EVENT_TYPE_HW) > - generic_handle_domain_irq(aic_irqc->hw_domain, irq); > + generic_handle_domain_irq(aic_irqc->hw_domain, event); > else if (type == AIC_EVENT_TYPE_IPI && irq == 1) > aic_handle_ipi(regs); > else if (event != 0) > @@ -310,7 +317,7 @@ static int aic_irq_set_affinity(struct irq_data *d, > else > cpu = cpumask_any_and(mask_val, cpu_online_mask); > > - aic_ic_write(ic, AIC_TARGET_CPU + hwirq * 4, BIT(cpu)); > + aic_ic_write(ic, AIC_TARGET_CPU + FIELD_GET(AIC_EVENT_NUM, hwirq) * 4, BIT(cpu)); > irq_data_update_effective_affinity(d, cpumask_of(cpu)); > > return IRQ_SET_MASK_OK; > @@ -340,9 +347,7 @@ static struct irq_chip aic_chip = { > > static unsigned long aic_fiq_get_idx(struct irq_data *d) > { > - struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > - > - return irqd_to_hwirq(d) - ic->nr_hw; > + return FIELD_GET(AIC_EVENT_NUM, irqd_to_hwirq(d)); > } > > static void aic_fiq_set_mask(struct irq_data *d) > @@ -430,11 +435,11 @@ static void __exception_irq_entry aic_handle_fiq(struct pt_regs *regs) > > if (TIMER_FIRING(read_sysreg(cntp_ctl_el0))) > generic_handle_domain_irq(aic_irqc->hw_domain, > - aic_irqc->nr_hw + AIC_TMR_EL0_PHYS); > + AIC_FIQ_HWIRQ(AIC_TMR_EL0_PHYS)); > > if (TIMER_FIRING(read_sysreg(cntv_ctl_el0))) > generic_handle_domain_irq(aic_irqc->hw_domain, > - aic_irqc->nr_hw + AIC_TMR_EL0_VIRT); > 
+ AIC_FIQ_HWIRQ(AIC_TMR_EL0_VIRT)); > > if (is_kernel_in_hyp_mode()) { > uint64_t enabled = read_sysreg_s(SYS_IMP_APL_VM_TMR_FIQ_ENA_EL2); > @@ -442,12 +447,12 @@ static void __exception_irq_entry aic_handle_fiq(struct pt_regs *regs) > if ((enabled & VM_TMR_FIQ_ENABLE_P) && > TIMER_FIRING(read_sysreg_s(SYS_CNTP_CTL_EL02))) > generic_handle_domain_irq(aic_irqc->hw_domain, > - aic_irqc->nr_hw + AIC_TMR_EL02_PHYS); > + AIC_FIQ_HWIRQ(AIC_TMR_EL02_PHYS)); > > if ((enabled & VM_TMR_FIQ_ENABLE_V) && > TIMER_FIRING(read_sysreg_s(SYS_CNTV_CTL_EL02))) > generic_handle_domain_irq(aic_irqc->hw_domain, > - aic_irqc->nr_hw + AIC_TMR_EL02_VIRT); > + AIC_FIQ_HWIRQ(AIC_TMR_EL02_VIRT)); > } > > if ((read_sysreg_s(SYS_IMP_APL_PMCR0_EL1) & (PMCR0_IMODE | PMCR0_IACT)) == > @@ -492,13 +497,13 @@ static struct irq_chip fiq_chip = { > static int aic_irq_domain_map(struct irq_domain *id, unsigned int irq, > irq_hw_number_t hw) > { > - struct aic_irq_chip *ic = id->host_data; > + u32 type = FIELD_GET(AIC_EVENT_TYPE, hw); > > - if (hw < ic->nr_hw) { > + if (type == AIC_EVENT_TYPE_HW) { > irq_domain_set_info(id, irq, hw, &aic_chip, id->host_data, > handle_fasteoi_irq, NULL, NULL); > irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(irq))); > - } else { > + } else if (type == AIC_EVENT_TYPE_FIQ) { Do we need to check for FIQ? This should be the case by construction, right? If there is a risk that it isn't the case, then we probably need a default case (and the whole thing would be better written as a switch() statement). 
> irq_set_percpu_devid(irq); > irq_domain_set_info(id, irq, hw, &fiq_chip, id->host_data, > handle_percpu_devid_irq, NULL, NULL); > @@ -519,14 +524,15 @@ static int aic_irq_domain_translate(struct irq_domain *id, > > switch (fwspec->param[0]) { > case AIC_IRQ: > - if (fwspec->param[1] >= ic->nr_hw) > + if (fwspec->param[1] >= ic->nr_irq) > return -EINVAL; > - *hwirq = fwspec->param[1]; > + *hwirq = (FIELD_PREP(AIC_EVENT_TYPE, AIC_EVENT_TYPE_HW) | > + FIELD_PREP(AIC_EVENT_NUM, fwspec->param[1])); > break; > case AIC_FIQ: > if (fwspec->param[1] >= AIC_NR_FIQ) > return -EINVAL; > - *hwirq = ic->nr_hw + fwspec->param[1]; > + *hwirq = AIC_FIQ_HWIRQ(fwspec->param[1]); > > /* > * In EL1 the non-redirected registers are the guest's, > @@ -535,10 +541,10 @@ static int aic_irq_domain_translate(struct irq_domain *id, > if (!is_kernel_in_hyp_mode()) { > switch (fwspec->param[1]) { > case AIC_TMR_GUEST_PHYS: > - *hwirq = ic->nr_hw + AIC_TMR_EL0_PHYS; > + *hwirq = AIC_FIQ_HWIRQ(AIC_TMR_EL0_PHYS); > break; > case AIC_TMR_GUEST_VIRT: > - *hwirq = ic->nr_hw + AIC_TMR_EL0_VIRT; > + *hwirq = AIC_FIQ_HWIRQ(AIC_TMR_EL0_VIRT); > break; > case AIC_TMR_HV_PHYS: > case AIC_TMR_HV_VIRT: > @@ -894,11 +900,10 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > aic_irqc = irqc; > > info = aic_ic_read(irqc, AIC_INFO); > - irqc->nr_hw = FIELD_GET(AIC_INFO_NR_HW, info); > + irqc->nr_irq = FIELD_GET(AIC_INFO_NR_IRQ, info); > > - irqc->hw_domain = irq_domain_create_linear(of_node_to_fwnode(node), > - irqc->nr_hw + AIC_NR_FIQ, > - &aic_irq_domain_ops, irqc); > + irqc->hw_domain = irq_domain_create_tree(of_node_to_fwnode(node), > + &aic_irq_domain_ops, irqc); > if (WARN_ON(!irqc->hw_domain)) { > iounmap(irqc->base); > kfree(irqc); > @@ -917,11 +922,11 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > set_handle_irq(aic_handle_irq); > set_handle_fiq(aic_handle_fiq); > > - for (i = 0; i < BITS_TO_U32(irqc->nr_hw); i++) > + for (i = 
0; i < BITS_TO_U32(irqc->nr_irq); i++) > aic_ic_write(irqc, AIC_MASK_SET + i * 4, U32_MAX); > - for (i = 0; i < BITS_TO_U32(irqc->nr_hw); i++) > + for (i = 0; i < BITS_TO_U32(irqc->nr_irq); i++) > aic_ic_write(irqc, AIC_SW_CLR + i * 4, U32_MAX); > - for (i = 0; i < irqc->nr_hw; i++) > + for (i = 0; i < irqc->nr_irq; i++) > aic_ic_write(irqc, AIC_TARGET_CPU + i * 4, 1); > > if (!is_kernel_in_hyp_mode()) > @@ -937,7 +942,7 @@ static int __init aic_of_ic_init(struct device_node *node, struct device_node *p > vgic_set_kvm_info(&vgic_info); > > pr_info("Initialized with %d IRQs, %d FIQs, %d vIPIs\n", > - irqc->nr_hw, AIC_NR_FIQ, AIC_NR_SWIPI); > + irqc->nr_irq, AIC_NR_FIQ, AIC_NR_SWIPI); > > return 0; > } Thanks, M.
On Thu, 09 Dec 2021 04:32:47 +0000, Hector Martin <marcan@marcan.st> wrote: > > This allows us to support AIC variants with different numbers of IRQs > based on capability registers. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/irqchip/irq-apple-aic.c | 73 +++++++++++++++++++++++++-------- > 1 file changed, 56 insertions(+), 17 deletions(-) > > diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c > index 572d1af175fc..d03caed51d56 100644 > --- a/drivers/irqchip/irq-apple-aic.c > +++ b/drivers/irqchip/irq-apple-aic.c > @@ -312,12 +326,15 @@ static int aic_irq_set_affinity(struct irq_data *d, > struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > int cpu; > > + if (!ic->info.target_cpu) > + return -EINVAL; Can this even happen? And if it did, this should scream loudly, shouldn't it? M.
On Thu, 09 Dec 2021 04:32:49 +0000, Hector Martin <marcan@marcan.st> wrote: > > Introduce support for the new AICv2 hardware block in t6000/t6001 SoCs. > > It seems these blocks are missing the information required to compute > the event register offset in the capability registers, so we specify > that in the DT. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/irqchip/irq-apple-aic.c | 146 ++++++++++++++++++++++++++++---- > 1 file changed, 128 insertions(+), 18 deletions(-) > > diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c > index 46b7750548a0..226d5232dd14 100644 > --- a/drivers/irqchip/irq-apple-aic.c > +++ b/drivers/irqchip/irq-apple-aic.c > @@ -101,6 +101,57 @@ > > #define AIC_MAX_IRQ 0x400 > > +/* > + * AIC v2 registers (MMIO) > + */ > + > +#define AIC2_VERSION 0x0000 > +#define AIC2_VERSION_VER GENMASK(7, 0) > + > +#define AIC2_INFO1 0x0004 > +#define AIC2_INFO1_NR_IRQ GENMASK(15, 0) > +#define AIC2_INFO1_LAST_DIE GENMASK(27, 24) > + > +#define AIC2_INFO2 0x0008 > + > +#define AIC2_INFO3 0x000c > +#define AIC2_INFO3_MAX_IRQ GENMASK(15, 0) > +#define AIC2_INFO3_MAX_DIE GENMASK(27, 24) > + > +#define AIC2_RESET 0x0010 > +#define AIC2_RESET_RESET BIT(0) > + > +#define AIC2_CONFIG 0x0014 > +#define AIC2_CONFIG_ENABLE BIT(0) > +#define AIC2_CONFIG_PREFER_PCPU BIT(28) > + > +#define AIC2_TIMEOUT 0x0028 > +#define AIC2_CLUSTER_PRIO 0x0030 > +#define AIC2_DELAY_GROUPS 0x0100 > + > +#define AIC2_IRQ_CFG 0x2000 > + > +/* > + * AIC2 registers are laid out like this, starting at AIC2_IRQ_CFG: > + * > + * Repeat for each die: > + * IRQ_CFG: u32 * MAX_IRQS > + * SW_SET: u32 * (MAX_IRQS / 32) > + * SW_CLR: u32 * (MAX_IRQS / 32) > + * MASK_SET: u32 * (MAX_IRQS / 32) > + * MASK_CLR: u32 * (MAX_IRQS / 32) > + * HW_STATE: u32 * (MAX_IRQS / 32) > + * > + * This is followed by a set of event registers, each 16K page aligned. > + * The first one is the AP event register we will use. 
Unfortunately, > + * the actual implemented die count is not specified anywhere in the > + * capability registers, so we have to explcitly specify the event explicitly > + * register offset in the device tree to remain forward-compatible. Do the current machines actually have more than a single die? > + */ > + > +#define AIC2_IRQ_CFG_TARGET GENMASK(3, 0) > +#define AIC2_IRQ_CFG_DELAY_IDX GENMASK(7, 5) > + > #define MASK_REG(x) (4 * ((x) >> 5)) > #define MASK_BIT(x) BIT((x) & GENMASK(4, 0)) > > @@ -187,6 +238,7 @@ struct aic_info { > /* Register offsets */ > u32 event; > u32 target_cpu; > + u32 irq_cfg; > u32 sw_set; > u32 sw_clr; > u32 mask_set; > @@ -214,6 +266,14 @@ static const struct aic_info aic1_fipi_info = { > .fast_ipi = true, > }; > > +static const struct aic_info aic2_info = { > + .version = 2, > + > + .irq_cfg = AIC2_IRQ_CFG, > + > + .fast_ipi = true, > +}; > + > static const struct of_device_id aic_info_match[] = { > { > .compatible = "apple,t8103-aic", > @@ -223,6 +283,10 @@ static const struct of_device_id aic_info_match[] = { > .compatible = "apple,aic", > .data = &aic1_info, > }, > + { > + .compatible = "apple,aic2", > + .data = &aic2_info, > + }, > {} > }; > > @@ -368,6 +432,14 @@ static struct irq_chip aic_chip = { > .irq_set_type = aic_irq_set_type, > }; > > +static struct irq_chip aic2_chip = { > + .name = "AIC2", > + .irq_mask = aic_irq_mask, > + .irq_unmask = aic_irq_unmask, > + .irq_eoi = aic_irq_eoi, > + .irq_set_type = aic_irq_set_type, > +}; How is the affinity managed if you don't have a callback? A number of things are bound to break if you don't have one. And a description of how an interrupt gets routed wouldn't go amiss! 
> + > /* > * FIQ irqchip > */ > @@ -524,10 +596,15 @@ static struct irq_chip fiq_chip = { > static int aic_irq_domain_map(struct irq_domain *id, unsigned int irq, > irq_hw_number_t hw) > { > + struct aic_irq_chip *ic = id->host_data; > u32 type = FIELD_GET(AIC_EVENT_TYPE, hw); > + struct irq_chip *chip = &aic_chip; > + > + if (ic->info.version == 2) > + chip = &aic2_chip; > > if (type == AIC_EVENT_TYPE_HW) { > - irq_domain_set_info(id, irq, hw, &aic_chip, id->host_data, > + irq_domain_set_info(id, irq, hw, chip, id->host_data, > handle_fasteoi_irq, NULL, NULL); > irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(irq))); > } else if (type == AIC_EVENT_TYPE_FIQ) { > @@ -882,23 +959,25 @@ static int aic_init_cpu(unsigned int cpu) > /* Commit all of the above */ > isb(); > > - /* > - * Make sure the kernel's idea of logical CPU order is the same as AIC's > - * If we ever end up with a mismatch here, we will have to introduce > - * a mapping table similar to what other irqchip drivers do. > - */ > - WARN_ON(aic_ic_read(aic_irqc, AIC_WHOAMI) != smp_processor_id()); > + if (aic_irqc->info.version == 1) { > + /* > + * Make sure the kernel's idea of logical CPU order is the same as AIC's > + * If we ever end up with a mismatch here, we will have to introduce > + * a mapping table similar to what other irqchip drivers do. > + */ > + WARN_ON(aic_ic_read(aic_irqc, AIC_WHOAMI) != smp_processor_id()); > > - /* > - * Always keep IPIs unmasked at the hardware level (except auto-masking > - * by AIC during processing). We manage masks at the vIPI level. 
> - */ > - aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_SELF | AIC_IPI_OTHER); > - if (!aic_irqc->info.fast_ipi) { > - aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF); > - aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > - } else { > - aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF | AIC_IPI_OTHER); > + /* > + * Always keep IPIs unmasked at the hardware level (except auto-masking > + * by AIC during processing). We manage masks at the vIPI level. > + */ > + aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_SELF | AIC_IPI_OTHER); > + if (!aic_irqc->info.fast_ipi) { > + aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF); > + aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER); > + } else { > + aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF | AIC_IPI_OTHER); > + } Why is this specific to v1 and not affecting v2? I'm sure there is a good reason, but documenting these differences would certainly help reviewing (which version implement which registers, for example). Thanks, M.
On 12/12/2021 21.21, Marc Zyngier wrote: >> +/* MPIDR fields */ >> +#define MPIDR_CPU GENMASK(7, 0) >> +#define MPIDR_CLUSTER GENMASK(15, 8) > > This should be defined in terms of MPIDR_AFFINITY_LEVEL() and co. Yeah, I found out about that macro from your PMU driver... :) >> +static const struct aic_info aic1_fipi_info = { >> + .version = 1, >> + >> + .fast_ipi = true, > > Do you anticipate multiple feature flags like this? If so, maybe we > should consider biting the bullet and making this an unsigned long > populated with discrete flags. > > Not something we need to decide now though. Probably not, but who knows! It's easy to change it later, though. >> if (read_sysreg_s(SYS_IMP_APL_IPI_SR_EL1) & IPI_SR_PENDING) { >> - pr_err_ratelimited("Fast IPI fired. Acking.\n"); >> - write_sysreg_s(IPI_SR_PENDING, SYS_IMP_APL_IPI_SR_EL1); >> + if (aic_irqc->info.fast_ipi) { > > On the other hand, this is likely to hit on the fast path. Given that > we know at probe time whether we support SR-based IPIs, we can turn > this into a static key and save a few fetches on every IPI. It applies > everywhere you look at this flag at runtime. Good point, I'll see about refactoring this to use static keys. >> +static void aic_ipi_send_fast(int cpu) >> +{ >> + u64 mpidr = cpu_logical_map(cpu); >> + u64 my_mpidr = cpu_logical_map(smp_processor_id()); > > This is the equivalent of reading MPIDR_EL1. My gut feeling is that it > is a bit faster to access the sysreg than a percpu lookup, a function > call and another memory access. Yeah, I saw other IRQ drivers doing this, but I wasn't sure it made sense over just reading MPIDR_EL1... I'll switch to that. 
>> + u64 idx = FIELD_GET(MPIDR_CPU, mpidr); >> + >> + if (FIELD_GET(MPIDR_CLUSTER, my_mpidr) == cluster) >> + write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx), >> + SYS_IMP_APL_IPI_RR_LOCAL_EL1); >> + else >> + write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx) | FIELD_PREP(IPI_RR_CLUSTER, cluster), >> + SYS_IMP_APL_IPI_RR_GLOBAL_EL1); > > Don't you need an ISB, either here or in the two callers? At the > moment, I don't see what will force the execution of these writes, and > they could be arbitrarily delayed. Is there any requirement for timeliness sending IPIs? They're going to another CPU after all, they could be arbitrarily delayed because it has FIQs masked. >> - if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit) >> - aic_ic_write(ic, AIC_IPI_SEND, AIC_IPI_SEND_CPU(smp_processor_id())); >> + if (atomic_read(this_cpu_ptr(&aic_vipi_flag)) & irq_bit) { >> + if (ic->info.fast_ipi) >> + aic_ipi_send_fast(smp_processor_id()); > > nit: if this is common enough, maybe having an aic_ipi_send_self_fast > could be better. Needs evaluation though. I'll do some printing to see how common self-IPIs are when running common workloads, let's see. If it's common enough it's easy enough to add. >> + irqc->info = *(struct aic_info *)match->data; > > Why the copy? All the data is const, and isn't going away. ... for now, but later patches then start computing register offsets and putting them into this structure :)
On 12/12/2021 23.37, Marc Zyngier wrote: >> @@ -75,6 +75,7 @@ >> #define AIC_EVENT_TYPE GENMASK(31, 16) >> #define AIC_EVENT_NUM GENMASK(15, 0) >> >> +#define AIC_EVENT_TYPE_FIQ 0 /* Software use */ > > What does 'SW use' mean? Are you using the fact that the event > register never returns 0 in the top bits? Yeah. Since we're switching to tree IRQs we can use the raw hardware event as the hwirq, and save some cycles munging fields, but we still need a place for FIQ hwirq numbers to live. Zero here means "no IRQ/spurious" to the hardware, so it's a convenient place to stick FIQs. Note that the top-level IRQ handler function does check this field, so even if newer hardware starts doing something silly here (which I highly doubt...) it would never end up forwarded to the IRQ domain code without further code changes. >> - aic_ic_write(ic, AIC_MASK_SET + MASK_REG(irqd_to_hwirq(d)), >> - MASK_BIT(irqd_to_hwirq(d))); >> + u32 irq = FIELD_GET(AIC_EVENT_NUM, hwirq); > > This expression is used quite a few times, and could use a helper > clarifying its purpose (converting the event/hwirq to an index?). > 'irq' is a bit of a misnomer too, but I struggle to find another > name... Ack, I'll add a helper. It's extracting the IRQ number field from the hwirq number. This is relatively trivial at this point in the patch set (where the only other field is the constant type = 1), but it makes more sense once I add the die field later on. 
>> @@ -492,13 +497,13 @@ static struct irq_chip fiq_chip = { >> static int aic_irq_domain_map(struct irq_domain *id, unsigned int irq, >> irq_hw_number_t hw) >> { >> - struct aic_irq_chip *ic = id->host_data; >> + u32 type = FIELD_GET(AIC_EVENT_TYPE, hw); >> >> - if (hw < ic->nr_hw) { >> + if (type == AIC_EVENT_TYPE_HW) { >> irq_domain_set_info(id, irq, hw, &aic_chip, id->host_data, >> handle_fasteoi_irq, NULL, NULL); >> irqd_set_single_target(irq_desc_get_irq_data(irq_to_desc(irq))); >> - } else { >> + } else if (type == AIC_EVENT_TYPE_FIQ) { > > Do we need to check for FIQ? This should be the case by construction, > right? If there is a risk that it isn't the case, then we probably > need a default case (and the whole thing would be better written as a > switch() statement). Yes, it should be the case by construction; this can just be an else. I'll change it.
On 13/12/2021 03.26, Marc Zyngier wrote: > On Thu, 09 Dec 2021 04:32:47 +0000, > Hector Martin <marcan@marcan.st> wrote: >> >> This allows us to support AIC variants with different numbers of IRQs >> based on capability registers. >> >> Signed-off-by: Hector Martin <marcan@marcan.st> >> --- >> drivers/irqchip/irq-apple-aic.c | 73 +++++++++++++++++++++++++-------- >> 1 file changed, 56 insertions(+), 17 deletions(-) >> >> diff --git a/drivers/irqchip/irq-apple-aic.c b/drivers/irqchip/irq-apple-aic.c >> index 572d1af175fc..d03caed51d56 100644 >> --- a/drivers/irqchip/irq-apple-aic.c >> +++ b/drivers/irqchip/irq-apple-aic.c >> @@ -312,12 +326,15 @@ static int aic_irq_set_affinity(struct irq_data *d, >> struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); >> int cpu; >> >> + if (!ic->info.target_cpu) >> + return -EINVAL; > > Can this even happen? And if it did, this should scream loudly, > shouldn't it? Yeah, it can't happen, so this really should be a BUG_ON. This is mostly there in case somehow we end up with confusion between AIC versions and register offsets later, since AIC2 does not use this field but also shouldn't be setting up an irqchip that calls this function.
On 14/12/2021 01.10, Marc Zyngier wrote: >> switch (fwspec->param[0]) { >> case AIC_IRQ: >> - if (fwspec->param[1] >= ic->nr_irq) >> + if (die >= ic->nr_die) >> + return -EINVAL; >> + if (args[0] >= ic->nr_irq) >> return -EINVAL; >> - *hwirq = (FIELD_PREP(AIC_EVENT_TYPE, AIC_EVENT_TYPE_HW) | >> - FIELD_PREP(AIC_EVENT_NUM, fwspec->param[1])); >> + *hwirq = AIC_IRQ_HWIRQ(die, args[0]); >> break; > > A side issue with this is that it breaks MSIs, due to the way we > construct the intspec (I did hit that when upgrading the M1 intspec to > 4 cells for the PMU). I have the following hack locally: > > diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c > index b090924b41fe..f7b4a67b13cf 100644 > --- a/drivers/pci/controller/pcie-apple.c > +++ b/drivers/pci/controller/pcie-apple.c > @@ -218,7 +218,7 @@ static int apple_msi_domain_alloc(struct irq_domain *domain, unsigned int virq, > if (hwirq < 0) > return -ENOSPC; > > - fwspec.param[1] += hwirq; > + fwspec.param[1 + (fwspec.param_count == 4)] += hwirq; > > ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, &fwspec); > if (ret) > Heh, I never noticed; probably because I guess the SD card reader on the machine I've been testing this on doesn't use MSIs, and I haven't tried WiFi yet. Perhaps (fwspec.param_count - 2)? It's probably a safer long-term assumption that the last two cells will always be leaf IRQ number and type.
On 13/12/2021 03.47, Marc Zyngier wrote:
>> + * This is followed by a set of event registers, each 16K page aligned.
>> + * The first one is the AP event register we will use. Unfortunately,
>> + * the actual implemented die count is not specified anywhere in the
>> + * capability registers, so we have to explcitly specify the event
>
> explicitly

Thanks, fixed!

>
>> + * register offset in the device tree to remain forward-compatible.
>
> Do the current machines actually have more than a single die?

Not the current ones, but there are loud rumors everywhere of multi-die
products... might as well try to support them ahead of time. The current
machines *do* have two register sets, implying support for 2-die
configurations, and although no IRQs are ever asserted from hardware,
SW_GEN mode works and you can trigger die-ID 1 events.

The interpretation of the capability registers comes from what the macOS
driver does (that's the only part I looked at it for, since it's kind of
hard to divine with only a single data point from the hardware). Their
driver is definitely designed for multi die machines already. The
register layout I worked out by probing the hardware; it was blatantly
obvious that there was a second set of IRQ mask arrays after the first,
that macOS didn't use (yet)...

>> +static struct irq_chip aic2_chip = {
>> +	.name = "AIC2",
>> +	.irq_mask = aic_irq_mask,
>> +	.irq_unmask = aic_irq_unmask,
>> +	.irq_eoi = aic_irq_eoi,
>> +	.irq_set_type = aic_irq_set_type,
>> +};
>
> How is the affinity managed if you don't have a callback? A number of
> things are bound to break if you don't have one. And a description of
> how an interrupt gets routed wouldn't go amiss!

It isn't... we don't know all the details yet, but it seems to be Some
Kind Of Magic™.

There definitely is no way of individually mapping IRQs to specific
CPUs; there just aren't enough implemented register bits to allow that.

What we do have is a per-IRQ config consisting of:

- Target CPU, 4 bits. This seems to be for pointing IRQs at coprocessors
(there's actually an init dance to map a few IRQs to specific
coprocessors; m1n1 takes care of that right now*). Only 0 sends IRQs to
the AP here, so this is not useful to us.

- IRQ config group, 3 bits. This selects one of 8 IRQ config registers.
These do indirectly control how the IRQ is delivered; at least they have
some kind of delay value (coalescing?) and I suspect may do some kind of
priority control, though the details of that aren't clear yet. I don't
recall seeing macOS do anything interesting with these groups, I think
it always uses group 0.

Then each CPU has an IMP-DEF sysreg that allows it to opt-in or opt-out
of receiving IRQs (!). It actually has two bits, so there may be some
kind of priority/subset control here too. By default all other CPUs are
opted out, which isn't great... so m1n1 initializes it to opt in all
CPUs to IRQ delivery.

The actual delivery flow here seems to be something like AIC/something
picks a CPU (using some kind of heuristic/CPU state? I noticed WFI seems
to have an effect here) for initial delivery, and if the IRQ isn't acked
in a timely manner, it punts and broadcasts the IRQ to all CPUs. The IRQ
ack register is shared by all CPUs; I don't know if there is some kind
of per-CPU difference in what it can return, but I haven't observed that
yet, so I guess whatever CPU gets the IRQ gets to handle anything that
is pending.

There are also some extra features; e.g. there is definitely a set of
registers for measuring IRQ latency (tells you how long it took from IRQ
assertion to the CPU acking it). There's also some kind of global
control over which CPU *cluster* is tried first for delivery (defaults
to e-cluster, but you can change it to either p-cluster). We don't use
those right now.

So there is definitely room for further research here, but the current
state of affairs is the driver doesn't do affinity at all, and IRQs are
handled by "some" CPU. In practice, I see a decent (but not completely
uniform) spread of which CPU handles any given IRQ. I assume it's
something like it prefers a busy CPU, to avoid waking up a core just to
handle an IRQ.

* I don't know how masks are supposed to be managed for those IRQs used
by copros; I guess we'll find out when we get there and notice something
is broken if we don't unmask them... but I guess given the IRQ handling
flow here, that copro should be doing the masking/unmasking itself.

>> +	aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_SELF | AIC_IPI_OTHER);
>> +	if (!aic_irqc->info.fast_ipi) {
>> +		aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF);
>> +		aic_ic_write(aic_irqc, AIC_IPI_MASK_CLR, AIC_IPI_OTHER);
>> +	} else {
>> +		aic_ic_write(aic_irqc, AIC_IPI_MASK_SET, AIC_IPI_SELF | AIC_IPI_OTHER);
>> +	}
>
> Why is this specific to v1 and not affecting v2? I'm sure there is a
> good reason, but documenting these differences would certainly help
> reviewing (which version implement which registers, for example).

Only v1 has legacy IPIs (which is why we had to implement Fast IPIs for
this one). AIC_IPI_* registers are for AICv1 specifically; other than
the event register fields which are the same (but not the register
offset itself), and the general concept of the mask/sw_gen/hw_status
register arrays, there aren't really any shared registers between AICv1
and AICv2.

I'll add a comment to clarify this.
On Sat, 18 Dec 2021 05:31:28 +0000,
Hector Martin <marcan@marcan.st> wrote:
>
> >> +		u64 idx = FIELD_GET(MPIDR_CPU, mpidr);
> >> +
> >> +		if (FIELD_GET(MPIDR_CLUSTER, my_mpidr) == cluster)
> >> +			write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx),
> >> +				       SYS_IMP_APL_IPI_RR_LOCAL_EL1);
> >> +		else
> >> +			write_sysreg_s(FIELD_PREP(IPI_RR_CPU, idx) | FIELD_PREP(IPI_RR_CLUSTER, cluster),
> >> +				       SYS_IMP_APL_IPI_RR_GLOBAL_EL1);
> >
> > Don't you need an ISB, either here or in the two callers? At the
> > moment, I don't see what will force the execution of these writes, and
> > they could be arbitrarily delayed.
>
> Is there any requirement for timeliness sending IPIs? They're going to
> another CPU after all, they could be arbitrarily delayed because it has
> FIQs masked.

They absolutely could, but this has a potential impact on the scheduling
if you delay it (the vast majority of these IPIs are to indicate to the
remote CPU that it needs to go and schedule something else). So there is
an incentive for making it happen ASAP.

Thanks,

	M.
On Sat, 18 Dec 2021 05:39:04 +0000,
Hector Martin <marcan@marcan.st> wrote:
>
> On 14/12/2021 01.10, Marc Zyngier wrote:
> >>  	switch (fwspec->param[0]) {
> >>  	case AIC_IRQ:
> >> -		if (fwspec->param[1] >= ic->nr_irq)
> >> +		if (die >= ic->nr_die)
> >> +			return -EINVAL;
> >> +		if (args[0] >= ic->nr_irq)
> >>  			return -EINVAL;
> >> -		*hwirq = (FIELD_PREP(AIC_EVENT_TYPE, AIC_EVENT_TYPE_HW) |
> >> -			  FIELD_PREP(AIC_EVENT_NUM, fwspec->param[1]));
> >> +		*hwirq = AIC_IRQ_HWIRQ(die, args[0]);
> >>  		break;
> >
> > A side issue with this is that it breaks MSIs, due to the way we
> > construct the intspec (I did hit that when upgrading the M1 intspec to
> > 4 cells for the PMU). I have the following hack locally:
> >
> > diff --git a/drivers/pci/controller/pcie-apple.c b/drivers/pci/controller/pcie-apple.c
> > index b090924b41fe..f7b4a67b13cf 100644
> > --- a/drivers/pci/controller/pcie-apple.c
> > +++ b/drivers/pci/controller/pcie-apple.c
> > @@ -218,7 +218,7 @@ static int apple_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> >  	if (hwirq < 0)
> >  		return -ENOSPC;
> >
> > -	fwspec.param[1] += hwirq;
> > +	fwspec.param[1 + (fwspec.param_count == 4)] += hwirq;
> >
> >  	ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, &fwspec);
> >  	if (ret)
> >
>
> Heh, I never noticed; probably because I guess the SD card reader on the
> machine I've been testing this on doesn't use MSIs, and I haven't tried
> WiFi yet.
>
> Perhaps (fwspec.param_count - 2)? It's probably a safer long-term
> assumption that the last two cells will always be leaf IRQ number and
> type.

Yup, that'd work as well, as long as we make this assumption explicit.

	M.
On Sat, 18 Dec 2021 06:02:24 +0000,
Hector Martin <marcan@marcan.st> wrote:
>
> On 13/12/2021 03.47, Marc Zyngier wrote:
> >> + * This is followed by a set of event registers, each 16K page aligned.
> >> + * The first one is the AP event register we will use. Unfortunately,
> >> + * the actual implemented die count is not specified anywhere in the
> >> + * capability registers, so we have to explcitly specify the event
> >
> > explicitly
>
> Thanks, fixed!
>
> >
> >> + * register offset in the device tree to remain forward-compatible.
> >
> > Do the current machines actually have more than a single die?
>
> Not the current ones, but there are loud rumors everywhere of multi-die
> products... might as well try to support them ahead of time. The current
> machines *do* have two register sets, implying support for 2-die
> configurations, and although no IRQs are ever asserted from hardware,
> SW_GEN mode works and you can trigger die-ID 1 events.
>
> The interpretation of the capability registers comes from what the macOS
> driver does (that's the only part I looked at it for, since it's kind of
> hard to divine with only a single data point from the hardware). Their
> driver is definitely designed for multi die machines already. The
> register layout I worked out by probing the hardware; it was blatantly
> obvious that there was a second set of IRQ mask arrays after the first,
> that macOS didn't use (yet)...
>
> >> +static struct irq_chip aic2_chip = {
> >> +	.name = "AIC2",
> >> +	.irq_mask = aic_irq_mask,
> >> +	.irq_unmask = aic_irq_unmask,
> >> +	.irq_eoi = aic_irq_eoi,
> >> +	.irq_set_type = aic_irq_set_type,
> >> +};
> >
> > How is the affinity managed if you don't have a callback? A number of
> > things are bound to break if you don't have one. And a description of
> > how an interrupt gets routed wouldn't go amiss!
>
> It isn't... we don't know all the details yet, but it seems to be Some
> Kind Of Magic™.
>
> There definitely is no way of individually mapping IRQs to specific
> CPUs; there just aren't enough implemented register bits to allow that.
>
> What we do have is a per-IRQ config consisting of:
>
> - Target CPU, 4 bits. This seems to be for pointing IRQs at coprocessors
> (there's actually an init dance to map a few IRQs to specific
> coprocessors; m1n1 takes care of that right now*). Only 0 sends IRQs to
> the AP here, so this is not useful to us.
>
> - IRQ config group, 3 bits. This selects one of 8 IRQ config registers.
> These do indirectly control how the IRQ is delivered; at least they have
> some kind of delay value (coalescing?) and I suspect may do some kind of
> priority control, though the details of that aren't clear yet. I don't
> recall seeing macOS do anything interesting with these groups, I think
> it always uses group 0.
>
> Then each CPU has an IMP-DEF sysreg that allows it to opt-in or opt-out
> of receiving IRQs (!). It actually has two bits, so there may be some
> kind of priority/subset control here too. By default all other CPUs are
> opted out, which isn't great... so m1n1 initializes it to opt in all
> CPUs to IRQ delivery.
>
> The actual delivery flow here seems to be something like AIC/something
> picks a CPU (using some kind of heuristic/CPU state? I noticed WFI seems
> to have an effect here) for initial delivery, and if the IRQ isn't acked
> in a timely manner, it punts and broadcasts the IRQ to all CPUs. The IRQ
> ack register is shared by all CPUs; I don't know if there is some kind
> of per-CPU difference in what it can return, but I haven't observed that
> yet, so I guess whatever CPU gets the IRQ gets to handle anything that
> is pending.
>
> There are also some extra features; e.g. there is definitely a set of
> registers for measuring IRQ latency (tells you how long it took from IRQ
> assertion to the CPU acking it). There's also some kind of global
> control over which CPU *cluster* is tried first for delivery (defaults
> to e-cluster, but you can change it to either p-cluster). We don't use
> those right now.
>
> So there is definitely room for further research here, but the current
> state of affairs is the driver doesn't do affinity at all, and IRQs are
> handled by "some" CPU. In practice, I see a decent (but not completely
> uniform) spread of which CPU handles any given IRQ. I assume it's
> something like it prefers a busy CPU, to avoid waking up a core just to
> handle an IRQ.

The main issue with such magic is that a number of things will break in
a spectacular way for a bunch of drivers. We have a whole class of
(mostly PCI) devices that have per-queue interrupts, each one bound to
a CPU core. The drivers fully expect the interrupt for a given queue to
fire on a given CPU, and *only* this one as they would, for example,
use per-CPU data to get the context of the queue.

With an automatic spread of interrupts, this totally breaks. Probably
because the core will refuse to use managed interrupts due to the lack
of affinity setting callback. And even if you provide a dummy one, it
is the endpoint drivers that will explode.

The only way I can imagine to make this work is to force these
interrupts to be threaded so that the thread can run on a different CPU
than the one the interrupt has been taken on. Performance-wise, this is
likely to be a pig.

I guess we will have to find ways to live with this in the long run,
and maybe teach the core code of these weird behaviours.

Thanks,

	M.