Message ID: 20210304213902.83903-1-marcan@marcan.st
Series: Apple M1 SoC platform bring-up
On 05/03/2021 06.38, Hector Martin wrote: > == Merge notes == > > This patchset depends on both the nVHE changes that are already in > 5.12-rc1, as well as the FIQ support work currently being reviewed > at [1]. A tree containing this patchset on top of the required > dependencies is available at [2][3]. Alternatively, you may apply > this series on top of Mark's tree at the arm64-fiq-20210302 tag [4][5]. Important warning: these trees are all based on v5.12-rc1, which has a bad bug that causes your filesystems to go kaboom if you use a swap file [1]. This doesn't affect M1 since we don't *have* storage, but for folks testing for regressions on e.g. Samsung or other ARM boards, please make sure you don't use swap files. [1] https://lore.kernel.org/lkml/CAHk-=wjnzdLSP3oDxhf9eMTYo7GF-QjaNLBUH1Zk3c4A7X75YA@mail.gmail.com/
On Thu, Mar 4, 2021 at 10:40 PM Hector Martin <marcan@marcan.st> wrote: > This allows the devicetree to correctly represent the available set of > timers, which varies from device to device, without the need for fake > dummy interrupts for unavailable slots. > > Also add the hyp-virt timer/PPI, which is not currently used, but worth > representing. > > Signed-off-by: Hector Martin <marcan@marcan.st> > Reviewed-by: Tony Lindgren <tony@atomide.com> This is the right solution. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Yours, Linus Walleij
On Thu, Mar 4, 2021 at 10:40 PM Hector Martin <marcan@marcan.st> wrote: > From: Arnd Bergmann <arnd@arndb.de> > > This adds more detailed descriptions of the various read/write > primitives available for use with I/O memory/ports. > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > Signed-off-by: Hector Martin <marcan@marcan.st> Excellent work! Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Yours, Linus Walleij
On Thu, Mar 4, 2021 at 10:40 PM Hector Martin <marcan@marcan.st> wrote: > This documents the newly introduced ioremap_np() along with all the > other common ioremap() variants, and some higher-level abstractions > available. > > Signed-off-by: Hector Martin <marcan@marcan.st> I like this, I just want one change: Put the common ioremap() on top in all paragraphs, so the norm comes before the exceptions. I.e. it is weird to mention ioremap_np() before mentioning ioremap(). With that change: Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Yours, Linus Walleij
On 04/03/2021 22:38, Hector Martin wrote: > Instead of patching a single global ops structure depending on the port > type, use a separate s3c64xx_serial_ops for the S3C64XX type. This > allows us to mark the structures as const. > > Also split out s3c64xx_serial_shutdown into a separate function now that > we have a separate ops structure; this avoids excessive branching > control flow and mirrors s3c64xx_serial_startup. tx_claimed and > rx_claimed are only used in the S3C24XX functions. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/tty/serial/samsung_tty.c | 71 ++++++++++++++++++++++++-------- > 1 file changed, 54 insertions(+), 17 deletions(-) Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Tested-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Best regards, Krzysztof
On 04/03/2021 22:38, Hector Martin wrote: > This simplifies the code by removing the only distinction between the > S3C2410 and S3C2440 codepaths. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/tty/serial/samsung_tty.c | 11 ++++------- > 1 file changed, 4 insertions(+), 7 deletions(-) > > diff --git a/drivers/tty/serial/samsung_tty.c b/drivers/tty/serial/samsung_tty.c > index 78dc6e9240fb..33b421dbeb83 100644 > --- a/drivers/tty/serial/samsung_tty.c > +++ b/drivers/tty/serial/samsung_tty.c > @@ -70,6 +70,7 @@ struct s3c24xx_uart_info { > unsigned long num_clks; > unsigned long clksel_mask; > unsigned long clksel_shift; > + unsigned long ucon_mask; > > /* uart port features */ > > @@ -1736,14 +1737,9 @@ static void s3c24xx_serial_resetport(struct uart_port *port, > { > struct s3c24xx_uart_info *info = s3c24xx_port_to_info(port); > unsigned long ucon = rd_regl(port, S3C2410_UCON); > - unsigned int ucon_mask; > > - ucon_mask = info->clksel_mask; > - if (info->type == PORT_S3C2440) > - ucon_mask |= S3C2440_UCON0_DIVMASK; > - > - ucon &= ucon_mask; > - wr_regl(port, S3C2410_UCON, ucon | cfg->ucon); > + ucon &= (info->clksel_mask | info->ucon_mask); > + wr_regl(port, S3C2410_UCON, ucon | cfg->ucon); This line (wr_regl()) is not related, please split it to separate white-space cleanups. With the change: Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Tested-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Best regards, Krzysztof
On 04/03/2021 22:38, Hector Martin wrote: > This decouples the TTY layer PORT_ types, which are exposed to > userspace, from the driver-internal flag of what kind of port this is. s/This decouples/Decouple > > This removes s3c24xx_serial_has_interrupt_mask, which was just checking s/This removes/Remove/ https://elixir.bootlin.com/linux/latest/source/Documentation/process/submitting-patches.rst#L89 This actually also applies to your patch 19 and 21... and maybe more. > for a specific type anyway. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/tty/serial/samsung_tty.c | 112 +++++++++++++++++++------------ > 1 file changed, 70 insertions(+), 42 deletions(-) > Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Tested-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Best regards, Krzysztof
On 04/03/2021 22:38, Hector Martin wrote: > * Split out s3c24xx_serial_tx_chars from s3c24xx_serial_tx_irq, > where only the latter acquires the port lock. This will be necessary > on platforms which have edge-triggered IRQs, as we need to call > s3c24xx_serial_tx_chars to kick off transmission from outside IRQ > context, with the port lock held. > > * Rename s3c24xx_serial_rx_chars to s3c24xx_serial_rx_irq for > consistency with the above. All it does now is call two other > functions anyway. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/tty/serial/samsung_tty.c | 34 +++++++++++++++++++------------- > 1 file changed, 20 insertions(+), 14 deletions(-) > Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Tested-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Best regards, Krzysztof
On 04/03/2021 22:38, Hector Martin wrote: > This picks up the non-posted I/O mode needed for Apple platforms to > work properly. > > This removes the request/release functions, which are no longer > necessary, since devm_ioremap_resource takes care of that already. Most > other drivers already do it this way, anyway. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/tty/serial/samsung_tty.c | 25 +++---------------------- > 1 file changed, 3 insertions(+), 22 deletions(-) Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Tested-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Best regards, Krzysztof
On 04/03/2021 22:39, Hector Martin wrote: > Earlycon support is identical to S3C2410, but Apple SoCs also need > MMIO mapped as nGnRnE. This is handled generically for normal drivers > including the normal UART path here, but earlycon uses fixmap and > runs before that scaffolding is ready. > > Since this is the only case where we need this fix, it makes more > sense to do it here in the UART driver instead of introducing a > whole fdt nonposted-mmio resolver just for earlycon/fixmap. > > Suggested-by: Arnd Bergmann <arnd@arndb.de> > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/tty/serial/samsung_tty.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Best regards, Krzysztof
On 04/03/2021 22:38, Hector Martin wrote: > Apple SoCs are a distant descendant of Samsung designs and use yet > another variant of their UART style, with different interrupt handling. > > In particular, this variant has the following differences with existing > ones: > > * It includes a built-in interrupt controller with different registers, > using only a single platform IRQ > > * Internal interrupt sources are treated as edge-triggered, even though > the IRQ output is level-triggered. This chiefly affects the TX IRQ > path: the driver can no longer rely on the TX buffer empty IRQ > immediately firing after TX is enabled, but instead must prime the > FIFO with data directly. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/tty/serial/Kconfig | 2 +- > drivers/tty/serial/samsung_tty.c | 238 +++++++++++++++++++++++++++++-- > include/linux/serial_s3c.h | 16 +++ > 3 files changed, 247 insertions(+), 9 deletions(-) > Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Tested-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Best regards, Krzysztof
On 04/03/2021 22:39, Hector Martin wrote: > This currently supports: > > * SMP (via spin-tables) > * AIC IRQs > * Serial (with earlycon) > * Framebuffer > > A number of properties are dynamic, and based on system firmware > decisions that vary from version to version. These are expected > to be filled in by the loader. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > MAINTAINERS | 1 + > arch/arm64/boot/dts/Makefile | 1 + > arch/arm64/boot/dts/apple/Makefile | 2 + > arch/arm64/boot/dts/apple/t8103-j274.dts | 45 ++++++++ > arch/arm64/boot/dts/apple/t8103.dtsi | 135 +++++++++++++++++++++++ > 5 files changed, 184 insertions(+) > create mode 100644 arch/arm64/boot/dts/apple/Makefile > create mode 100644 arch/arm64/boot/dts/apple/t8103-j274.dts > create mode 100644 arch/arm64/boot/dts/apple/t8103.dtsi > > diff --git a/MAINTAINERS b/MAINTAINERS > index 28bd46f4f7a7..d5e4d93a536a 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1647,6 +1647,7 @@ C: irc://chat.freenode.net/asahi-dev > T: git https://github.com/AsahiLinux/linux.git > F: Documentation/devicetree/bindings/arm/apple.yaml > F: Documentation/devicetree/bindings/interrupt-controller/apple,aic.yaml > +F: arch/arm64/boot/dts/apple/ > F: arch/arm64/include/asm/sysreg_apple.h > F: drivers/irqchip/irq-apple-aic.c > F: include/dt-bindings/interrupt-controller/apple-aic.h > diff --git a/arch/arm64/boot/dts/Makefile b/arch/arm64/boot/dts/Makefile > index f1173cd93594..639e01a4d855 100644 > --- a/arch/arm64/boot/dts/Makefile > +++ b/arch/arm64/boot/dts/Makefile > @@ -6,6 +6,7 @@ subdir-y += amazon > subdir-y += amd > subdir-y += amlogic > subdir-y += apm > +subdir-y += apple > subdir-y += arm > subdir-y += bitmain > subdir-y += broadcom > diff --git a/arch/arm64/boot/dts/apple/Makefile b/arch/arm64/boot/dts/apple/Makefile > new file mode 100644 > index 000000000000..cbbd701ebf05 > --- /dev/null > +++ b/arch/arm64/boot/dts/apple/Makefile > @@ -0,0 +1,2 @@ > +# SPDX-License-Identifier: GPL-2.0 > 
+dtb-$(CONFIG_ARCH_APPLE) += t8103-j274.dtb > diff --git a/arch/arm64/boot/dts/apple/t8103-j274.dts b/arch/arm64/boot/dts/apple/t8103-j274.dts > new file mode 100644 > index 000000000000..8afc2ed70361 > --- /dev/null > +++ b/arch/arm64/boot/dts/apple/t8103-j274.dts > @@ -0,0 +1,45 @@ > +// SPDX-License-Identifier: GPL-2.0+ OR MIT > +/* > + * Apple Mac mini (M1, 2020) > + * > + * target-type: J274 > + * > + * Copyright The Asahi Linux Contributors > + */ > + > +/dts-v1/; > + > +#include "t8103.dtsi" > + > +/ { > + compatible = "apple,j274", "apple,t8103", "apple,arm-platform"; > + model = "Apple Mac mini (M1, 2020)"; > + > + aliases { > + serial0 = &serial0; > + }; > + > + chosen { > + #address-cells = <2>; > + #size-cells = <2>; > + ranges; > + > + stdout-path = "serial0"; > + > + framebuffer0: framebuffer@0 { > + compatible = "apple,simple-framebuffer", "simple-framebuffer"; > + reg = <0 0 0 0>; /* To be filled by loader */ > + /* Format properties will be added by loader */ > + status = "disabled"; > + }; > + }; > + > + memory@800000000 { > + device_type = "memory"; > + reg = <0x8 0 0x2 0>; /* To be filled by loader */ Shouldn't this be 0x800000000 with ~0x80000000 length (or whatever is more common)? Or did I miss some ranges? Best regards, Krzysztof
On 05/03/2021 20.03, Krzysztof Kozlowski wrote: >> + memory@800000000 { >> + device_type = "memory"; >> + reg = <0x8 0 0x2 0>; /* To be filled by loader */ > > Shouldn't this be 0x800000000 with ~0x80000000 length (or whatever is > more common)? Or did I miss some ranges? The base model has 8GB of RAM, and RAM always starts at 0x800000000, hence that reg property. It's not actually useful to try to boot Linux like this, because it'll step all over device carveouts on both ends and break, but since those are potentially dynamic it doesn't really make sense to use a more specific example for the dts. E.g. on my system, with my current firmware version, this ends up getting patched to: reg = <0x8 0x0134c000 0x1 0xda294000> Thanks,
On 05/03/2021 12:14, Hector Martin wrote: > On 05/03/2021 20.03, Krzysztof Kozlowski wrote: >>> + memory@800000000 { >>> + device_type = "memory"; >>> + reg = <0x8 0 0x2 0>; /* To be filled by loader */ >> >> Shouldn't this be 0x800000000 with ~0x80000000 length (or whatever is >> more common)? Or did I miss some ranges? > > The base model has 8GB of RAM, and RAM always starts at 0x800000000, > hence that reg property. Ah, I messed up the unit addressing and number of zeros... it's OK. Best regards, Krzysztof
On Thu, Mar 4, 2021 at 11:40 PM Hector Martin <marcan@marcan.st> wrote: > > ARM64 currently defaults to posted MMIO (nGnRnE), but some devices > require the use of non-posted MMIO (nGnRE). Introduce a new ioremap() > variant to handle this case. ioremap_np() is aliased to ioremap() by > default on arches that do not implement this variant. Hmm... But isn't it basically a requirement to those device drivers to use readX()/writeX() instead of readX_relaxed() / writeX_relaxed()? ... > #define IORESOURCE_MEM_32BIT (3<<3) > #define IORESOURCE_MEM_SHADOWABLE (1<<5) /* dup: IORESOURCE_SHADOWABLE */ > #define IORESOURCE_MEM_EXPANSIONROM (1<<6) > +#define IORESOURCE_MEM_NONPOSTED (1<<7) Not sure it's the right location (in a bit field) for this flag.
On Thu, Mar 4, 2021 at 11:41 PM Hector Martin <marcan@marcan.st> wrote: > > This is the root interrupt controller used on Apple ARM SoCs such as the > M1. This irqchip driver performs multiple functions: > > * Handles both IRQs and FIQs > > * Drives the AIC peripheral itself (which handles IRQs) > > * Dispatches FIQs to downstream hard-wired clients (currently the ARM > timer). > > * Implements a virtual IPI multiplexer to funnel multiple Linux IPIs > into a single hardware IPI ... > + * - <0 nr flags> - hwirq #nr > + * - <1 nr flags> - FIQ #nr > + * - nr=0 Physical HV timer > + * - nr=1 Virtual HV timer > + * - nr=2 Physical guest timer > + * - nr=3 Virtual guest timer > + * Unneeded blank line. > + */ ... > +#define pr_fmt(fmt) "%s: " fmt, __func__ This is not needed, really, if you have unique / distinguishable messages in the first place. Rather people include module names, which may be useful. ... > +#define MASK_REG(x) (4 * ((x) >> 5)) > +#define MASK_BIT(x) BIT((x) & 0x1f) GENMASK(4,0) ... > +/* > + * Max 31 bits in IPI SEND register (top bit is self). > + * >=32-core chips will need code changes anyway. > + */ > +#define AIC_MAX_CPUS 31 I would put it as (32 - 1) to show that the register is actually 32-bit long. ... > +static atomic_t aic_vipi_flag[AIC_MAX_CPUS]; > +static atomic_t aic_vipi_enable[AIC_MAX_CPUS]; Isn't it easier to handle these when they are full width, i.e. 32 items per the array? ... > +static int aic_irq_set_affinity(struct irq_data *d, > + const struct cpumask *mask_val, bool force) > +{ > + irq_hw_number_t hwirq = irqd_to_hwirq(d); > + struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > + int cpu; > + > + if (hwirq > ic->nr_hw) >= ? > + return -EINVAL; > + > + if (force) > + cpu = cpumask_first(mask_val); > + else > + cpu = cpumask_any_and(mask_val, cpu_online_mask); > + > + aic_ic_write(ic, AIC_TARGET_CPU + hwirq * 4, BIT(cpu)); > + irq_data_update_effective_affinity(d, cpumask_of(cpu)); > + > + return IRQ_SET_MASK_OK; > +} ... 
> +static void aic_fiq_mask(struct irq_data *d) > +{ > + /* Only the guest timers have real mask bits, unfortunately. */ > + switch (d->hwirq) { > + case AIC_TMR_GUEST_PHYS: > + sysreg_clear_set_s(SYS_APL_VM_TMR_FIQ_ENA_EL1, VM_TMR_FIQ_ENABLE_P, 0); > + break; > + case AIC_TMR_GUEST_VIRT: > + sysreg_clear_set_s(SYS_APL_VM_TMR_FIQ_ENA_EL1, VM_TMR_FIQ_ENABLE_V, 0); > + break; default case? // some compilers may not be happy Ditto for all similar places in the series. > + } > +} ... > +#define TIMER_FIRING(x) \ > + (((x) & (ARCH_TIMER_CTRL_ENABLE | ARCH_TIMER_CTRL_IT_MASK | \ > + ARCH_TIMER_CTRL_IT_STAT)) == \ > + (ARCH_TIMER_CTRL_ENABLE | ARCH_TIMER_CTRL_IT_STAT)) It's a bit hard to read. Perhaps #define FOO_MASK (_ENABLE | _STAT) #define _FIRING ... (FOO_MASK | _MASK == FOO_MASK) ? ... > + if ((read_sysreg_s(SYS_APL_PMCR0_EL1) & (PMCR0_IMODE | PMCR0_IACT)) > + == (FIELD_PREP(PMCR0_IMODE, PMCR0_IMODE_FIQ) | PMCR0_IACT)) { It's better to have == on the previous line. ... > + for_each_set_bit(i, &firing, AIC_NR_SWIPI) { > + handle_domain_irq(aic_irqc->ipi_domain, i, regs); > + } No {} needed. ... > +static int aic_init_smp(struct aic_irq_chip *irqc, struct device_node *node) > +{ > + int base_ipi; Introducing a temporary variable may help with readability ... *d = irqc->hw_domain; > + irqc->ipi_domain = irq_domain_create_linear(irqc->hw_domain->fwnode, AIC_NR_SWIPI, > + &aic_ipi_domain_ops, irqc); > + if (WARN_ON(!irqc->ipi_domain)) > + return -ENODEV; > + > + irqc->ipi_domain->flags |= IRQ_DOMAIN_FLAG_IPI_SINGLE; > + irq_domain_update_bus_token(irqc->ipi_domain, DOMAIN_BUS_IPI); > + > + base_ipi = __irq_domain_alloc_irqs(irqc->ipi_domain, -1, AIC_NR_SWIPI, > + NUMA_NO_NODE, NULL, false, NULL); > + > + if (WARN_ON(!base_ipi)) { > + irq_domain_remove(irqc->ipi_domain); > + return -ENODEV; > + } > + > + set_smp_ipi_range(base_ipi, AIC_NR_SWIPI); > + > + return 0; > +} ... > + return 0; > + Extra blank line. ... 
> + irqc->hw_domain = irq_domain_create_linear(of_node_to_fwnode(node), > + irqc->nr_hw + AIC_NR_FIQ, > + &aic_irq_domain_ops, irqc); If you are sure it will be always OF-only, why not to use irq_domain_add_linear()? ... > + for (i = 0; i < BITS_TO_U32(irqc->nr_hw); i++) > + aic_ic_write(irqc, AIC_MASK_SET + i * 4, ~0); > + for (i = 0; i < BITS_TO_U32(irqc->nr_hw); i++) > + aic_ic_write(irqc, AIC_SW_CLR + i * 4, ~0); ~0 is a beast when it suddenly gets into > int size. I would recommend using either GENMASK() if it's a bit field, or type_MAX values if it's a plain number.
On Fri, Mar 5, 2021 at 12:25 PM Linus Walleij <linus.walleij@linaro.org> wrote: > On Thu, Mar 4, 2021 at 10:40 PM Hector Martin <marcan@marcan.st> wrote: > > > This documents the newly introduced ioremap_np() along with all the > > other common ioremap() variants, and some higher-level abstractions > > available. > > > > Signed-off-by: Hector Martin <marcan@marcan.st> > > I like this, I just want one change: > > Put the common ioremap() on top in all paragraphs, so the norm > comes before the exceptions. > > I.e. it is weird to mention ioremap_np() before mentioning ioremap(). +1 here. That is what I have stumbled upon reading carefully.
On Thu, Mar 4, 2021 at 11:41 PM Hector Martin <marcan@marcan.st> wrote: > > * Split out s3c24xx_serial_tx_chars from s3c24xx_serial_tx_irq, > where only the latter acquires the port lock. This will be necessary > on platforms which have edge-triggered IRQs, as we need to call > s3c24xx_serial_tx_chars to kick off transmission from outside IRQ > context, with the port lock held. > > * Rename s3c24xx_serial_rx_chars to s3c24xx_serial_rx_irq for > consistency with the above. All it does now is call two other > functions anyway. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > drivers/tty/serial/samsung_tty.c | 34 +++++++++++++++++++------------- > 1 file changed, 20 insertions(+), 14 deletions(-) > > diff --git a/drivers/tty/serial/samsung_tty.c b/drivers/tty/serial/samsung_tty.c > index 39b2eb165bdc..7106eb238d8c 100644 > --- a/drivers/tty/serial/samsung_tty.c > +++ b/drivers/tty/serial/samsung_tty.c > @@ -827,7 +827,7 @@ static irqreturn_t s3c24xx_serial_rx_chars_pio(void *dev_id) > return IRQ_HANDLED; > } > > -static irqreturn_t s3c24xx_serial_rx_chars(int irq, void *dev_id) > +static irqreturn_t s3c24xx_serial_rx_irq(int irq, void *dev_id) > { > struct s3c24xx_uart_port *ourport = dev_id; > > @@ -836,16 +836,12 @@ static irqreturn_t s3c24xx_serial_rx_chars(int irq, void *dev_id) > return s3c24xx_serial_rx_chars_pio(dev_id); > } > > -static irqreturn_t s3c24xx_serial_tx_chars(int irq, void *id) > +static void s3c24xx_serial_tx_chars(struct s3c24xx_uart_port *ourport) > { > - struct s3c24xx_uart_port *ourport = id; > struct uart_port *port = &ourport->port; > struct circ_buf *xmit = &port->state->xmit; > - unsigned long flags; > int count, dma_count = 0; > > - spin_lock_irqsave(&port->lock, flags); > - > count = CIRC_CNT_TO_END(xmit->head, xmit->tail, UART_XMIT_SIZE); > > if (ourport->dma && ourport->dma->tx_chan && > @@ -862,7 +858,7 @@ static irqreturn_t s3c24xx_serial_tx_chars(int irq, void *id) > wr_reg(port, S3C2410_UTXH, port->x_char); > 
port->icount.tx++; > port->x_char = 0; > - goto out; > + return; > } > > /* if there isn't anything more to transmit, or the uart is now > @@ -871,7 +867,7 @@ static irqreturn_t s3c24xx_serial_tx_chars(int irq, void *id) > > if (uart_circ_empty(xmit) || uart_tx_stopped(port)) { > s3c24xx_serial_stop_tx(port); > - goto out; > + return; > } > > /* try and drain the buffer... */ > @@ -893,7 +889,7 @@ static irqreturn_t s3c24xx_serial_tx_chars(int irq, void *id) > > if (!count && dma_count) { > s3c24xx_serial_start_tx_dma(ourport, dma_count); > - goto out; > + return; > } > > if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) { > @@ -904,8 +900,18 @@ static irqreturn_t s3c24xx_serial_tx_chars(int irq, void *id) > > if (uart_circ_empty(xmit)) > s3c24xx_serial_stop_tx(port); > +} > + > +static irqreturn_t s3c24xx_serial_tx_irq(int irq, void *id) > +{ > + struct s3c24xx_uart_port *ourport = id; > + struct uart_port *port = &ourport->port; > + unsigned long flags; > + > + spin_lock_irqsave(&port->lock, flags); > + > + s3c24xx_serial_tx_chars(ourport); > > -out: > spin_unlock_irqrestore(&port->lock, flags); Add a separate change that removes flags from the spin lock in the IRQ handler. 
> return IRQ_HANDLED; > } > @@ -919,11 +925,11 @@ static irqreturn_t s3c64xx_serial_handle_irq(int irq, void *id) > irqreturn_t ret = IRQ_HANDLED; > > if (pend & S3C64XX_UINTM_RXD_MSK) { > - ret = s3c24xx_serial_rx_chars(irq, id); > + ret = s3c24xx_serial_rx_irq(irq, id); > wr_regl(port, S3C64XX_UINTP, S3C64XX_UINTM_RXD_MSK); > } > if (pend & S3C64XX_UINTM_TXD_MSK) { > - ret = s3c24xx_serial_tx_chars(irq, id); > + ret = s3c24xx_serial_tx_irq(irq, id); > wr_regl(port, S3C64XX_UINTP, S3C64XX_UINTM_TXD_MSK); > } > return ret; > @@ -1155,7 +1161,7 @@ static int s3c24xx_serial_startup(struct uart_port *port) > > ourport->rx_enabled = 1; > > - ret = request_irq(ourport->rx_irq, s3c24xx_serial_rx_chars, 0, > + ret = request_irq(ourport->rx_irq, s3c24xx_serial_rx_irq, 0, > s3c24xx_serial_portname(port), ourport); > > if (ret != 0) { > @@ -1169,7 +1175,7 @@ static int s3c24xx_serial_startup(struct uart_port *port) > > ourport->tx_enabled = 1; > > - ret = request_irq(ourport->tx_irq, s3c24xx_serial_tx_chars, 0, > + ret = request_irq(ourport->tx_irq, s3c24xx_serial_tx_irq, 0, > s3c24xx_serial_portname(port), ourport); > > if (ret) { > -- > 2.30.0 >
On Fri, Mar 5, 2021 at 12:55 PM Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> wrote: > > On 04/03/2021 22:38, Hector Martin wrote: > > This picks up the non-posted I/O mode needed for Apple platforms to > > work properly. > > > > This removes the request/release functions, which are no longer > > necessary, since devm_ioremap_resource takes care of that already. Most > > other drivers already do it this way, anyway. > > For the patches 18-22, with Krzysztof's and mine comments addressed Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> > > Signed-off-by: Hector Martin <marcan@marcan.st> > > --- > > drivers/tty/serial/samsung_tty.c | 25 +++---------------------- > > 1 file changed, 3 insertions(+), 22 deletions(-) > > Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> > Tested-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> > > Best regards, > Krzysztof
On 05/03/2021 23.45, Andy Shevchenko wrote: > On Thu, Mar 4, 2021 at 11:40 PM Hector Martin <marcan@marcan.st> wrote: >> >> ARM64 currently defaults to posted MMIO (nGnRnE), but some devices >> require the use of non-posted MMIO (nGnRE). Introduce a new ioremap() >> variant to handle this case. ioremap_np() is aliased to ioremap() by >> default on arches that do not implement this variant. > > Hmm... But isn't it basically a requirement to those device drivers to > use readX()/writeX() instead of readX_relaxed() / writeX_relaxed()? No, the write ops used do not matter. It's just that on these Apple SoCs the fabric requires the mappings to be nGnRnE, else it just throws SErrors on all writes and ignores them. The difference between _relaxed and not is barrier behavior with regards to DMA/memory accesses; this applies regardless of whether the writes are E or nE. You can have relaxed accesses with nGnRnE and then you would still have race conditions if you do not have a barrier between the MMIO and accessing DMA memory. What nGnRnE buys you (on platforms/buses where it works properly) is that you do not need a dummy read after a write to ensure completion. All of this is to some extent moot on these SoCs; it's not that we need the drivers to use nGnRnE for some correctness reason, it's that the SoCs force us to use it or else everything breaks, which was the motivation for this change. But since on most other SoCs both are valid options, this does allow some other drivers/platforms to opt into nGnRnE if they have a good reason to do so. Though you just made me notice two mistakes in the commit description: first, it describes the old v2 version, for v3 I made ioremap_np() just return NULL on arches that don't implement it. Second, nGnRnE and nGnRE are backwards. Oops. I'll fix it for the next version.
>> #define IORESOURCE_MEM_32BIT (3<<3) >> #define IORESOURCE_MEM_SHADOWABLE (1<<5) /* dup: IORESOURCE_SHADOWABLE */ >> #define IORESOURCE_MEM_EXPANSIONROM (1<<6) >> +#define IORESOURCE_MEM_NONPOSTED (1<<7) > > Not sure it's the right location (in a bit field) for this flag. Do you have a better suggestion? It seemed logical to put it here, as a flag on memory-type I/O resources.
On Thu, Mar 4, 2021 at 11:42 PM Hector Martin <marcan@marcan.st> wrote: > > Apple SoCs are a distant descendant of Samsung designs and use yet > another variant of their UART style, with different interrupt handling. > > In particular, this variant has the following differences with existing > ones: > > * It includes a built-in interrupt controller with different registers, > using only a single platform IRQ > > * Internal interrupt sources are treated as edge-triggered, even though > the IRQ output is level-triggered. This chiefly affects the TX IRQ > path: the driver can no longer rely on the TX buffer empty IRQ > immediately firing after TX is enabled, but instead must prime the > FIFO with data directly. ... > + case TYPE_APPLE_S5L: > + WARN_ON(1); // No DMA Oh, no, please use the ONCE variant. ... > + /* Apple types use these bits for IRQ masks */ > + if (ourport->info->type != TYPE_APPLE_S5L) { > + ucon &= ~(S3C64XX_UCON_TIMEOUT_MASK | > + S3C64XX_UCON_EMPTYINT_EN | > + S3C64XX_UCON_DMASUS_EN | > + S3C64XX_UCON_TIMEOUT_EN); > + ucon |= 0xf << S3C64XX_UCON_TIMEOUT_SHIFT | Can you spell 0xf with named constant(s), please? In case they are repetitive via the code, introduce either a temporary variable (in case it scoped to one function only), or define it as a constant. > + S3C64XX_UCON_TIMEOUT_EN; > + } ... > +/* interrupt handler for Apple SoC's.*/ > +static irqreturn_t apple_serial_handle_irq(int irq, void *id) > +{ > + struct s3c24xx_uart_port *ourport = id; > + struct uart_port *port = &ourport->port; > + unsigned int pend = rd_regl(port, S3C2410_UTRSTAT); > + irqreturn_t ret = IRQ_NONE; Redundant. You may return directly. 
> + > + if (pend & (APPLE_S5L_UTRSTAT_RXTHRESH | APPLE_S5L_UTRSTAT_RXTO)) { > + wr_regl(port, S3C2410_UTRSTAT, > + APPLE_S5L_UTRSTAT_RXTHRESH | APPLE_S5L_UTRSTAT_RXTO); > + ret = s3c24xx_serial_rx_irq(irq, id); > + } > + if (pend & APPLE_S5L_UTRSTAT_TXTHRESH) { > + wr_regl(port, S3C2410_UTRSTAT, APPLE_S5L_UTRSTAT_TXTHRESH); > + ret = s3c24xx_serial_tx_irq(irq, id); > + } No IO serialization? > + return ret; > +} ... > +static void apple_s5l_serial_shutdown(struct uart_port *port) > +{ > + struct s3c24xx_uart_port *ourport = to_ourport(port); > + Extra blank line (check your entire series for a such) > + unsigned int ucon; > + ourport->tx_in_progress = 0; > +} ... > + ourport->rx_enabled = 1; > + ourport->tx_enabled = 0; How are these protected against race? ... > + case TYPE_APPLE_S5L: { > + unsigned int ucon; > + int ret; > + > + ret = clk_prepare_enable(ourport->clk); > + if (ret) { > + dev_err(dev, "clk_enable clk failed: %d\n", ret); > + return ret; > + } > + if (!IS_ERR(ourport->baudclk)) { > + ret = clk_prepare_enable(ourport->baudclk); > + if (ret) { > + dev_err(dev, "clk_enable baudclk failed: %d\n", ret); > + clk_disable_unprepare(ourport->clk); > + return ret; > + } > + } Wouldn't it be better to use CLK bulk API? > + ucon = rd_regl(port, S3C2410_UCON); > + > + ucon &= ~(APPLE_S5L_UCON_TXTHRESH_ENA_MSK | > + APPLE_S5L_UCON_RXTHRESH_ENA_MSK | > + APPLE_S5L_UCON_RXTO_ENA_MSK); > + > + if (ourport->tx_enabled) > + ucon |= APPLE_S5L_UCON_TXTHRESH_ENA_MSK; > + if (ourport->rx_enabled) > + ucon |= APPLE_S5L_UCON_RXTHRESH_ENA_MSK | > + APPLE_S5L_UCON_RXTO_ENA_MSK; > + > + wr_regl(port, S3C2410_UCON, ucon); > + > + if (!IS_ERR(ourport->baudclk)) > + clk_disable_unprepare(ourport->baudclk); > + clk_disable_unprepare(ourport->clk); > + break; > + } ... > +#ifdef CONFIG_ARCH_APPLE Why? Wouldn't you like the one kernel to work on many SoCs? 
> +static struct s3c24xx_serial_drv_data s5l_serial_drv_data = { > + .info = &(struct s3c24xx_uart_info) { > + .name = "Apple S5L UART", > + .type = TYPE_APPLE_S5L, > + .port_type = PORT_8250, > + .fifosize = 16, > + .rx_fifomask = S3C2410_UFSTAT_RXMASK, > + .rx_fifoshift = S3C2410_UFSTAT_RXSHIFT, > + .rx_fifofull = S3C2410_UFSTAT_RXFULL, > + .tx_fifofull = S3C2410_UFSTAT_TXFULL, > + .tx_fifomask = S3C2410_UFSTAT_TXMASK, > + .tx_fifoshift = S3C2410_UFSTAT_TXSHIFT, > + .def_clk_sel = S3C2410_UCON_CLKSEL0, > + .num_clks = 1, > + .clksel_mask = 0, > + .clksel_shift = 0, > + }, > + .def_cfg = &(struct s3c2410_uartcfg) { > + .ucon = APPLE_S5L_UCON_DEFAULT, > + .ufcon = S3C2410_UFCON_DEFAULT, > + }, > +}; > +#define S5L_SERIAL_DRV_DATA ((kernel_ulong_t)&s5l_serial_drv_data) > +#else > +#define S5L_SERIAL_DRV_DATA ((kernel_ulong_t)NULL) > +#endif ... > +#define APPLE_S5L_UCON_RXTO_ENA_MSK (1 << APPLE_S5L_UCON_RXTO_ENA) > +#define APPLE_S5L_UCON_RXTHRESH_ENA_MSK (1 << APPLE_S5L_UCON_RXTHRESH_ENA) > +#define APPLE_S5L_UCON_TXTHRESH_ENA_MSK (1 << APPLE_S5L_UCON_TXTHRESH_ENA) BIT() ? ... > +#define APPLE_S5L_UCON_DEFAULT (S3C2410_UCON_TXIRQMODE | \ > + S3C2410_UCON_RXIRQMODE | \ > + S3C2410_UCON_RXFIFO_TOI) Indentation level is too high. Hint: start a value of the definition on the new line. ... > +#define APPLE_S5L_UTRSTAT_RXTHRESH (1<<4) > +#define APPLE_S5L_UTRSTAT_TXTHRESH (1<<5) > +#define APPLE_S5L_UTRSTAT_RXTO (1<<9) > +#define APPLE_S5L_UTRSTAT_ALL_FLAGS (0x3f0) BIT() ?
On Fri, Mar 5, 2021 at 4:09 PM Andy Shevchenko <andy.shevchenko@gmail.com> wrote: > On Fri, Mar 5, 2021 at 12:25 PM Linus Walleij <linus.walleij@linaro.org> wrote: > > On Thu, Mar 4, 2021 at 10:40 PM Hector Martin <marcan@marcan.st> wrote: > > > > > This documents the newly introduced ioremap_np() along with all the > > > other common ioremap() variants, and some higher-level abstractions > > > available. > > > > > > Signed-off-by: Hector Martin <marcan@marcan.st> > > > > I like this, I just want one change: > > > > Put the common ioremap() on top in all paragraphs, so the norm > > comes before the exceptions. > > > > I.e. it is weird to mention ioremap_np() before mentioning ioremap(). > > +1 here. That is what I have stumbled upon reading carefully. In that case, the order should probably be: ioremap ioremap_wc ioremap_wt ioremap_np ioremap_uc ioremap_cache Going from most common to least common, rather than going from strongest to weakest. Arnd
> From: Hector Martin <marcan@marcan.st> > Date: Fri, 5 Mar 2021 20:14:10 +0900 > > On 05/03/2021 20.03, Krzysztof Kozlowski wrote: > >> + memory@800000000 { > >> + device_type = "memory"; > >> + reg = <0x8 0 0x2 0>; /* To be filled by loader */ > > > > Shouldn't this be 0x800000000 with ~0x80000000 length (or whatever is > > more common)? Or did I miss some ranges? > > The base model has 8GB of RAM, and RAM always starts at 0x800000000, > hence that reg property. > > It's not actually useful to try to boot Linux like this, because it'll > step all over device carveouts on both ends and break, but since those > are potentially dynamic it doesn't really make sense to use a more > specific example for the dts. > > E.g. on my system, with my current firmware version, this ends up > getting patched to: > > reg = <0x8 0x0134c000 0x1 0xda294000> It may be better to handle the memory reserved by the firmware using a "/reserved-memory" node. I think the benefit of that could be that it communicates the entire range of physical memory to the kernel, which means it could use large mappings in the page tables. Unless the "/reserved-memory" node defines a region that has the "no-map" property of course. That doesn't really have an impact on this diff though. The firmware/bootloader would still have to modify the property on 16GB models.
On 06/03/2021 00.17, Andy Shevchenko wrote:
> Add a separate change that removes flags from the spin lock in the IRQ handler.
This commit should have no functional changes; I am just splitting an
existing function into two, where one takes the lock and the other does
the work. Do you mean using a different locking function? I'm not
entirely sure what you're suggesting.
On Fri, Mar 5, 2021 at 6:16 PM Hector Martin <marcan@marcan.st> wrote: > > On 06/03/2021 00.17, Andy Shevchenko wrote: > > Add a separate change that removes flags from the spin lock in the IRQ handler. > > This commit should have no functional changes; Exactly my point, which is why I'm suggesting a _separate_ change! > I am just splitting an > existing function into two, where one takes the lock and the other does > the work. Do you mean using a different locking function? I'm not > entirely sure what you're suggesting. Yes, as a prerequisite: spin_lock_irqsave() -> spin_lock().
On 06/03/2021 01.20, Andy Shevchenko wrote: >> I am just splitting an >> existing function into two, where one takes the lock and the other does >> the work. Do you mean using a different locking function? I'm not >> entirely sure what you're suggesting. > > Yes, as a prerequisite > > spin_lock_irqsave -> spin_lock(). Krzysztof, is this something you want in this series? I was trying to avoid logic changes to the non-Apple paths.
On 06/03/2021 00.59, Mark Kettenis wrote: > It may be better to handle the memory reserved by the firmware using a > "/reserved-memory" node. I think the benefit of that could be that it > communicates the entire range of physical memory to the kernel, which > means it could use large mappings in the page tables. Unless the > "/reserved-memory" node defines a region that has the "no-map" > property of course. We actually need no-map, because otherwise the CPU could speculate its way into these carveouts (it's not just firmware, there's stuff in here the CPU really can't be allowed to touch, e.g. the SEP carveout). It also breaks simplefb mapping the framebuffer. I thought of the reserved-memory approach, but then figured it wouldn't buy us anything for this reason.
On 06/03/2021 00.28, Andy Shevchenko wrote: >> + case TYPE_APPLE_S5L: >> + WARN_ON(1); // No DMA > > Oh, no, please use the ONCE variant. Thanks, changing this for v4. > > ... > >> + /* Apple types use these bits for IRQ masks */ >> + if (ourport->info->type != TYPE_APPLE_S5L) { >> + ucon &= ~(S3C64XX_UCON_TIMEOUT_MASK | >> + S3C64XX_UCON_EMPTYINT_EN | >> + S3C64XX_UCON_DMASUS_EN | >> + S3C64XX_UCON_TIMEOUT_EN); >> + ucon |= 0xf << S3C64XX_UCON_TIMEOUT_SHIFT | > > Can you spell 0xf with named constant(s), please? > > In case they are repetitive via the code, introduce either a temporary > variable (in case it scoped to one function only), or define it as a > constant. I'm just moving this code; as far as I can tell this is a timeout value (so just an integer), but I don't know if there is any special meaning to 0xf here. Note that this codepath is for *non-Apple* chips, as the Apple ones don't even have this field (at least not here). >> + irqreturn_t ret = IRQ_NONE; > > Redundant. You may return directly. What if both interrupts are pending? > No IO serialization? There is no DMA on the Apple variants (as far as I know; it's not implemented anyway), so there is no need for serializing IO with DMA. In any case, dealing with that is the DMA code's job, the interrupt handler shouldn't need to care. If you mean serializing IO with the IRQ: CPU-wise, I would hope that's the irqchip's job (AIC does this with a readl on the event). If you mean ensuring all writes are complete (i.e. posted write issue), on the Apple chips everything is non-posted as explained in the previous patches. > Extra blank line (check your entire series for a such) Thanks, noted. I'll check the declaration blocks in other patches. >> + ourport->rx_enabled = 1; >> + ourport->tx_enabled = 0; > > How are these protected against race? The serial core should be holding the port mutex for pretty much every call into the driver, as far as I can tell. > > ... 
> >> + case TYPE_APPLE_S5L: { >> + unsigned int ucon; >> + int ret; >> + >> + ret = clk_prepare_enable(ourport->clk); >> + if (ret) { >> + dev_err(dev, "clk_enable clk failed: %d\n", ret); >> + return ret; >> + } >> + if (!IS_ERR(ourport->baudclk)) { >> + ret = clk_prepare_enable(ourport->baudclk); >> + if (ret) { >> + dev_err(dev, "clk_enable baudclk failed: %d\n", ret); >> + clk_disable_unprepare(ourport->clk); >> + return ret; >> + } >> + } > > Wouldn't it be better to use CLK bulk API? Ah, I guess that could save a line or two of code here, even though it requires setting up the array. I'll give it a shot. >> +#ifdef CONFIG_ARCH_APPLE > > Why? Wouldn't you like the one kernel to work on many SoCs? This *adds* Apple support, it is not mutually exclusive with all the other SoCs. You can enable all of those options and get a driver that works on all of them. This is the same pattern used throughout the driver for all the other Samsung variants. There is no reason to have Apple SoC support in the samsung driver if the rest of the kernel doesn't have Apple SoC support either, of course. >> +#define APPLE_S5L_UCON_RXTO_ENA_MSK (1 << APPLE_S5L_UCON_RXTO_ENA) >> +#define APPLE_S5L_UCON_RXTHRESH_ENA_MSK (1 << APPLE_S5L_UCON_RXTHRESH_ENA) >> +#define APPLE_S5L_UCON_TXTHRESH_ENA_MSK (1 << APPLE_S5L_UCON_TXTHRESH_ENA) > > BIT() ? I'm trying to keep the style of the rest of the file here, which doesn't use BIT() anywhere. I agree this header could use some work though... I wonder if I've met my required quota of cleanups to this driver for this patchset ;-) >> +#define APPLE_S5L_UCON_DEFAULT (S3C2410_UCON_TXIRQMODE | \ >> + S3C2410_UCON_RXIRQMODE | \ >> + S3C2410_UCON_RXFIFO_TOI) > > Indentation level is too high. Hint: start a value of the definition > on the new line. Is it that bad? It's within 80 cols, putting one bit per line is more readable than several on one line, and this is how the rest of the header is written. 
Is it really better to do #define APPLE_S5L_UCON_DEFAULT \ (S3C2410_UCON_TXIRQMODE | S3C2410_UCON_RXIRQMODE | \ S3C2410_UCON_RXFIFO_TOI) or #define APPLE_S5L_UCON_DEFAULT \ (S3C2410_UCON_TXIRQMODE | \ S3C2410_UCON_RXIRQMODE | \ S3C2410_UCON_RXFIFO_TOI) here? Those don't look like an obvious improvement to me, I'd even say overlapping the bits and the macro name in the same columns makes it less readable to my eyes. >> +#define APPLE_S5L_UTRSTAT_RXTHRESH (1<<4) >> +#define APPLE_S5L_UTRSTAT_TXTHRESH (1<<5) >> +#define APPLE_S5L_UTRSTAT_RXTO (1<<9) >> +#define APPLE_S5L_UTRSTAT_ALL_FLAGS (0x3f0) > > BIT() ? See above.
On 05/03/2021 17:29, Hector Martin wrote: > On 06/03/2021 01.20, Andy Shevchenko wrote: >>> I am just splitting an >>> existing function into two, where one takes the lock and the other does >>> the work. Do you mean using a different locking function? I'm not >>> entirely sure what you're suggesting. >> >> Yes, as a prerequisite >> >> spin_lock_irqsave -> spin_lock(). > > Krzysztof, is this something you want in this series? I was trying to > avoid logic changes to the non-Apple paths. I don't quite get the need for such change (the code will be still called in interrupt handler, right?), but assuming the "why?" is properly documented, it can be a separate patch here. Best regards, Krzysztof
On 05/03/2021 18:04, Hector Martin wrote: > On 06/03/2021 00.28, Andy Shevchenko wrote: >>> + case TYPE_APPLE_S5L: >>> + WARN_ON(1); // No DMA >> >> Oh, no, please use the ONCE variant. > > Thanks, changing this for v4. > >> >> ... >> >>> + /* Apple types use these bits for IRQ masks */ >>> + if (ourport->info->type != TYPE_APPLE_S5L) { >>> + ucon &= ~(S3C64XX_UCON_TIMEOUT_MASK | >>> + S3C64XX_UCON_EMPTYINT_EN | >>> + S3C64XX_UCON_DMASUS_EN | >>> + S3C64XX_UCON_TIMEOUT_EN); >>> + ucon |= 0xf << S3C64XX_UCON_TIMEOUT_SHIFT | >> >> Can you spell 0xf with named constant(s), please? >> >> In case they are repetitive via the code, introduce either a temporary >> variable (in case it scoped to one function only), or define it as a >> constant. > > I'm just moving this code; as far as I can tell this is a timeout value > (so just an integer), but I don't know if there is any special meaning > to 0xf here. Note that this codepath is for *non-Apple* chips, as the > Apple ones don't even have this field (at least not here). I agree here with Hector. Andy, you are proposing an unrelated change here (which, without documentation, might not be doable by Hector). > >>> + irqreturn_t ret = IRQ_NONE; >> >> Redundant. You may return directly. > > What if both interrupts are pending? > >> No IO serialization? > > There is no DMA on the Apple variants (as far as I know; it's not > implemented anyway), so there is no need for serializing IO with DMA. In > any case, dealing with that is the DMA code's job, the interrupt handler > shouldn't need to care. > > If you mean serializing IO with the IRQ: CPU-wise, I would hope that's > the irqchip's job (AIC does this with a readl on the event). If you mean > ensuring all writes are complete (i.e. posted write issue), on the Apple > chips everything is non-posted as explained in the previous patches. > >> Extra blank line (check your entire series for a such) > > Thanks, noted. I'll check the declaration blocks in other patches. 
> >>> + ourport->rx_enabled = 1; >>> + ourport->tx_enabled = 0; >> >> How are these protected against race? > > The serial core should be holding the port mutex for pretty much every > call into the driver, as far as I can tell. > >> >> ... >> >>> + case TYPE_APPLE_S5L: { >>> + unsigned int ucon; >>> + int ret; >>> + >>> + ret = clk_prepare_enable(ourport->clk); >>> + if (ret) { >>> + dev_err(dev, "clk_enable clk failed: %d\n", ret); >>> + return ret; >>> + } >>> + if (!IS_ERR(ourport->baudclk)) { >>> + ret = clk_prepare_enable(ourport->baudclk); >>> + if (ret) { >>> + dev_err(dev, "clk_enable baudclk failed: %d\n", ret); >>> + clk_disable_unprepare(ourport->clk); >>> + return ret; >>> + } >>> + } >> >> Wouldn't it be better to use CLK bulk API? > > Ah, I guess that could save a line or two of code here, even though it > requires setting up the array. I'll give it a shot. > >>> +#ifdef CONFIG_ARCH_APPLE >> >> Why? Wouldn't you like the one kernel to work on many SoCs? > > This *adds* Apple support, it is not mutually exclusive with all the > other SoCs. You can enable all of those options and get a driver that > works on all of them. This is the same pattern used throughout the > driver for all the other Samsung variants. There is no reason to have > Apple SoC support in the samsung driver if the rest of the kernel > doesn't have Apple SoC support either, of course. How does an ifdef on ARCH_APPLE make it non-working on many SoCs? All new platforms are multi... The true question is - do the ifdefs in the code make it more difficult to read/review? Best regards, Krzysztof
On Sun, Mar 7, 2021 at 12:34 PM Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> wrote: > On 05/03/2021 17:29, Hector Martin wrote: > > On 06/03/2021 01.20, Andy Shevchenko wrote: > >>> I am just splitting an > >>> existing function into two, where one takes the lock and the other does > >>> the work. Do you mean using a different locking function? I'm not > >>> entirely sure what you're suggesting. > >> > >> Yes, as a prerequisite > >> > >> spin_lock_irqsave -> spin_lock(). > > > > Krzysztof, is this something you want in this series? I was trying to > > avoid logic changes to the non-Apple paths. > > I don't quite get the need for such change (the code will be still > called in interrupt handler, right?), but assuming the "why?" is > properly documented, it can be a separate patch here. This is only for readability: the common rule is to not disable interrupts when they are already disabled, so a reader might wonder if this instance of the handler is special in some case that it might be called with interrupts enabled. There is also a small overhead in accessing the global irq mask register on some architectures, but for a uart that does not make any difference of course. While I'm generally in favor of that kind of cleanup, I'd also prefer to leave it out of this series -- once you get into details like this the series gets harder to review. Arnd
On 07/03/2021 17:01, Arnd Bergmann wrote: > On Sun, Mar 7, 2021 at 12:34 PM Krzysztof Kozlowski > <krzysztof.kozlowski@canonical.com> wrote: >> On 05/03/2021 17:29, Hector Martin wrote: >>> On 06/03/2021 01.20, Andy Shevchenko wrote: >>>>> I am just splitting an >>>>> existing function into two, where one takes the lock and the other does >>>>> the work. Do you mean using a different locking function? I'm not >>>>> entirely sure what you're suggesting. >>>> >>>> Yes, as a prerequisite >>>> >>>> spin_lock_irqsave -> spin_lock(). >>> >>> Krzysztof, is this something you want in this series? I was trying to >>> avoid logic changes to the non-Apple paths. >> >> I don't quite get the need for such change (the code will be still >> called in interrupt handler, right?), but assuming the "why?" is >> properly documented, it can be a separate patch here. > > This is only for readability: the common rule is to not disable > interrupts when they are already disabled, so a reader might wonder > if this instance of the handler is special in some case that it might > be called with interrupts enabled. > > There is also a small overhead in accessing the global irq mask > register on some architectures, but for a uart that does not make > any difference of course. > > While I'm generally in favor of that kind of cleanup, I'd also > prefer to leave it out of this series -- once you get into details > like this the series gets harder to review. So it's only about the spinlock in the IRQ handler (which does not need to disable the IRQs). Makes sense but not related at all to the topic of bringing up Apple M1, therefore should not stop the review/merging. Best regards, Krzysztof
On Thu, 04 Mar 2021 21:38:42 +0000, Hector Martin <marcan@marcan.st> wrote: > > This allows the devicetree to correctly represent the available set of > timers, which varies from device to device, without the need for fake > dummy interrupts for unavailable slots. > > Also add the hyp-virt timer/PPI, which is not currently used, but worth > representing. > > Signed-off-by: Hector Martin <marcan@marcan.st> > Reviewed-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Thanks, M.
On Thu, 04 Mar 2021 21:38:43 +0000, Hector Martin <marcan@marcan.st> wrote: > > ARM64 currently defaults to posted MMIO (nGnRnE), but some devices > require the use of non-posted MMIO (nGnRE). Introduce a new ioremap() > variant to handle this case. ioremap_np() is aliased to ioremap() by > default on arches that do not implement this variant. > > sparc64 is the only architecture that needs to be touched directly, > because it includes neither of the generic io.h or iomap.h headers. > > This adds the IORESOURCE_MEM_NONPOSTED flag, which maps to this > variant and marks a given resource as requiring non-posted mappings. > This is implemented in the resource system because it is a SoC-level > requirement, so existing drivers do not need special-case code to pick > this ioremap variant. > > Then this is implemented in devres by introducing devm_ioremap_np(), > and making devm_ioremap_resource() automatically select this variant > when the resource has the IORESOURCE_MEM_NONPOSTED flag set. > > Signed-off-by: Hector Martin <marcan@marcan.st> Acked-by: Marc Zyngier <maz@kernel.org> M.
On Thu, 04 Mar 2021 21:38:46 +0000, Hector Martin <marcan@marcan.st> wrote: > > This is used on Apple ARM platforms, which require most MMIO > (except PCI devices) to be mapped as nGnRnE. > > Signed-off-by: Hector Martin <marcan@marcan.st> Acked-by: Marc Zyngier <maz@kernel.org> M.
On Thu, 04 Mar 2021 21:38:49 +0000, Hector Martin <marcan@marcan.st> wrote: > > These definitions are in arm-gic-v3.h for historical reasons which no > longer apply. Move them to sysreg.h so the AIC driver can use them, as > it needs to peek into vGIC registers to deal with the GIC maintenance > interrupt. > > Signed-off-by: Hector Martin <marcan@marcan.st> Acked-by: Marc Zyngier <maz@kernel.org> M.
On Fri, 05 Mar 2021 15:05:08 +0000, Andy Shevchenko <andy.shevchenko@gmail.com> wrote: [...] > > +#define TIMER_FIRING(x) \ > > + (((x) & (ARCH_TIMER_CTRL_ENABLE | ARCH_TIMER_CTRL_IT_MASK | \ > > + ARCH_TIMER_CTRL_IT_STAT)) == \ > > + (ARCH_TIMER_CTRL_ENABLE | ARCH_TIMER_CTRL_IT_STAT)) > > It's a bit hard to read. Perhaps > > #define FOO_MASK (_ENABLE | _STAT) > #define _FIRING ... (FOO_MASK | _MASK == FOO_MASK) The expression above is a direct translation of the architecture reference manual, and I'd rather not have that hidden behind a bunch of obscure macros. [...] > > + irqc->hw_domain = irq_domain_create_linear(of_node_to_fwnode(node), > > + irqc->nr_hw + AIC_NR_FIQ, > > + &aic_irq_domain_ops, irqc); > > If you are sure it will be always OF-only, why not to use > irq_domain_add_linear()? The OF-only API is deprecated, and there is no point in using it for *new* code, especially when things like IPI allocation require the use of the modern API. For arm64 root controllers, that's the way to go. Thanks, M.
On Mon, Mar 8, 2021 at 1:50 PM Marc Zyngier <maz@kernel.org> wrote: > On Fri, 05 Mar 2021 15:05:08 +0000, > Andy Shevchenko <andy.shevchenko@gmail.com> wrote: ... > > > +#define TIMER_FIRING(x) \ > > > + (((x) & (ARCH_TIMER_CTRL_ENABLE | ARCH_TIMER_CTRL_IT_MASK | \ > > > + ARCH_TIMER_CTRL_IT_STAT)) == \ > > > + (ARCH_TIMER_CTRL_ENABLE | ARCH_TIMER_CTRL_IT_STAT)) > > > > It's a bit hard to read. Perhaps > > > > #define FOO_MASK (_ENABLE | _STAT) > > #define _FIRING ... (FOO_MASK | _MASK == FOO_MASK) > > The expression above is a direct translation of the architecture > reference manual, and I'd rather not have that hidden behind a bunch > of obscure macros. OK! ... > > > + irqc->hw_domain = irq_domain_create_linear(of_node_to_fwnode(node), > > > + irqc->nr_hw + AIC_NR_FIQ, > > > + &aic_irq_domain_ops, irqc); > > > > If you are sure it will be always OF-only, why not to use > > irq_domain_add_linear()? > > The OF-only API is deprecated, and there is no point in using it for > *new* code, specially when things like IPI allocation require the use > of the modern API. For arm64 root controllers, that's the way to go. Good to know, thanks!
On Thu, 04 Mar 2021 21:38:51 +0000, Hector Martin <marcan@marcan.st> wrote: > > This is the root interrupt controller used on Apple ARM SoCs such as the > M1. This irqchip driver performs multiple functions: > > * Handles both IRQs and FIQs > > * Drives the AIC peripheral itself (which handles IRQs) > > * Dispatches FIQs to downstream hard-wired clients (currently the ARM > timer). > > * Implements a virtual IPI multiplexer to funnel multiple Linux IPIs > into a single hardware IPI > [...] > Signed-off-by: Hector Martin <marcan@marcan.st> > +static void __exception_irq_entry aic_handle_irq(struct pt_regs *regs) > +{ > + struct aic_irq_chip *ic = aic_irqc; > + u32 event, type, irq; > + > + do { > + /* > + * We cannot use a relaxed read here, as DMA needs to be > + * ordered with respect to the IRQ firing. > + */ > + event = readl(ic->base + AIC_EVENT); > + type = FIELD_GET(AIC_EVENT_TYPE, event); > + irq = FIELD_GET(AIC_EVENT_NUM, event); > + > + if (type == AIC_EVENT_TYPE_HW) > + handle_domain_irq(aic_irqc->hw_domain, irq, regs); > + else if (type == AIC_EVENT_TYPE_IPI && irq == 1) > + aic_handle_ipi(regs); > + else if (event != 0) > + pr_err("Unknown IRQ event %d, %d\n", type, irq); > + } while (event); > + > + /* > + * vGIC maintenance interrupts end up here too, so we need to check > + * for them separately. Just report and disable vGIC for now, until > + * we implement this properly. > + */ > + if ((read_sysreg_s(SYS_ICH_HCR_EL2) & ICH_HCR_EN) && > + read_sysreg_s(SYS_ICH_MISR_EL2) != 0) { > + pr_err("vGIC IRQ fired, disabling.\n"); Please add a _ratelimited here. Whilst debugging KVM on this machine, I ended up with this firing at such a rate that it was impossible to do anything useful. Ratelimiting it allowed me to pinpoint the problem. [...] > +/* > + * FIQ irqchip > + */ > + > +static void aic_fiq_mask(struct irq_data *d) > +{ > + /* Only the guest timers have real mask bits, unfortunately. 
*/ > + switch (d->hwirq) { > + case AIC_TMR_GUEST_PHYS: > + sysreg_clear_set_s(SYS_APL_VM_TMR_FIQ_ENA_EL1, VM_TMR_FIQ_ENABLE_P, 0); > + break; > + case AIC_TMR_GUEST_VIRT: > + sysreg_clear_set_s(SYS_APL_VM_TMR_FIQ_ENA_EL1, VM_TMR_FIQ_ENABLE_V, 0); > + break; > + } > +} > + > +static void aic_fiq_unmask(struct irq_data *d) > +{ > + switch (d->hwirq) { > + case AIC_TMR_GUEST_PHYS: > + sysreg_clear_set_s(SYS_APL_VM_TMR_FIQ_ENA_EL1, 0, VM_TMR_FIQ_ENABLE_P); > + break; > + case AIC_TMR_GUEST_VIRT: > + sysreg_clear_set_s(SYS_APL_VM_TMR_FIQ_ENA_EL1, 0, VM_TMR_FIQ_ENABLE_V); > + break; > + } > +} > + > +static void aic_fiq_eoi(struct irq_data *d) > +{ > + /* We mask to ack (where we can), so we need to unmask at EOI. */ > + if (!irqd_irq_disabled(d) && !irqd_irq_masked(d)) Ah, be careful here: irqd_irq_masked() doesn't do what you think it does for per-CPU interrupts. It's been on my list to fix for the rVIC implementation, but I never got around to doing it, and all decent ICs hide this from SW by having a HW-managed mask, similar to what is on the IRQ side. I can see two possibilities: - you can track the masked state directly and use that instead of these predicates - you can just drop the masking altogether as this is only useful to a hosted hypervisor (KVM), which will have to do its own masking behind the scenes anyway > + aic_fiq_unmask(d); > +} > + The rest looks good to me. Thanks, M.
On Thu, 04 Mar 2021 21:38:52 +0000, Hector Martin <marcan@marcan.st> wrote: > > This adds a Kconfig option to toggle support for Apple ARM SoCs. > At this time this targets the M1 and later "Apple Silicon" Mac SoCs. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > arch/arm64/Kconfig.platforms | 8 ++++++++ > arch/arm64/configs/defconfig | 1 + > 2 files changed, 9 insertions(+) > > diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms > index cdfd5fed457f..c2b5791e3d69 100644 > --- a/arch/arm64/Kconfig.platforms > +++ b/arch/arm64/Kconfig.platforms > @@ -36,6 +36,14 @@ config ARCH_ALPINE > This enables support for the Annapurna Labs Alpine > Soc family. > > +config ARCH_APPLE > + bool "Apple Silicon SoC family" > + select APPLE_AIC > + select ARM64_FIQ_SUPPORT Do we still need this FIQ symbol? I thought it was now gone... Thanks, M.
On 06/03/2021 00.51, Arnd Bergmann wrote: > On Fri, Mar 5, 2021 at 4:09 PM Andy Shevchenko > <andy.shevchenko@gmail.com> wrote: >> On Fri, Mar 5, 2021 at 12:25 PM Linus Walleij <linus.walleij@linaro.org> wrote: >>> On Thu, Mar 4, 2021 at 10:40 PM Hector Martin <marcan@marcan.st> wrote: >>> >>>> This documents the newly introduced ioremap_np() along with all the >>>> other common ioremap() variants, and some higher-level abstractions >>>> available. >>>> >>>> Signed-off-by: Hector Martin <marcan@marcan.st> >>> >>> I like this, I just want one change: >>> >>> Put the common ioremap() on top in all paragraphs, so the norm >>> comes before the exceptions. >>> >>> I.e. it is weird to mention ioremap_np() before mentioning ioremap(). >> >> +1 here. That is what I have stumbled upon reading carefully. > > In that case, the order should probably be: > > ioremap > ioremap_wc > ioremap_wt > ioremap_np > ioremap_uc > ioremap_cache > > Going from most common to least common, rather than going from > strongest to weakest. Yeah, I was dwelling on the issue of ioremap_np being first when I wrote that... this alternative works for me, I'll sort it like this then. It'll just need some re-wording to make it all flow properly.
On 09/03/2021 00.35, Marc Zyngier wrote: > On Thu, 04 Mar 2021 21:38:52 +0000, > Hector Martin <marcan@marcan.st> wrote: >> >> This adds a Kconfig option to toggle support for Apple ARM SoCs. >> At this time this targets the M1 and later "Apple Silicon" Mac SoCs. >> >> Signed-off-by: Hector Martin <marcan@marcan.st> >> --- >> arch/arm64/Kconfig.platforms | 8 ++++++++ >> arch/arm64/configs/defconfig | 1 + >> 2 files changed, 9 insertions(+) >> >> diff --git a/arch/arm64/Kconfig.platforms b/arch/arm64/Kconfig.platforms >> index cdfd5fed457f..c2b5791e3d69 100644 >> --- a/arch/arm64/Kconfig.platforms >> +++ b/arch/arm64/Kconfig.platforms >> @@ -36,6 +36,14 @@ config ARCH_ALPINE >> This enables support for the Annapurna Labs Alpine >> Soc family. >> >> +config ARCH_APPLE >> + bool "Apple Silicon SoC family" >> + select APPLE_AIC >> + select ARM64_FIQ_SUPPORT > > Do we still need this FIQ symbol? I though it was now gone... Whoops! Thanks for the catch, this can go away.
On Thu, Mar 4, 2021 at 10:42 PM Hector Martin <marcan@marcan.st> wrote: > Earlycon support is identical to S3C2410, but Apple SoCs also need > MMIO mapped as nGnRnE. This is handled generically for normal drivers > including the normal UART path here, but earlycon uses fixmap and > runs before that scaffolding is ready. > > Since this is the only case where we need this fix, it makes more > sense to do it here in the UART driver instead of introducing a > whole fdt nonposted-mmio resolver just for earlycon/fixmap. > > Suggested-by: Arnd Bergmann <arnd@arndb.de> > Signed-off-by: Hector Martin <marcan@marcan.st> This is as elegant as it gets! Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Yours, Linus Walleij
On Fri, Mar 05, 2021 at 06:38:36AM +0900, Hector Martin wrote: > From: Marc Zyngier <maz@kernel.org> > > It seems that the CPU known as Apple M1 has the terrible habit > of being stuck with HCR_EL2.E2H==1, in violation of the architecture. > > Try and work around this deplorable state of affairs by detecting > the stuck bit early and short-circuit the nVHE dance. It is still > unknown whether there are many more such nuggets to be found... > > Reported-by: Hector Martin <marcan@marcan.st> > Signed-off-by: Marc Zyngier <maz@kernel.org> > --- > arch/arm64/kernel/head.S | 33 ++++++++++++++++++++++++++++++--- > arch/arm64/kernel/hyp-stub.S | 28 ++++++++++++++++++++++++---- > 2 files changed, 54 insertions(+), 7 deletions(-) > > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S > index 66b0e0b66e31..673002b11865 100644 > --- a/arch/arm64/kernel/head.S > +++ b/arch/arm64/kernel/head.S > @@ -477,14 +477,13 @@ EXPORT_SYMBOL(kimage_vaddr) > * booted in EL1 or EL2 respectively. > */ > SYM_FUNC_START(init_kernel_el) > - mov_q x0, INIT_SCTLR_EL1_MMU_OFF > - msr sctlr_el1, x0 > - > mrs x0, CurrentEL > cmp x0, #CurrentEL_EL2 > b.eq init_el2 > > SYM_INNER_LABEL(init_el1, SYM_L_LOCAL) > + mov_q x0, INIT_SCTLR_EL1_MMU_OFF > + msr sctlr_el1, x0 > isb > mov_q x0, INIT_PSTATE_EL1 > msr spsr_el1, x0 > @@ -504,6 +503,34 @@ SYM_INNER_LABEL(init_el2, SYM_L_LOCAL) > msr vbar_el2, x0 > isb > > + /* > + * Fruity CPUs seem to have HCR_EL2.E2H set to RES1, > + * making it impossible to start in nVHE mode. Is that > + * compliant with the architecture? Absolutely not! > + */ > + mrs x0, hcr_el2 > + and x0, x0, #HCR_E2H > + cbz x0, 1f > + > + /* Switching to VHE requires a sane SCTLR_EL1 as a start */ > + mov_q x0, INIT_SCTLR_EL1_MMU_OFF > + msr_s SYS_SCTLR_EL12, x0 > + > + /* > + * Force an eret into a helper "function", and let it return > + * to our original caller... This makes sure that we have > + * initialised the basic PSTATE state. 
> + */ > + mov x0, #INIT_PSTATE_EL2 > + msr spsr_el1, x0 > + adr_l x0, stick_to_vhe > + msr elr_el1, x0 > + eret > + > +1: > + mov_q x0, INIT_SCTLR_EL1_MMU_OFF > + msr sctlr_el1, x0 > + > msr elr_el2, lr > mov w0, #BOOT_CPU_MODE_EL2 > eret > diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S > index 5eccbd62fec8..c7601030ee82 100644 > --- a/arch/arm64/kernel/hyp-stub.S > +++ b/arch/arm64/kernel/hyp-stub.S > @@ -27,12 +27,12 @@ SYM_CODE_START(__hyp_stub_vectors) > ventry el2_fiq_invalid // FIQ EL2t > ventry el2_error_invalid // Error EL2t > > - ventry el2_sync_invalid // Synchronous EL2h > + ventry elx_sync // Synchronous EL2h > ventry el2_irq_invalid // IRQ EL2h > ventry el2_fiq_invalid // FIQ EL2h > ventry el2_error_invalid // Error EL2h > > - ventry el1_sync // Synchronous 64-bit EL1 > + ventry elx_sync // Synchronous 64-bit EL1 > ventry el1_irq_invalid // IRQ 64-bit EL1 > ventry el1_fiq_invalid // FIQ 64-bit EL1 > ventry el1_error_invalid // Error 64-bit EL1 > @@ -45,7 +45,7 @@ SYM_CODE_END(__hyp_stub_vectors) > > .align 11 > > -SYM_CODE_START_LOCAL(el1_sync) > +SYM_CODE_START_LOCAL(elx_sync) > cmp x0, #HVC_SET_VECTORS > b.ne 1f > msr vbar_el2, x1 > @@ -71,7 +71,7 @@ SYM_CODE_START_LOCAL(el1_sync) > > 9: mov x0, xzr > eret > -SYM_CODE_END(el1_sync) > +SYM_CODE_END(elx_sync) > > // nVHE? No way! Give me the real thing! > SYM_CODE_START_LOCAL(mutate_to_vhe) > @@ -243,3 +243,23 @@ SYM_FUNC_START(switch_to_vhe) > #endif > ret > SYM_FUNC_END(switch_to_vhe) > + > +SYM_FUNC_START(stick_to_vhe) > + /* > + * Make sure the switch to VHE cannot fail, by overriding the > + * override. This is hilarious. > + */ > + adr_l x1, id_aa64mmfr1_override > + add x1, x1, #FTR_OVR_MASK_OFFSET > + dc civac, x1 > + dsb sy > + isb Why do we need an ISB here? 
> + ldr x0, [x1] > + bic x0, x0, #(0xf << ID_AA64MMFR1_VHE_SHIFT) > + str x0, [x1] I find it a bit bizarre doing this here, as for the primary CPU we're still a way away from parsing the early parameters and for secondary CPUs this doesn't need to be done for each one. Furthermore, this same code is run on the resume path, which can probably then race with itself. Is it possible to do it later on the boot CPU only, e.g. in init_feature_override()? We should probably also log a warning that we're ignoring the option because nVHE is not available. Will
On Fri, Mar 05, 2021 at 06:38:43AM +0900, Hector Martin wrote: > ARM64 currently defaults to posted MMIO (nGnRnE), but some devices > require the use of non-posted MMIO (nGnRE). Introduce a new ioremap() > variant to handle this case. ioremap_np() is aliased to ioremap() by > default on arches that do not implement this variant. > > sparc64 is the only architecture that needs to be touched directly, > because it includes neither of the generic io.h or iomap.h headers. > > This adds the IORESOURCE_MEM_NONPOSTED flag, which maps to this > variant and marks a given resource as requiring non-posted mappings. > This is implemented in the resource system because it is a SoC-level > requirement, so existing drivers do not need special-case code to pick > this ioremap variant. > > Then this is implemented in devres by introducing devm_ioremap_np(), > and making devm_ioremap_resource() automatically select this variant > when the resource has the IORESOURCE_MEM_NONPOSTED flag set. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > .../driver-api/driver-model/devres.rst | 1 + > arch/sparc/include/asm/io_64.h | 4 ++++ > include/asm-generic/io.h | 22 ++++++++++++++++++- > include/asm-generic/iomap.h | 9 ++++++++ > include/linux/io.h | 2 ++ > include/linux/ioport.h | 1 + > lib/devres.c | 22 +++++++++++++++++++ > 7 files changed, 60 insertions(+), 1 deletion(-) > > diff --git a/Documentation/driver-api/driver-model/devres.rst b/Documentation/driver-api/driver-model/devres.rst > index cd8b6e657b94..2f45877a539d 100644 > --- a/Documentation/driver-api/driver-model/devres.rst > +++ b/Documentation/driver-api/driver-model/devres.rst > @@ -309,6 +309,7 @@ IOMAP > devm_ioremap() > devm_ioremap_uc() > devm_ioremap_wc() > + devm_ioremap_np() > devm_ioremap_resource() : checks resource, requests memory region, ioremaps > devm_ioremap_resource_wc() > devm_platform_ioremap_resource() : calls devm_ioremap_resource() for platform device > diff --git a/arch/sparc/include/asm/io_64.h 
b/arch/sparc/include/asm/io_64.h > index 9bb27e5c22f1..9fbfc9574432 100644 > --- a/arch/sparc/include/asm/io_64.h > +++ b/arch/sparc/include/asm/io_64.h > @@ -409,6 +409,10 @@ static inline void __iomem *ioremap(unsigned long offset, unsigned long size) > #define ioremap_uc(X,Y) ioremap((X),(Y)) > #define ioremap_wc(X,Y) ioremap((X),(Y)) > #define ioremap_wt(X,Y) ioremap((X),(Y)) > +static inline void __iomem *ioremap_np(unsigned long offset, unsigned long size) > +{ > + return NULL; > +} > > static inline void iounmap(volatile void __iomem *addr) > { > diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h > index c6af40ce03be..082e0c96db6e 100644 > --- a/include/asm-generic/io.h > +++ b/include/asm-generic/io.h > @@ -942,7 +942,9 @@ static inline void *phys_to_virt(unsigned long address) > * > * ioremap_wc() and ioremap_wt() can provide more relaxed caching attributes > * for specific drivers if the architecture choses to implement them. If they > - * are not implemented we fall back to plain ioremap. > + * are not implemented we fall back to plain ioremap. Conversely, ioremap_np() > + * can provide stricter non-posted write semantics if the architecture > + * implements them. > */ > #ifndef CONFIG_MMU > #ifndef ioremap > @@ -993,6 +995,24 @@ static inline void __iomem *ioremap_uc(phys_addr_t offset, size_t size) > { > return NULL; > } > + > +/* > + * ioremap_np needs an explicit architecture implementation, as it > + * requests stronger semantics than regular ioremap(). Portable drivers > + * should instead use one of the higher-level abstractions, like > + * devm_ioremap_resource(), to choose the correct variant for any given > + * device and bus. Portable drivers with a good reason to want non-posted > + * write semantics should always provide an ioremap() fallback in case > + * ioremap_np() is not available. 
> + */ > +#ifndef ioremap_np > +#define ioremap_np ioremap_np > +static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size) > +{ > + return NULL; > +} > +#endif Can we implement the generic pci_remap_cfgspace() in terms of ioremap_np() if it is supported by the architecture? That way, we could avoid defining both on arm64. Will
On Fri, Mar 05, 2021 at 06:38:40AM +0900, Hector Martin wrote: > The implementor will be used to condition the FIQ support quirk. > > The specific CPU types are not used at the moment, but let's add them > for documentation purposes. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > arch/arm64/include/asm/cputype.h | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h > index ef5b040dee44..6231e1f0abe7 100644 > --- a/arch/arm64/include/asm/cputype.h > +++ b/arch/arm64/include/asm/cputype.h > @@ -59,6 +59,7 @@ > #define ARM_CPU_IMP_NVIDIA 0x4E > #define ARM_CPU_IMP_FUJITSU 0x46 > #define ARM_CPU_IMP_HISI 0x48 > +#define ARM_CPU_IMP_APPLE 0x61 > > #define ARM_CPU_PART_AEM_V8 0xD0F > #define ARM_CPU_PART_FOUNDATION 0xD00 > @@ -99,6 +100,9 @@ > > #define HISI_CPU_PART_TSV110 0xD01 > > +#define APPLE_CPU_PART_M1_ICESTORM 0x022 > +#define APPLE_CPU_PART_M1_FIRESTORM 0x023 > + > #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53) > #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57) > #define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72) > @@ -127,6 +131,8 @@ > #define MIDR_NVIDIA_CARMEL MIDR_CPU_MODEL(ARM_CPU_IMP_NVIDIA, NVIDIA_CPU_PART_CARMEL) > #define MIDR_FUJITSU_A64FX MIDR_CPU_MODEL(ARM_CPU_IMP_FUJITSU, FUJITSU_CPU_PART_A64FX) > #define MIDR_HISI_TSV110 MIDR_CPU_MODEL(ARM_CPU_IMP_HISI, HISI_CPU_PART_TSV110) > +#define MIDR_APPLE_M1_ICESTORM MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_ICESTORM) > +#define MIDR_APPLE_M1_FIRESTORM MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M1_FIRESTORM) We usually only merge these when they're needed, but this SoC seems broken enough that I can see the value in having them from the start :( Acked-by: Will Deacon <will@kernel.org> Will
On Fri, Mar 05, 2021 at 06:38:46AM +0900, Hector Martin wrote: > This is used on Apple ARM platforms, which require most MMIO > (except PCI devices) to be mapped as nGnRnE. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > arch/arm64/include/asm/io.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h > index 5ea8656a2030..953b8703af60 100644 > --- a/arch/arm64/include/asm/io.h > +++ b/arch/arm64/include/asm/io.h > @@ -169,6 +169,7 @@ extern void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size); > > #define ioremap(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE)) > #define ioremap_wc(addr, size) __ioremap((addr), (size), __pgprot(PROT_NORMAL_NC)) > +#define ioremap_np(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRnE)) Probably worth noting that whether or not this actually results in a non-posted mapping depends on the system architecture, but this is the best we can do, so: Acked-by: Will Deacon <will@kernel.org> I would personally prefer that drivers didn't have to care about this, and ioremap on arm64 figured out the right attributes based on the region being mapped, but I haven't followed the discussion closely so I won't die on that hill. Will
On Fri, Mar 05, 2021 at 06:38:49AM +0900, Hector Martin wrote: > These definitions are in arm-gic-v3.h for historical reasons which no > longer apply. Move them to sysreg.h so the AIC driver can use them, as > it needs to peek into vGIC registers to deal with the GIC maintenance > interrupt. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > arch/arm64/include/asm/sysreg.h | 60 ++++++++++++++++++++++++++++ > include/linux/irqchip/arm-gic-v3.h | 56 ---------------------------- > 2 files changed, 60 insertions(+), 56 deletions(-) This would be much easier to review if you had just moved the definitions, rather than reordering them and changing the comments at the same time! But I *think* nothing has changed, so: Acked-by: Will Deacon <will@kernel.org> Will
On Fri, Mar 05, 2021 at 06:38:48AM +0900, Hector Martin wrote: > Apple ARM64 SoCs have a ton of vendor-specific registers we're going to > have to deal with, and those don't really belong in sysreg.h with all > the architectural registers. Make a new home for them, and add some > registers which are useful for early bring-up. > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > MAINTAINERS | 1 + > arch/arm64/include/asm/sysreg_apple.h | 69 +++++++++++++++++++++++++++ > 2 files changed, 70 insertions(+) > create mode 100644 arch/arm64/include/asm/sysreg_apple.h > > diff --git a/MAINTAINERS b/MAINTAINERS > index aec14fbd61b8..3a352c687d4b 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1646,6 +1646,7 @@ B: https://github.com/AsahiLinux/linux/issues > C: irc://chat.freenode.net/asahi-dev > T: git https://github.com/AsahiLinux/linux.git > F: Documentation/devicetree/bindings/arm/apple.yaml > +F: arch/arm64/include/asm/sysreg_apple.h (this isn't needed with my suggestion below). > ARM/ARTPEC MACHINE SUPPORT > M: Jesper Nilsson <jesper.nilsson@axis.com> > diff --git a/arch/arm64/include/asm/sysreg_apple.h b/arch/arm64/include/asm/sysreg_apple.h > new file mode 100644 > index 000000000000..48347a51d564 > --- /dev/null > +++ b/arch/arm64/include/asm/sysreg_apple.h I doubt apple are the only folks doing this, so can we instead have sysreg-impdef.h please, and then have an Apple section in there for these registers? That way, we could also have an imp_sys_reg() macro to limit CRn to 11 or 15, which is the reserved encoding space for these registers. We'll cc you for any patches touching the Apple parts, as we don't have the first clue about what's hiding in there. > @@ -0,0 +1,69 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* > + * Apple SoC vendor-defined system register definitions > + * > + * Copyright The Asahi Linux Contributors > + > + * This file contains only well-understood registers that are useful to > + * Linux. 
If you are looking for things to add here, you should visit: > + * > + * https://github.com/AsahiLinux/docs/wiki/HW:ARM-System-Registers > + */ > + > +#ifndef __ASM_SYSREG_APPLE_H > +#define __ASM_SYSREG_APPLE_H > + > +#include <asm/sysreg.h> > +#include <linux/bits.h> > +#include <linux/bitfield.h> > + > +/* > + * Keep these registers in encoding order, except for register arrays; > + * those should be listed in array order starting from the position of > + * the encoding of the first register. > + */ > + > +#define SYS_APL_PMCR0_EL1 sys_reg(3, 1, 15, 0, 0) > +#define PMCR0_IMODE GENMASK(10, 8) > +#define PMCR0_IMODE_OFF 0 > +#define PMCR0_IMODE_PMI 1 > +#define PMCR0_IMODE_AIC 2 > +#define PMCR0_IMODE_HALT 3 > +#define PMCR0_IMODE_FIQ 4 > +#define PMCR0_IACT BIT(11) The Arm ARM says this about imp-def sysregs: | The Arm architecture guarantees not to define any register name | prefixed with IMP_ as part of the standard Arm architecture. | | Note | Arm strongly recommends that any register names created in the | IMPLEMENTATION DEFINED register spaces be prefixed with IMP_ and | postfixed with _ELx, where appropriate. and it seems like we could follow that here without much effort, if you don't mind. Will
On Wed, Mar 24, 2021 at 06:38:18PM +0000, Will Deacon wrote: > On Fri, Mar 05, 2021 at 06:38:48AM +0900, Hector Martin wrote: > > Apple ARM64 SoCs have a ton of vendor-specific registers we're going to > > have to deal with, and those don't really belong in sysreg.h with all > > the architectural registers. Make a new home for them, and add some > > registers which are useful for early bring-up. > > > > Signed-off-by: Hector Martin <marcan@marcan.st> > > --- > > MAINTAINERS | 1 + > > arch/arm64/include/asm/sysreg_apple.h | 69 +++++++++++++++++++++++++++ > > 2 files changed, 70 insertions(+) > > create mode 100644 arch/arm64/include/asm/sysreg_apple.h > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > index aec14fbd61b8..3a352c687d4b 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -1646,6 +1646,7 @@ B: https://github.com/AsahiLinux/linux/issues > > C: irc://chat.freenode.net/asahi-dev > > T: git https://github.com/AsahiLinux/linux.git > > F: Documentation/devicetree/bindings/arm/apple.yaml > > +F: arch/arm64/include/asm/sysreg_apple.h > > (this isn't needed with my suggestion below). > > > ARM/ARTPEC MACHINE SUPPORT > > M: Jesper Nilsson <jesper.nilsson@axis.com> > > diff --git a/arch/arm64/include/asm/sysreg_apple.h b/arch/arm64/include/asm/sysreg_apple.h > > new file mode 100644 > > index 000000000000..48347a51d564 > > --- /dev/null > > +++ b/arch/arm64/include/asm/sysreg_apple.h > > I doubt apple are the only folks doing this, so can we instead have > sysreg-impdef.h please, and then have an Apple section in there for these > registers? That way, we could also have an imp_sys_reg() macro to limit > CRn to 11 or 15, which is the reserved encoding space for these registers. > > We'll cc you for any patches touching the Apple parts, as we don't have > the first clue about what's hiding in there. For existing IMP-DEF sysregs (e.g. 
the Kryo L2 control registers), we've put the definitions in the drivers, rather than collating non-architectural bits under arch/arm64/. So far we've kept arch/arm64/ largely devoid of IMP-DEF bits, and it seems a shame to add something with the sole purpose of collating that, especially given arch code shouldn't need to touch these if FW and bootloader have done their jobs right. Can we put the definitions in the relevant drivers? That would sidestep any pain with MAINTAINERS, too. Thanks, Mark.
On Wed, Mar 24, 2021 at 06:59:21PM +0000, Mark Rutland wrote: > On Wed, Mar 24, 2021 at 06:38:18PM +0000, Will Deacon wrote: > > On Fri, Mar 05, 2021 at 06:38:48AM +0900, Hector Martin wrote: > > > Apple ARM64 SoCs have a ton of vendor-specific registers we're going to > > > have to deal with, and those don't really belong in sysreg.h with all > > > the architectural registers. Make a new home for them, and add some > > > registers which are useful for early bring-up. > > > > > > Signed-off-by: Hector Martin <marcan@marcan.st> > > > --- > > > MAINTAINERS | 1 + > > > arch/arm64/include/asm/sysreg_apple.h | 69 +++++++++++++++++++++++++++ > > > 2 files changed, 70 insertions(+) > > > create mode 100644 arch/arm64/include/asm/sysreg_apple.h > > > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > > index aec14fbd61b8..3a352c687d4b 100644 > > > --- a/MAINTAINERS > > > +++ b/MAINTAINERS > > > @@ -1646,6 +1646,7 @@ B: https://github.com/AsahiLinux/linux/issues > > > C: irc://chat.freenode.net/asahi-dev > > > T: git https://github.com/AsahiLinux/linux.git > > > F: Documentation/devicetree/bindings/arm/apple.yaml > > > +F: arch/arm64/include/asm/sysreg_apple.h > > > > (this isn't needed with my suggestion below). > > > > > ARM/ARTPEC MACHINE SUPPORT > > > M: Jesper Nilsson <jesper.nilsson@axis.com> > > > diff --git a/arch/arm64/include/asm/sysreg_apple.h b/arch/arm64/include/asm/sysreg_apple.h > > > new file mode 100644 > > > index 000000000000..48347a51d564 > > > --- /dev/null > > > +++ b/arch/arm64/include/asm/sysreg_apple.h > > > > I doubt apple are the only folks doing this, so can we instead have > > sysreg-impdef.h please, and then have an Apple section in there for these > > registers? That way, we could also have an imp_sys_reg() macro to limit > > CRn to 11 or 15, which is the reserved encoding space for these registers. > > > > We'll cc you for any patches touching the Apple parts, as we don't have > > the first clue about what's hiding in there. 
> > For existing IMP-DEF sysregs (e.g. the Kryo L2 control registers), we've > put the definitions in the drivers, rather than collating > non-architectural bits under arch/arm64/. Yeah, but we could include those here as well. > So far we've kept arch/arm64/ largely devoid of IMP-DEF bits, and it > seems a shame to add something with the sole purpose of collating that, > especially given arch code shouldn't need to touch these if FW and > bootloader have done their jobs right. > > Can we put the definitions in the relevant drivers? That would sidestep > any pain with MAINTAINERS, too. If we can genuinely ignore these in arch code, then sure. I just don't know how long that is going to be the case, and ending up in a situation where these are scattered randomly throughout the tree sounds horrible to me. Will
On Wed, Mar 24, 2021 at 7:12 PM Will Deacon <will@kernel.org> wrote: > > > +/* > > + * ioremap_np needs an explicit architecture implementation, as it > > + * requests stronger semantics than regular ioremap(). Portable drivers > > + * should instead use one of the higher-level abstractions, like > > + * devm_ioremap_resource(), to choose the correct variant for any given > > + * device and bus. Portable drivers with a good reason to want non-posted > > + * write semantics should always provide an ioremap() fallback in case > > + * ioremap_np() is not available. > > + */ > > +#ifndef ioremap_np > > +#define ioremap_np ioremap_np > > +static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size) > > +{ > > + return NULL; > > +} > > +#endif > > Can we implement the generic pci_remap_cfgspace() in terms of ioremap_np() > if it is supported by the architecture? That way, we could avoid defining > both on arm64. Good idea. It needs a fallback in case the ioremap_np() fails on most architectures, but that sounds easy enough. Since pci_remap_cfgspace() only has custom implementations, it sounds like we can actually make the generic implementation unconditional in the end, but that requires adding ioremap_np() on 32-bit as well, and I would keep that separate from this series. Arnd
Hi Hector, Sorry it took me so long to get to this. Some comments below. On Fri, Mar 05, 2021 at 06:38:51AM +0900, Hector Martin wrote: > This is the root interrupt controller used on Apple ARM SoCs such as the > M1. This irqchip driver performs multiple functions: > > * Handles both IRQs and FIQs > > * Drives the AIC peripheral itself (which handles IRQs) > > * Dispatches FIQs to downstream hard-wired clients (currently the ARM > timer). > > * Implements a virtual IPI multiplexer to funnel multiple Linux IPIs > into a single hardware IPI > > Signed-off-by: Hector Martin <marcan@marcan.st> > --- > MAINTAINERS | 2 + > drivers/irqchip/Kconfig | 8 + > drivers/irqchip/Makefile | 1 + > drivers/irqchip/irq-apple-aic.c | 710 ++++++++++++++++++++++++++++++++ > include/linux/cpuhotplug.h | 1 + > 5 files changed, 722 insertions(+) > create mode 100644 drivers/irqchip/irq-apple-aic.c [...] > + * Implementation notes: > + * > + * - This driver creates two IRQ domains, one for HW IRQs and internal FIQs, > + * and one for IPIs. > + * - Since Linux needs more than 2 IPIs, we implement a software IRQ controller > + * and funnel all IPIs into one per-CPU IPI (the second "self" IPI is unused). > + * - FIQ hwirq numbers are assigned after true hwirqs, and are per-cpu. > + * - DT bindings use 3-cell form (like GIC): > + * - <0 nr flags> - hwirq #nr > + * - <1 nr flags> - FIQ #nr > + * - nr=0 Physical HV timer > + * - nr=1 Virtual HV timer > + * - nr=2 Physical guest timer > + * - nr=3 Virtual guest timer > + * > + */ > + > +#define pr_fmt(fmt) "%s: " fmt, __func__ General nit: but I suspect many of the prints in here probably want to be using the *_ratelimited variants. > +static void __exception_irq_entry aic_handle_irq(struct pt_regs *regs) > +{ > + struct aic_irq_chip *ic = aic_irqc; > + u32 event, type, irq; > + > + do { > + /* > + * We cannot use a relaxed read here, as DMA needs to be > + * ordered with respect to the IRQ firing. 
> + */ I think this could be a bit clearer: the readl() doesn't order any DMA accesses, but instead means that subsequent reads by the CPU (which may be from a buffer which was DMA'd to) are ordered after the read of the MMIO register. > + event = readl(ic->base + AIC_EVENT); > + type = FIELD_GET(AIC_EVENT_TYPE, event); > + irq = FIELD_GET(AIC_EVENT_NUM, event); > + > + if (type == AIC_EVENT_TYPE_HW) > + handle_domain_irq(aic_irqc->hw_domain, irq, regs); > + else if (type == AIC_EVENT_TYPE_IPI && irq == 1) > + aic_handle_ipi(regs); > + else if (event != 0) > + pr_err("Unknown IRQ event %d, %d\n", type, irq); > + } while (event); > + > + /* > + * vGIC maintenance interrupts end up here too, so we need to check > + * for them separately. Just report and disable vGIC for now, until > + * we implement this properly. > + */ > + if ((read_sysreg_s(SYS_ICH_HCR_EL2) & ICH_HCR_EN) && > + read_sysreg_s(SYS_ICH_MISR_EL2) != 0) { > + pr_err("vGIC IRQ fired, disabling.\n"); > + sysreg_clear_set_s(SYS_ICH_HCR_EL2, ICH_HCR_EN, 0); > + } What prevents all these system register accesses being speculated up before the handler? > +} > + > +static int aic_irq_set_affinity(struct irq_data *d, > + const struct cpumask *mask_val, bool force) > +{ > + irq_hw_number_t hwirq = irqd_to_hwirq(d); > + struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > + int cpu; > + > + if (hwirq > ic->nr_hw) > + return -EINVAL; > + > + if (force) > + cpu = cpumask_first(mask_val); > + else > + cpu = cpumask_any_and(mask_val, cpu_online_mask); > + > + aic_ic_write(ic, AIC_TARGET_CPU + hwirq * 4, BIT(cpu)); > + irq_data_update_effective_affinity(d, cpumask_of(cpu)); > + > + return IRQ_SET_MASK_OK; > +} > + > +static int aic_irq_set_type(struct irq_data *d, unsigned int type) > +{ > + return (type == IRQ_TYPE_LEVEL_HIGH) ?
0 : -EINVAL; > +} > + > +static struct irq_chip aic_chip = { > + .name = "AIC", > + .irq_mask = aic_irq_mask, > + .irq_unmask = aic_irq_unmask, I know these are driven by the higher-level irq chip code, but I'm a bit confused as to what provides ordering if, e.g. something ends up calling: aic_chip.irq_mask(d); ... aic_chip.irq_unmask(d); I can't see any ISBs in here and they're writing to two different registers, so can we end up with the IRQ masked after this sequence? > +/* > + * IPI irqchip > + */ > + > +static void aic_ipi_mask(struct irq_data *d) > +{ > + u32 irq_bit = BIT(irqd_to_hwirq(d)); > + int this_cpu = smp_processor_id(); > + > + /* No specific ordering requirements needed here. */ > + atomic_andnot(irq_bit, &aic_vipi_enable[this_cpu]); > +} Why not use a per-cpu variable here instead of an array of atomics? The pcpu API has things for atomic updates (e.g. or, and, xchg). > +static void aic_ipi_unmask(struct irq_data *d) > +{ > + struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > + u32 irq_bit = BIT(irqd_to_hwirq(d)); > + int this_cpu = smp_processor_id(); > + > + /* > + * This must complete before the atomic_read_acquire() below to avoid > + * racing aic_ipi_send_mask(). Use a dummy fetch op with release > + * semantics for this. This is arch-specific: ARMv8 B2.3.3 specifies > + * that writes with Release semantics are Barrier-ordered-before reads > + * with Acquire semantics, even though the Linux arch-independent > + * definition of these atomic ops does not. > + */ I think a more idiomatic (and portable) way to do this would be to use the relaxed accessors, but with smp_mb__after_atomic() between them. Do you have a good reason for _not_ doing it like that? > + (void)atomic_fetch_or_release(irq_bit, &aic_vipi_enable[this_cpu]); > + > + /* > + * If a pending vIPI was unmasked, raise a HW IPI to ourselves. > + * No barriers needed here since this is a self-IPI. 
> + */ > + if (atomic_read_acquire(&aic_vipi_flag[this_cpu]) & irq_bit) "No barriers needed here" right before an acquire is confusing ;) > + aic_ic_write(ic, AIC_IPI_SEND, AIC_IPI_SEND_CPU(this_cpu)); > +} > + > +static void aic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask) > +{ > + struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d); > + u32 irq_bit = BIT(irqd_to_hwirq(d)); > + u32 send = 0; > + int cpu; > + unsigned long pending; > + > + for_each_cpu(cpu, mask) { > + /* > + * This sequence is the mirror of the one in aic_ipi_unmask(); > + * see the comment there. Additionally, release semantics > + * ensure that the vIPI flag set is ordered after any shared > + * memory accesses that precede it. This therefore also pairs > + * with the atomic_fetch_andnot in aic_handle_ipi(). > + */ > + pending = atomic_fetch_or_release(irq_bit, &aic_vipi_flag[cpu]); (same here) > + if (!(pending & irq_bit) && (atomic_read_acquire(&aic_vipi_enable[cpu]) & irq_bit)) > + send |= AIC_IPI_SEND_CPU(cpu); > + } > + > + /* > + * The flag writes must complete before the physical IPI is issued > + * to another CPU. This is implied by the control dependency on > + * the result of atomic_read_acquire() above, which is itself > + * already ordered after the vIPI flag write. > + */ > + if (send) > + aic_ic_write(ic, AIC_IPI_SEND, send); > +} > + > +static struct irq_chip ipi_chip = { > + .name = "AIC-IPI", > + .irq_mask = aic_ipi_mask, > + .irq_unmask = aic_ipi_unmask, > + .ipi_send_mask = aic_ipi_send_mask, > +}; > + > +/* > + * IPI IRQ domain > + */ > + > +static void aic_handle_ipi(struct pt_regs *regs) > +{ > + int this_cpu = smp_processor_id(); > + int i; > + unsigned long enabled, firing; > + > + /* > + * Ack the IPI. We need to order this after the AIC event read, but > + * that is enforced by normal MMIO ordering guarantees. > + */ > + aic_ic_write(aic_irqc, AIC_IPI_ACK, AIC_IPI_OTHER); > + > + /* > + * The mask read does not need to be ordered. 
Only we can change > + * our own mask anyway, so no races are possible here, as long as > + * we are properly in the interrupt handler (which is covered by > + * the barrier that is part of the top-level AIC handler's readl()). > + */ > + enabled = atomic_read(&aic_vipi_enable[this_cpu]); > + > + /* > + * Clear the IPIs we are about to handle. This pairs with the > + * atomic_fetch_or_release() in aic_ipi_send_mask(), and needs to be > + * ordered after the aic_ic_write() above (to avoid dropping vIPIs) and > + * before IPI handling code (to avoid races handling vIPIs before they > + * are signaled). The former is taken care of by the release semantics > + * of the write portion, while the latter is taken care of by the > + * acquire semantics of the read portion. > + */ > + firing = atomic_fetch_andnot(enabled, &aic_vipi_flag[this_cpu]) & enabled; Does this also need to be ordered after the Ack? For example, if we have something like: CPU 0 CPU 1 <some other IPI> aic_ipi_send_mask() atomic_fetch_andnot(flag) atomic_fetch_or_release(flag) aic_ic_write(AIC_IPI_SEND) aic_ic_write(AIC_IPI_ACK) sorry if it's a stupid question, I'm just not sure about the cases in which the hardware will pend things for you. Will
On 25/03/2021 04.09, Arnd Bergmann wrote: > On Wed, Mar 24, 2021 at 7:12 PM Will Deacon <will@kernel.org> wrote: >> >>> +/* >>> + * ioremap_np needs an explicit architecture implementation, as it >>> + * requests stronger semantics than regular ioremap(). Portable drivers >>> + * should instead use one of the higher-level abstractions, like >>> + * devm_ioremap_resource(), to choose the correct variant for any given >>> + * device and bus. Portable drivers with a good reason to want non-posted >>> + * write semantics should always provide an ioremap() fallback in case >>> + * ioremap_np() is not available. >>> + */ >>> +#ifndef ioremap_np >>> +#define ioremap_np ioremap_np >>> +static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size) >>> +{ >>> + return NULL; >>> +} >>> +#endif >> >> Can we implement the generic pci_remap_cfgspace() in terms of ioremap_np() >> if it is supported by the architecture? That way, we could avoid defining >> both on arm64. > > Good idea. It needs a fallback in case the ioremap_np() fails on most > architectures, but that sounds easy enough. > > Since pci_remap_cfgspace() only has custom implementations, it sounds like > we can actually make the generic implementation unconditional in the end, > but that requires adding ioremap_np() on 32-bit as well, and I would keep > that separate from this series. Sounds good; I'm adding a patch to adjust the generic implementation and remove the arm64 one in v4, and we can then complete the cleanup for other arches later.
On Thu, Mar 25, 2021 at 11:07:40PM +0900, Hector Martin wrote: > On 25/03/2021 04.09, Arnd Bergmann wrote: > > On Wed, Mar 24, 2021 at 7:12 PM Will Deacon <will@kernel.org> wrote: > > > > > > > +/* > > > > + * ioremap_np needs an explicit architecture implementation, as it > > > > + * requests stronger semantics than regular ioremap(). Portable drivers > > > > + * should instead use one of the higher-level abstractions, like > > > > + * devm_ioremap_resource(), to choose the correct variant for any given > > > > + * device and bus. Portable drivers with a good reason to want non-posted > > > > + * write semantics should always provide an ioremap() fallback in case > > > > + * ioremap_np() is not available. > > > > + */ > > > > +#ifndef ioremap_np > > > > +#define ioremap_np ioremap_np > > > > +static inline void __iomem *ioremap_np(phys_addr_t offset, size_t size) > > > > +{ > > > > + return NULL; > > > > +} > > > > +#endif > > > > > > Can we implement the generic pci_remap_cfgspace() in terms of ioremap_np() > > > if it is supported by the architecture? That way, we could avoid defining > > > both on arm64. > > > > Good idea. It needs a fallback in case the ioremap_np() fails on most > > architectures, but that sounds easy enough. > > > > Since pci_remap_cfgspace() only has custom implementations, it sounds like > > we can actually make the generic implementation unconditional in the end, > > but that requires adding ioremap_np() on 32-bit as well, and I would keep > > that separate from this series. > > Sounds good; I'm adding a patch to adjust the generic implementation and > remove the arm64 one in v4, and we can then complete the cleanup for other > arches later. Cheers. Don't forget to update the comment in the generic code which complains about the lack of an ioremap() API for non-posted writes ;) Will
On 25/03/2021 04.04, Will Deacon wrote: > On Wed, Mar 24, 2021 at 06:59:21PM +0000, Mark Rutland wrote: >> So far we've kept arch/arm64/ largely devoid of IMP-DEF bits, and it >> seems a shame to add something with the sole purpose of collating that, >> especially given arch code shouldn't need to touch these if FW and >> bootloader have done their jobs right. >> >> Can we put the definitions in the relevant drivers? That would sidestep >> any pain with MAINTAINERS, too. > > If we can genuinely ignore these in arch code, then sure. I just don't know > how long that is going to be the case, and ending up in a situation where > these are scattered randomly throughout the tree sounds horrible to me. I thought we would need some in KVM code, but given the direction Marc's series ended up in, it seems we won't. So I'm happy keeping these in the respective drivers; if this ends up being messy in the future it shouldn't be a big deal to refactor it all into one file again.
On 25/03/2021 05.00, Marc Zyngier wrote: > I've come up with this on top of the original patch, spitting a > warning when the right conditions are met. It's pretty ugly, but hey, > so is the HW this runs on. [...] Folded into v4 and tested; works fine with `kvm-arm.mode=nvhe`, throwing the ugly WARN.
On 08/03/2021 22.31, Marc Zyngier wrote: >> + if ((read_sysreg_s(SYS_ICH_HCR_EL2) & ICH_HCR_EN) && >> + read_sysreg_s(SYS_ICH_MISR_EL2) != 0) { >> + pr_err("vGIC IRQ fired, disabling.\n"); > > Please add a _ratelimited here. Whilst debugging KVM on this machine, > I ended up with this firing at such a rate that it was impossible to > do anything useful. Ratelimiting it allowed me to pinpoint the > problem. Ouch. Done for v4. >> +static void aic_fiq_eoi(struct irq_data *d) >> +{ >> + /* We mask to ack (where we can), so we need to unmask at EOI. */ >> + if (!irqd_irq_disabled(d) && !irqd_irq_masked(d)) > > Ah, be careful here: irqd_irq_masked() doesn't do what you think it > does for per-CPU interrupts. It's been on my list to fix for the rVIC > implementation, but I never got around to doing it, and all decent ICs > hide this from SW by having a HW-managed mask, similar to what is on > the IRQ side. > > I can see two possibilities: > > - you can track the masked state directly and use that instead of > these predicates > > - you can just drop the masking altogether as this is only useful to a > hosted hypervisor (KVM), which will have to do its own masking > behind the scenes anyway > Since you're using the masking in KVM after all, I'm tracking the mask state in a percpu variable now. Also folded in your two minor bugfixes from the KVM series. Cheers!
Hi Will, No worries, I was busy until a couple days ago anyway. On 25/03/2021 04.57, Will Deacon wrote: >> +#define pr_fmt(fmt) "%s: " fmt, __func__ > > General nit: but I suspect many of the prints in here probably want to be > using the *_ratelimited variants. You're right, everything complaining about spurious/unsupported stuff is probably safer rate-limited. I also made them all pr_err_ratelimited, since nothing should be trying to use that hardware before we get around to supporting it in this driver. >> +static void __exception_irq_entry aic_handle_irq(struct pt_regs *regs) >> +{ >> + struct aic_irq_chip *ic = aic_irqc; >> + u32 event, type, irq; >> + >> + do { >> + /* >> + * We cannot use a relaxed read here, as DMA needs to be >> + * ordered with respect to the IRQ firing. >> + */ > > I think this could be a bit clearer: the readl() doesn't order any DMA > accesses, but instead means that subsequent reads by the CPU are ordered > (which may be from a buffer which was DMA'd to) are ordered after the > read of the MMIO register. Good point, reworded it for v4: /* * We cannot use a relaxed read here, as reads from DMA buffers * need to be ordered after the IRQ fires. */ > >> + event = readl(ic->base + AIC_EVENT); >> + type = FIELD_GET(AIC_EVENT_TYPE, event); >> + irq = FIELD_GET(AIC_EVENT_NUM, event); >> + >> + if (type == AIC_EVENT_TYPE_HW) >> + handle_domain_irq(aic_irqc->hw_domain, irq, regs); >> + else if (type == AIC_EVENT_TYPE_IPI && irq == 1) >> + aic_handle_ipi(regs); >> + else if (event != 0) >> + pr_err("Unknown IRQ event %d, %d\n", type, irq); >> + } while (event); >> + >> + /* >> + * vGIC maintenance interrupts end up here too, so we need to check >> + * for them separately. Just report and disable vGIC for now, until >> + * we implement this properly. 
>> +	 */
>> +	if ((read_sysreg_s(SYS_ICH_HCR_EL2) & ICH_HCR_EN) &&
>> +	    read_sysreg_s(SYS_ICH_MISR_EL2) != 0) {
>> +		pr_err("vGIC IRQ fired, disabling.\n");
>> +		sysreg_clear_set_s(SYS_ICH_HCR_EL2, ICH_HCR_EN, 0);
>> +	}
>
> What prevents all these system register accesses being speculated up before
> the handler?

Nothing, but that's not a problem, is it? If the condition is met, it
means the vGIC IRQ *is* firing and needs clearing. We don't particularly
care if this happens before, after, or during the rest of the IRQ
handling.

I changed the message to this, because we actually should never hit this
path with correctly-working KVM code (it takes care of it before this
handler runs):

	pr_err_ratelimited("vGIC IRQ fired and not handled by KVM, disabling.\n");

>> +static struct irq_chip aic_chip = {
>> +	.name = "AIC",
>> +	.irq_mask = aic_irq_mask,
>> +	.irq_unmask = aic_irq_unmask,
>
> I know these are driven by the higher-level irq chip code, but I'm a bit
> confused as to what provides ordering if, e.g. something ends up calling:
>
>	aic_chip.irq_mask(d);
>	...
>	aic_chip.irq_unmask(d);
>
> I can't see any ISBs in here and they're writing to two different registers,
> so can we end up with the IRQ masked after this sequence?

Wait, aren't MMIO writes to the same peripheral using device-nGnRnE
memory modes always ordered with respect to each other? I thought the
_relaxed versions were only trouble when mixed with memory/DMA buffers,
and MMIO for any given peripheral always takes effect in program order.

>> +static void aic_ipi_mask(struct irq_data *d)
>> +{
>> +	u32 irq_bit = BIT(irqd_to_hwirq(d));
>> +	int this_cpu = smp_processor_id();
>> +
>> +	/* No specific ordering requirements needed here. */
>> +	atomic_andnot(irq_bit, &aic_vipi_enable[this_cpu]);
>> +}
>
> Why not use a per-cpu variable here instead of an array of atomics? The pcpu
> API has things for atomic updates (e.g. or, and, xchg).
One CPU still needs to be able to mutate the flags of another CPU to
fire an IPI; AIUI the per-cpu ops are *not* atomic for concurrent access
by multiple CPUs, and in fact there is no API for that, only for "this
CPU".

>> +static void aic_ipi_unmask(struct irq_data *d)
>> +{
>> +	struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d);
>> +	u32 irq_bit = BIT(irqd_to_hwirq(d));
>> +	int this_cpu = smp_processor_id();
>> +
>> +	/*
>> +	 * This must complete before the atomic_read_acquire() below to avoid
>> +	 * racing aic_ipi_send_mask(). Use a dummy fetch op with release
>> +	 * semantics for this. This is arch-specific: ARMv8 B2.3.3 specifies
>> +	 * that writes with Release semantics are Barrier-ordered-before reads
>> +	 * with Acquire semantics, even though the Linux arch-independent
>> +	 * definition of these atomic ops does not.
>> +	 */
>
> I think a more idiomatic (and portable) way to do this would be to use
> the relaxed accessors, but with smp_mb__after_atomic() between them. Do you
> have a good reason for _not_ doing it like that?

Not particularly, other than symmetry with the case below.

>> +		/*
>> +		 * This sequence is the mirror of the one in aic_ipi_unmask();
>> +		 * see the comment there. Additionally, release semantics
>> +		 * ensure that the vIPI flag set is ordered after any shared
>> +		 * memory accesses that precede it. This therefore also pairs
>> +		 * with the atomic_fetch_andnot in aic_handle_ipi().
>> +		 */
>> +		pending = atomic_fetch_or_release(irq_bit, &aic_vipi_flag[cpu]);

We do need the return data here, and the release semantics (or another
barrier before it). But the read below can be made relaxed and a barrier
used instead, and then the same pattern above except with a plain
atomic_or().

>> +		if (!(pending & irq_bit) && (atomic_read_acquire(&aic_vipi_enable[cpu]) & irq_bit))
>> +			send |= AIC_IPI_SEND_CPU(cpu);
>> +	}

[...]

>> +	/*
>> +	 * Clear the IPIs we are about to handle.
This pairs with the
>> +	 * atomic_fetch_or_release() in aic_ipi_send_mask(), and needs to be
>> +	 * ordered after the aic_ic_write() above (to avoid dropping vIPIs) and
>> +	 * before IPI handling code (to avoid races handling vIPIs before they
>> +	 * are signaled). The former is taken care of by the release semantics
>> +	 * of the write portion, while the latter is taken care of by the
>> +	 * acquire semantics of the read portion.
>> +	 */
>> +	firing = atomic_fetch_andnot(enabled, &aic_vipi_flag[this_cpu]) & enabled;
>
> Does this also need to be ordered after the Ack? For example, if we have
> something like:
>
> CPU 0				CPU 1
> 				<some other IPI>
> aic_ipi_send_mask()		atomic_fetch_andnot(flag)
> atomic_fetch_or_release(flag)
> aic_ic_write(AIC_IPI_SEND)
> 				aic_ic_write(AIC_IPI_ACK)
>
> sorry if it's a stupid question, I'm just not sure about the cases in which
> the hardware will pend things for you.

It is ordered, right? As the comment says, it "needs to be ordered after
the aic_ic_write() above". atomic_fetch_andnot() is *supposed* to be
fully ordered and that should include against the writel_relaxed() on
AIC_IPI_FLAG. On ARM it turns out it's not quite fully ordered, but the
acquire semantics of the read half are sufficient for this case, as they
guarantee the flags are always read after the FIQ has been ACKed.

Cheeers,
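For readers following along, the vIPI flag/enable protocol being debated maps onto portable C11 atomics roughly as follows. This is an illustrative single-target sketch (one flag word and one enable word instead of the per-CPU arrays, hypothetical names), not the driver code:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_uint vipi_flag;    /* pending vIPIs for one target CPU */
static atomic_uint vipi_enable;  /* unmasked vIPIs for that CPU */

/* Sender: set the flag, then decide whether to poke the hardware. */
static bool ipi_send(uint32_t irq_bit)
{
    /* Release: orders prior shared-memory writes before the flag set. */
    uint32_t pending = atomic_fetch_or_explicit(&vipi_flag, irq_bit,
                                                memory_order_release);
    /* Fire a hardware IPI only if we made the 0->1 transition and the
     * target has this vIPI enabled. */
    return !(pending & irq_bit) &&
           (atomic_load_explicit(&vipi_enable, memory_order_acquire) & irq_bit);
}

/* Receiver: atomically claim and clear the pending-and-enabled bits. */
static uint32_t ipi_handle(void)
{
    uint32_t enabled = atomic_load_explicit(&vipi_enable,
                                            memory_order_relaxed);
    /* One RMW both claims the bits we will handle and clears them. */
    return atomic_fetch_and_explicit(&vipi_flag, ~enabled,
                                     memory_order_acq_rel) & enabled;
}
```

The invariant matching the comments above: a sender fires a hardware IPI only when it is the one that transitioned the flag bit from 0 to 1, and the handler's single atomic RMW both claims and clears exactly the enabled pending bits.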
On 06/03/2021 00.05, Andy Shevchenko wrote:
>> +#define pr_fmt(fmt) "%s: " fmt, __func__
>
> This is not needed, really, if you have unique / distinguishable
> messages in the first place.
> Rather people include module names, which may be useful.

Makes sense, I'll switch to KBUILD_MODNAME.

>> +#define MASK_BIT(x) BIT((x) & 0x1f)
>
> GENMASK(4,0)

It's not really a register bitmask, but rather extracting the low bits
of an index... but sure, GENMASK also expresses that. Changed.

>> +static atomic_t aic_vipi_flag[AIC_MAX_CPUS];
>> +static atomic_t aic_vipi_enable[AIC_MAX_CPUS];
>
> Isn't it easier to handle these when they are full width, i.e. 32
> items per the array?

I don't think so, it doesn't really buy us anything. It's just a maximum
beyond which the driver doesn't work in its current state anyway (if the
number were much larger it'd make sense to dynamically allocate these,
but not at this point).

>> +static int aic_irq_set_affinity(struct irq_data *d,
>> +				const struct cpumask *mask_val, bool force)
>> +{
>> +	irq_hw_number_t hwirq = irqd_to_hwirq(d);
>> +	struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d);
>> +	int cpu;
>> +
>> +	if (hwirq > ic->nr_hw)
>
> >= ?

Good catch, but this is actually obsolete. Higher IRQs go into the FIQ
irqchip, so this should never happen (it's a leftover from when they
were a single one). I'll remove it.

Ack on the other comments, thanks!
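As a side note for anyone checking Andy's GENMASK(4, 0) suggestion: it is numerically identical to the 0x1f literal. A quick standalone sanity check, using hand-expanded stand-ins for the kernel macros (the simplified GENMASK below assumes h < 63, unlike the real kernel definition):

```c
#include <assert.h>

/* Hand-expanded equivalents of the kernel's BIT()/GENMASK() helpers,
 * valid here for small h (no BITS_PER_LONG handling). */
#define BIT(n)        (1UL << (n))
#define GENMASK(h, l) (((1UL << ((h) + 1)) - 1) & ~((1UL << (l)) - 1))

/* Original and suggested spellings of MASK_BIT(): extract the bit
 * position within a 32-bit mask register from a hwirq index. */
#define MASK_BIT_HEX(x)     BIT((x) & 0x1f)
#define MASK_BIT_GENMASK(x) BIT((x) & GENMASK(4, 0))
```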
Hi Hector,

On Fri, Mar 26, 2021 at 05:58:15PM +0900, Hector Martin wrote:
> On 25/03/2021 04.57, Will Deacon wrote:
> > > +		event = readl(ic->base + AIC_EVENT);
> > > +		type = FIELD_GET(AIC_EVENT_TYPE, event);
> > > +		irq = FIELD_GET(AIC_EVENT_NUM, event);
> > > +
> > > +		if (type == AIC_EVENT_TYPE_HW)
> > > +			handle_domain_irq(aic_irqc->hw_domain, irq, regs);
> > > +		else if (type == AIC_EVENT_TYPE_IPI && irq == 1)
> > > +			aic_handle_ipi(regs);
> > > +		else if (event != 0)
> > > +			pr_err("Unknown IRQ event %d, %d\n", type, irq);
> > > +	} while (event);
> > > +
> > > +	/*
> > > +	 * vGIC maintenance interrupts end up here too, so we need to check
> > > +	 * for them separately. Just report and disable vGIC for now, until
> > > +	 * we implement this properly.
> > > +	 */
> > > +	if ((read_sysreg_s(SYS_ICH_HCR_EL2) & ICH_HCR_EN) &&
> > > +	    read_sysreg_s(SYS_ICH_MISR_EL2) != 0) {
> > > +		pr_err("vGIC IRQ fired, disabling.\n");
> > > +		sysreg_clear_set_s(SYS_ICH_HCR_EL2, ICH_HCR_EN, 0);
> > > +	}
> >
> > What prevents all these system register accesses being speculated up before
> > the handler?
>
> Nothing, but that's not a problem, is it? If the condition is met, it means
> the vGIC IRQ *is* firing and needs clearing. We don't particularly care if
> this happens before, after, or during the rest of the IRQ handling.
>
> I changed the message to this, because we actually should never hit this
> path with correctly-working KVM code (it takes care of it before this
> handler runs):
>
> 	pr_err_ratelimited("vGIC IRQ fired and not handled by KVM, disabling.\n");

Looks good.

> > > +static struct irq_chip aic_chip = {
> > > +	.name = "AIC",
> > > +	.irq_mask = aic_irq_mask,
> > > +	.irq_unmask = aic_irq_unmask,
> >
> > I know these are driven by the higher-level irq chip code, but I'm a bit
> > confused as to what provides ordering if, e.g. something ends up calling:
> >
> >	aic_chip.irq_mask(d);
> >	...
> >	aic_chip.irq_unmask(d);
> >
> > I can't see any ISBs in here and they're writing to two different registers,
> > so can we end up with the IRQ masked after this sequence?
>
> Wait, aren't MMIO writes to the same peripheral using device-nGnRnE memory
> modes always ordered with respect to each other? I thought the _relaxed
> versions were only trouble when mixed with memory/DMA buffers, and MMIO for
> any given peripheral always takes effect in program order.

Sorry, this was my mistake -- I seem to have mixed up the MMIO parts with
the system register parts. In this case, aic_irq_[un]mask() are MMIO
writes, so things work out. It's the FIQ mask/unmask code that needs the
ISBs.

> > > +static void aic_ipi_mask(struct irq_data *d)
> > > +{
> > > +	u32 irq_bit = BIT(irqd_to_hwirq(d));
> > > +	int this_cpu = smp_processor_id();
> > > +
> > > +	/* No specific ordering requirements needed here. */
> > > +	atomic_andnot(irq_bit, &aic_vipi_enable[this_cpu]);
> > > +}
> >
> > Why not use a per-cpu variable here instead of an array of atomics? The pcpu
> > API has things for atomic updates (e.g. or, and, xchg).
>
> One CPU still needs to be able to mutate the flags of another CPU to fire an
> IPI; AIUI the per-cpu ops are *not* atomic for concurrent access by multiple
> CPUs, and in fact there is no API for that, only for "this CPU".

Huh, I really thought we had an API for that, but you're right. Oh well! But
I'd still suggest a per-cpu atomic_t in that case, rather than the array.

> > > +static void aic_ipi_unmask(struct irq_data *d)
> > > +{
> > > +	struct aic_irq_chip *ic = irq_data_get_irq_chip_data(d);
> > > +	u32 irq_bit = BIT(irqd_to_hwirq(d));
> > > +	int this_cpu = smp_processor_id();
> > > +
> > > +	/*
> > > +	 * This must complete before the atomic_read_acquire() below to avoid
> > > +	 * racing aic_ipi_send_mask(). Use a dummy fetch op with release
> > > +	 * semantics for this.
This is arch-specific: ARMv8 B2.3.3 specifies
> > > +	 * that writes with Release semantics are Barrier-ordered-before reads
> > > +	 * with Acquire semantics, even though the Linux arch-independent
> > > +	 * definition of these atomic ops does not.
> > > +	 */
> >
> > I think a more idiomatic (and portable) way to do this would be to use
> > the relaxed accessors, but with smp_mb__after_atomic() between them. Do you
> > have a good reason for _not_ doing it like that?
>
> Not particularly, other than symmetry with the case below.

I think it would be better not to rely on arm64-specific ordering unless
there's a good reason to.

> > > +		/*
> > > +		 * This sequence is the mirror of the one in aic_ipi_unmask();
> > > +		 * see the comment there. Additionally, release semantics
> > > +		 * ensure that the vIPI flag set is ordered after any shared
> > > +		 * memory accesses that precede it. This therefore also pairs
> > > +		 * with the atomic_fetch_andnot in aic_handle_ipi().
> > > +		 */
> > > +		pending = atomic_fetch_or_release(irq_bit, &aic_vipi_flag[cpu]);
>
> We do need the return data here, and the release semantics (or another
> barrier before it). But the read below can be made relaxed and a barrier
> used instead, and then the same pattern above except with a plain
> atomic_or().

Yes, I think using atomic_fetch_or() followed by atomic_read() would be
best (obviously with the relevant comments!)

> > > +		if (!(pending & irq_bit) && (atomic_read_acquire(&aic_vipi_enable[cpu]) & irq_bit))
> > > +			send |= AIC_IPI_SEND_CPU(cpu);
> > > +	}

[...]

> > > +	/*
> > > +	 * Clear the IPIs we are about to handle.
The former is taken care of by the release semantics
> > > +	 * of the write portion, while the latter is taken care of by the
> > > +	 * acquire semantics of the read portion.
> > > +	 */
> > > +	firing = atomic_fetch_andnot(enabled, &aic_vipi_flag[this_cpu]) & enabled;
> >
> > Does this also need to be ordered after the Ack? For example, if we have
> > something like:
> >
> > CPU 0				CPU 1
> > 				<some other IPI>
> > aic_ipi_send_mask()		atomic_fetch_andnot(flag)
> > atomic_fetch_or_release(flag)
> > aic_ic_write(AIC_IPI_SEND)
> > 				aic_ic_write(AIC_IPI_ACK)
> >
> > sorry if it's a stupid question, I'm just not sure about the cases in which
> > the hardware will pend things for you.
>
> It is ordered, right? As the comment says, it "needs to be ordered after the
> aic_ic_write() above". atomic_fetch_andnot() is *supposed* to be fully
> ordered and that should include against the writel_relaxed() on
> AIC_IPI_FLAG. On ARM it turns out it's not quite fully ordered, but the
> acquire semantics of the read half are sufficient for this case, as they
> guarantee the flags are always read after the FIQ has been ACKed.

Sorry, I missed that the answer to my question was already written in the
comment. However, I'm still a bit unsure about whether the memory barriers
give you what you need here. The barrier in atomic_fetch_andnot() will
order the previous aic_ic_write(AIC_IPI_ACK) for the purposes of other
CPUs reading those locations, but it doesn't say anything about when the
interrupt controller actually changes state after the Ack.

Given that the AIC is mapped Device-nGnRnE, the Arm ARM offers:

| Additionally, for Device-nGnRnE memory, a read or write of a Location
| in a Memory-mapped peripheral that exhibits side-effects is complete
| only when the read or write both:
|
| * Can begin to affect the state of the Memory-mapped peripheral.
| * Can trigger all associated side-effects, whether they affect other
|   peripheral devices, PEs, or memory.
so without AIC documentation I can't tell whether completion of the Ack write
just begins the process of an Ack (in which case we might need something like
a read-back), or whether the write response back from the AIC only occurs once
the Ack has taken effect. Any ideas?

> Cheeers,

No prooblem :)

Will
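The portable shape Will suggests for the unmask path — a relaxed RMW, a full barrier, then the flag read — corresponds to something like this in C11 atomics. An illustration of the idiom with hypothetical names, not the kernel code (in the kernel it would be atomic_or() + smp_mb__after_atomic() + atomic_read()):

```c
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint vipi_flag;   /* pending vIPIs for this CPU */
static atomic_uint vipi_enable; /* unmasked vIPIs for this CPU */

/* Enable a vIPI, then check whether it was pended while masked.
 * The full fence orders the enable-bit store before the flag read, so
 * this cannot race with a sender that sets the flag first and then
 * checks the enable bit: at least one side observes the other's write. */
static uint32_t ipi_unmask(uint32_t irq_bit)
{
    atomic_fetch_or_explicit(&vipi_enable, irq_bit, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* smp_mb__after_atomic() */
    return atomic_load_explicit(&vipi_flag, memory_order_relaxed) & irq_bit;
}
```

A nonzero return means the vIPI fired while it was masked, so the caller would need to (re)deliver it.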
Hi Will,

On 29/03/2021 21.04, Will Deacon wrote:
>> One CPU still needs to be able to mutate the flags of another CPU to fire an
>> IPI; AIUI the per-cpu ops are *not* atomic for concurrent access by multiple
>> CPUs, and in fact there is no API for that, only for "this CPU".
>
> Huh, I really thought we had an API for that, but you're right. Oh well! But
> I'd still suggest a per-cpu atomic_t in that case, rather than the array.

Yeah, after digging into the per-cpu stuff earlier and understanding how
it works, I agree that a per-cpu atomic makes sense here. Switched it to
that (which simplified out a bunch of smp_processor_id() calls too).
Thanks!

>>> I think a more idiomatic (and portable) way to do this would be to use
>>> the relaxed accessors, but with smp_mb__after_atomic() between them. Do you
>>> have a good reason for _not_ doing it like that?
>>
>> Not particularly, other than symmetry with the case below.
>
> I think it would be better not to rely on arm64-specific ordering unless
> there's a good reason to.

Sounds reasonable, I'll switch to the barrier version.

>> We do need the return data here, and the release semantics (or another
>> barrier before it). But the read below can be made relaxed and a barrier
>> used instead, and then the same pattern above except with a plain
>> atomic_or().
>
> Yes, I think using atomic_fetch_or() followed by atomic_read() would be
> best (obviously with the relevant comments!)

atomic_fetch_or_release is sufficient here (atomic_fetch_or is stronger;
atomic_fetch_or_relaxed would not be strong enough as this needs to be
ordered after any writes prior to sending the IPI; in this case release
semantics also make logical sense).

>> It is ordered, right? As the comment says, it "needs to be ordered after the
>> aic_ic_write() above". atomic_fetch_andnot() is *supposed* to be fully
>> ordered and that should include against the writel_relaxed() on
>> AIC_IPI_FLAG.
On ARM it turns out it's not quite fully ordered, but the
>> acquire semantics of the read half are sufficient for this case, as they
>> guarantee the flags are always read after the FIQ has been ACKed.
>
> Sorry, I missed that the answer to my question was already written in the
> comment. However, I'm still a bit unsure about whether the memory barriers
> give you what you need here. The barrier in atomic_fetch_andnot() will
> order the previous aic_ic_write(AIC_IPI_ACK) for the purposes of other
> CPUs reading those locations, but it doesn't say anything about when the
> interrupt controller actually changes state after the Ack.
>
> Given that the AIC is mapped Device-nGnRnE, the Arm ARM offers:
>
> | Additionally, for Device-nGnRnE memory, a read or write of a Location
> | in a Memory-mapped peripheral that exhibits side-effects is complete
> | only when the read or write both:
> |
> | * Can begin to affect the state of the Memory-mapped peripheral.
> | * Can trigger all associated side-effects, whether they affect other
> |   peripheral devices, PEs, or memory.
>
> so without AIC documentation I can't tell whether completion of the Ack write
> just begins the process of an Ack (in which case we might need something like
> a read-back), or whether the write response back from the AIC only occurs once
> the Ack has taken effect. Any ideas?

Ahh, you're talking about latency within AIC itself... I obviously don't
have an authoritative answer to this, though the hardware designer in me
wants to say this really ought to be single-cycle type stuff that isn't
internally pipelined in a way that would create races.

I tried to set up an SMP test case for the atomic-to-AIC sequence in
m1n1, but unfortunately I couldn't hit the race window in deliberately
racy code (i.e. ack after clearing flags) without widening it even
further with at least one dummy load in between, and of course I didn't
experience any races with the proper code either.
What I can say is that a simple set IPI; ack IPI (in adjacent str
instructions) sequence always yields a cleared IPI, and the converse
always yields a set IPI. So if there is latency to the operations, it
seems it would at least be the same for sets and acks, and would imply
that read-backs block, which should still yield equivalently correct
results. But of course this is a single-CPU test, so it is not fully
representative of what could happen in an SMP scenario.

At this point all I can say is I'm inclined to shrug and say we have no
evidence of this being something that can happen, and it shouldn't in
sane hardware, and hope for the best :-)

Thanks,