
[v2,0/5] irqchip/gic-v3: Disable pseudo NMIs on Mediatek Chromebooks w/ bad FW

Message ID 20230515131353.v2.cover@dianders

Message

Doug Anderson May 15, 2023, 8:13 p.m. UTC
As talked about in the bindings patch included in this series
("dt-bindings: interrupt-controller: arm,gic-v3: Add quirk for
Mediatek SoCs w/ broken FW"), many Mediatek-based Chromebooks shipped
with firmware that doesn't properly save/restore some GICR
registers. This causes the system to crash if "pseudo NMIs" are turned
on.

This series makes sure that we never allow turning on "pseudo NMIs" if
we are running with the problematic firmware.

The patches in this series can land in any order and can go through
entirely different trees. None of the patches are harmful on their
own, but to get things fixed we need all of them.

v2 fixes the quirk name and also moves the quirk out of the SoC.dtsi
file and into the Chromebook file. This, unfortunately, means that
mt8186-based Chromebooks are no longer handled since they don't appear
to be upstream yet. :(

Changes in v2:
- "when CPUs are powered" => "when the GIC redistributors are..."
- Changed "Fixes" tag.
- Moved from mt8183.dtsi to mt8183-kukui.dtsi
- Moved from mt8192.dtsi to mt8192-asurada.dtsi
- Moved from mt8195.dtsi to mt8195-cherry.dtsi
- mediatek,gicr-save-quirk => mediatek,broken-save-restore-fw

Douglas Anderson (5):
  dt-bindings: interrupt-controller: arm,gic-v3: Add quirk for Mediatek
    SoCs w/ broken FW
  irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware
    issues
  arm64: dts: mediatek: mt8183: Add mediatek,broken-save-restore-fw to
    kukui
  arm64: dts: mediatek: mt8192: Add mediatek,broken-save-restore-fw to
    asurada
  arm64: dts: mediatek: mt8195: Add mediatek,broken-save-restore-fw to
    cherry

 .../interrupt-controller/arm,gic-v3.yaml      |  6 ++++++
 .../arm64/boot/dts/mediatek/mt8183-kukui.dtsi |  4 ++++
 .../boot/dts/mediatek/mt8192-asurada.dtsi     |  4 ++++
 .../boot/dts/mediatek/mt8195-cherry.dtsi      |  4 ++++
 drivers/irqchip/irq-gic-common.c              |  8 ++++++--
 drivers/irqchip/irq-gic-common.h              |  1 +
 drivers/irqchip/irq-gic-v3.c                  | 20 +++++++++++++++++++
 7 files changed, 45 insertions(+), 2 deletions(-)

Comments

Marc Zyngier May 16, 2023, 9:58 a.m. UTC | #1
On Mon, 15 May 2023 21:13:49 +0100,
Douglas Anderson <dianders@chromium.org> wrote:
> 
> As talked about in the bindings patch included in this series
> ("dt-bindings: interrupt-controller: arm,gic-v3: Add quirk for
> Mediatek SoCs w/ broken FW"), many Mediatek-based Chromebooks shipped
> with firmware that doesn't properly save/restore some GICR
> registers. This causes the system to crash if "pseudo NMIs" are turned
> on.
> 
> This series makes sure that we never allow turning on "pseudo NMIs" if
> we are running with the problematic firmware.
> 
> The patches in this series can land in any order and can go through
> entirely different trees. None of the patches are harmful on their
> own, but to get things fixed we need all of them.
> 
> v2 fixes the quirk name and also moves the quirk out of the SoC.dtsi
> file and into the Chromebook file. This, unfortunately, means that
> mt8186-based Chromebooks are no longer handled since they don't appear
> to be upstream yet. :(
> 
> Changes in v2:
> - "when CPUs are powered" => "when the GIC redistributors are..."
> - Changed "Fixes" tag.
> - Moved from mt8183.dtsi to mt8183-kukui.dtsi
> - Moved from mt8192.dtsi to mt8192-asurada.dtsi
> - Moved from mt8195.dtsi to mt8195-cherry.dtsi
> - mediatek,gicr-save-quirk => mediatek,broken-save-restore-fw
> 
> Douglas Anderson (5):
>   dt-bindings: interrupt-controller: arm,gic-v3: Add quirk for Mediatek
>     SoCs w/ broken FW
>   irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware
>     issues
>   arm64: dts: mediatek: mt8183: Add mediatek,broken-save-restore-fw to
>     kukui
>   arm64: dts: mediatek: mt8192: Add mediatek,broken-save-restore-fw to
>     asurada
>   arm64: dts: mediatek: mt8195: Add mediatek,broken-save-restore-fw to
>     cherry
> 
>  .../interrupt-controller/arm,gic-v3.yaml      |  6 ++++++
>  .../arm64/boot/dts/mediatek/mt8183-kukui.dtsi |  4 ++++
>  .../boot/dts/mediatek/mt8192-asurada.dtsi     |  4 ++++
>  .../boot/dts/mediatek/mt8195-cherry.dtsi      |  4 ++++
>  drivers/irqchip/irq-gic-common.c              |  8 ++++++--
>  drivers/irqchip/irq-gic-common.h              |  1 +
>  drivers/irqchip/irq-gic-v3.c                  | 20 +++++++++++++++++++
>  7 files changed, 45 insertions(+), 2 deletions(-)

I'll take the first two patches as fixes. The rest can be merged via
the soc tree as required.

	M.
AngeloGioacchino Del Regno May 16, 2023, 1:18 p.m. UTC | #2
On 15/05/23 22:13, Douglas Anderson wrote:
> Firmware shipped on mt8195 Chromebooks is affected by the GICR
> save/restore issue as described by the patch ("dt-bindings:
> interrupt-controller: arm,gic-v3: Add quirk for Mediatek SoCs w/
> broken FW"). Add the quirk property.
> 
> Fixes: 5eb2e303ec6b ("arm64: dts: mediatek: Introduce MT8195 Cherry platform's Tomato")
> Reviewed-by: Julius Werner <jwerner@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
AngeloGioacchino Del Regno May 16, 2023, 1:18 p.m. UTC | #3
On 15/05/23 22:13, Douglas Anderson wrote:
> Firmware shipped on mt8192 Chromebooks is affected by the GICR
> save/restore issue as described by the patch ("dt-bindings:
> interrupt-controller: arm,gic-v3: Add quirk for Mediatek SoCs w/
> broken FW"). Add the quirk property.
> 
> Fixes: 331fae2fc922 ("arm64: dts: mediatek: Introduce MT8192-based Asurada board family")
> Reviewed-by: Julius Werner <jwerner@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
AngeloGioacchino Del Regno May 16, 2023, 1:18 p.m. UTC | #4
On 15/05/23 22:13, Douglas Anderson wrote:
> Firmware shipped on mt8183 Chromebooks is affected by the GICR
> save/restore issue as described by the patch ("dt-bindings:
> interrupt-controller: arm,gic-v3: Add quirk for Mediatek SoCs w/
> broken FW"). Add the quirk property.
> 
> Fixes: cd894e274b74 ("arm64: dts: mt8183: Add krane-sku176 board")
> Reviewed-by: Julius Werner <jwerner@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
AngeloGioacchino Del Regno May 16, 2023, 1:23 p.m. UTC | #5
On 15/05/23 22:13, Douglas Anderson wrote:
> Some Chromebooks with Mediatek SoCs have a problem where the firmware
> doesn't properly save/restore certain GICR registers. Newer
> Chromebooks should fix this issue and we may be able to do firmware
> updates for old Chromebooks. At the moment, the only known issue with
> these Chromebooks is that we can't enable "pseudo NMIs" since the
> priority register can be lost. Enabling "pseudo NMIs" on Chromebooks
> with the problematic firmware causes crashes and freezes.
> 
> Let's detect devices with this problem and then disable "pseudo NMIs"
> on them. We'll detect the problem by looking for the presence of the
> "mediatek,broken-save-restore-fw" property in the GIC device tree
> node. Any devices with fixed firmware will not have this property.
> 
> Our detection plan works because we never bake a Chromebook's device
> tree into firmware. Instead, device trees are always bundled with the
> kernel. We'll update the device trees of all affected Chromebooks and
> then we'll never enable "pseudo NMI" on a kernel that is bundled with
> old device trees. When a firmware update is shipped that fixes this
> issue it will know to patch the device tree to remove the property.
> 
> In order to make this work, the quirk detection mechanism of the GICv3
> code is extended to be able to look for properties in addition to
> looking at "compatible".
> 
> Reviewed-by: Julius Werner <jwerner@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

I don't like firmware removing properties from my devicetrees and I'd like this
issue to get addressed in another way (use a scratch register? and check it in
Linux drivers to determine if the issue is not present: if scratch contains BIT(x),
do not parse the quirk) but that's a different discussion which is a bit out of
context for this patch, so:

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
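
The detection scheme described in the quoted commit message can be modelled in user space. The following is only a minimal sketch, not the kernel code: `struct fake_node`, `node_has_prop()`, `detect_flags()` and `nmi_allowed()` are hypothetical names invented for illustration, and the real driver uses `of_property_read_bool()` on the GIC node and checks the flag in `gic_enable_nmi_support()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROPS 4
/* Same bit position as FLAGS_WORKAROUND_MTK_GICR_SAVE in the patch. */
#define FLAG_MTK_GICR_SAVE (1ULL << 2)

/* Hypothetical stand-in for a GIC device tree node. */
struct fake_node {
	const char *props[MAX_PROPS];	/* boolean property names, NULL-terminated */
};

/* Returns true if the node carries the named boolean property. */
static bool node_has_prop(const struct fake_node *np, const char *name)
{
	assert(np && name);
	for (size_t i = 0; i < MAX_PROPS && np->props[i]; i++)
		if (strcmp(np->props[i], name) == 0)
			return true;
	return false;
}

/* Sets the workaround flag only when the quirk property is present. */
static unsigned long long detect_flags(const struct fake_node *np)
{
	unsigned long long flags = 0;

	if (node_has_prop(np, "mediatek,broken-save-restore-fw"))
		flags |= FLAG_MTK_GICR_SAVE;
	return flags;
}

/* Pseudo NMIs are skipped whenever the workaround flag is set. */
static bool nmi_allowed(unsigned long long flags)
{
	return !(flags & FLAG_MTK_GICR_SAVE);
}

/* Device tree as bundled with the kernel: the property is present. */
static const struct fake_node old_fw_node = {
	.props = { "interrupt-controller", "mediatek,broken-save-restore-fw" },
};

/* Device tree after fixed firmware has removed the property. */
static const struct fake_node fixed_fw_node = {
	.props = { "interrupt-controller" },
};
```

In this model, `nmi_allowed(detect_flags(&old_fw_node))` is false and `nmi_allowed(detect_flags(&fixed_fw_node))` is true, which matches the behaviour the commit message describes for old and fixed firmware.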
Doug Anderson May 16, 2023, 2:19 p.m. UTC | #6
Hi,

On Tue, May 16, 2023 at 6:23 AM AngeloGioacchino Del Regno
<angelogioacchino.delregno@collabora.com> wrote:
>
> On 15/05/23 22:13, Douglas Anderson wrote:
> > Some Chromebooks with Mediatek SoCs have a problem where the firmware
> > doesn't properly save/restore certain GICR registers. Newer
> > Chromebooks should fix this issue and we may be able to do firmware
> > updates for old Chromebooks. At the moment, the only known issue with
> > these Chromebooks is that we can't enable "pseudo NMIs" since the
> > priority register can be lost. Enabling "pseudo NMIs" on Chromebooks
> > with the problematic firmware causes crashes and freezes.
> >
> > Let's detect devices with this problem and then disable "pseudo NMIs"
> > on them. We'll detect the problem by looking for the presence of the
> > "mediatek,broken-save-restore-fw" property in the GIC device tree
> > node. Any devices with fixed firmware will not have this property.
> >
> > Our detection plan works because we never bake a Chromebook's device
> > tree into firmware. Instead, device trees are always bundled with the
> > kernel. We'll update the device trees of all affected Chromebooks and
> > then we'll never enable "pseudo NMI" on a kernel that is bundled with
> > old device trees. When a firmware update is shipped that fixes this
> > issue it will know to patch the device tree to remove the property.
> >
> > In order to make this work, the quirk detection mechanism of the GICv3
> > code is extended to be able to look for properties in addition to
> > looking at "compatible".
> >
> > Reviewed-by: Julius Werner <jwerner@chromium.org>
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
>
> I don't like firmware removing properties from my devicetrees and I'd like this
> issue to get addressed in another way (use a scratch register? and check it in
> Linux drivers to determine if the issue is not present: if scratch contains BIT(x),
> do not parse the quirk) but that's a different discussion which is a bit out of
> context for this patch, so:

Any particular reason why? IMO it's actually a fair bit cleaner to
have firmware remove a property that's specifically documented for the
firmware to remove compared to having firmware adding properties to or
otherwise messing with the device tree. For the removal case, it's
easy from the device tree git history to find out about the property,
when it was added, and that it is expected that some versions of
firmware will remove it. IMO having firmware add properties can be a
little more mysterious, though that has its place too. In general,
though, firmware is expected to be able to touch up the
device tree. It puts things in "chosen", adds bits describing the
firmware, can add things to the device tree to describe components it
is uniquely able to probe (like SDRAM), could enable/disable a
component if it has info about their presence, etc.

I'm happy to hear other opinions on it, but in my mind having a
sideband bit telling us to ignore the quirk is more confusing instead
of less confusing.


> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>

Thanks!
Marc Zyngier May 16, 2023, 2:57 p.m. UTC | #7
On Tue, 16 May 2023 14:23:52 +0100,
AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> wrote:
> 
> On 15/05/23 22:13, Douglas Anderson wrote:
> > Some Chromebooks with Mediatek SoCs have a problem where the firmware
> > doesn't properly save/restore certain GICR registers. Newer
> > Chromebooks should fix this issue and we may be able to do firmware
> > updates for old Chromebooks. At the moment, the only known issue with
> > these Chromebooks is that we can't enable "pseudo NMIs" since the
> > priority register can be lost. Enabling "pseudo NMIs" on Chromebooks
> > with the problematic firmware causes crashes and freezes.
> > 
> > Let's detect devices with this problem and then disable "pseudo NMIs"
> > on them. We'll detect the problem by looking for the presence of the
> > "mediatek,broken-save-restore-fw" property in the GIC device tree
> > node. Any devices with fixed firmware will not have this property.
> > 
> > Our detection plan works because we never bake a Chromebook's device
> > tree into firmware. Instead, device trees are always bundled with the
> > kernel. We'll update the device trees of all affected Chromebooks and
> > then we'll never enable "pseudo NMI" on a kernel that is bundled with
> > old device trees. When a firmware update is shipped that fixes this
> > issue it will know to patch the device tree to remove the property.
> > 
> > In order to make this work, the quirk detection mechanism of the GICv3
> > code is extended to be able to look for properties in addition to
> > looking at "compatible".
> > 
> > Reviewed-by: Julius Werner <jwerner@chromium.org>
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> 
> I don't like firmware removing properties from my devicetrees and I'd like this
> issue to get addressed in another way (use a scratch register? and check it in
> Linux drivers to determine if the issue is not present: if scratch contains BIT(x),
> do not parse the quirk) but that's a different discussion which is a bit out of
> context for this patch, so:

So what you're advocating for is that we have another flag somewhere
that says the same thing. Stored where? Accessible how? On top of
having to check for DT, ACPI, and SOC_ID interfaces, you want YAFM
(Yet Another Fixing Method)?

Thanks, but no, thanks.

	M.
Matthias Brugger May 29, 2023, 3:41 p.m. UTC | #8
On 16/05/2023 11:58, Marc Zyngier wrote:
> On Mon, 15 May 2023 21:13:49 +0100,
> Douglas Anderson <dianders@chromium.org> wrote:
>>
>> As talked about in the bindings patch included in this series
>> ("dt-bindings: interrupt-controller: arm,gic-v3: Add quirk for
>> Mediatek SoCs w/ broken FW"), many Mediatek-based Chromebooks shipped
>> with firmware that doesn't properly save/restore some GICR
>> registers. This causes the system to crash if "pseudo NMIs" are turned
>> on.
>>
>> This series makes sure that we never allow turning on "pseudo NMIs" if
>> we are running with the problematic firmware.
>>
>> The patches in this series can land in any order and can go through
>> entirely different trees. None of the patches are harmful on their
>> own, but to get things fixed we need all of them.
>>
>> v2 fixes the quirk name and also moves the quirk out of the SoC.dtsi
>> file and into the Chromebook file. This, unfortunately, means that
>> mt8186-based Chromebooks are no longer handled since they don't appear
>> to be upstream yet. :(
>>
>> Changes in v2:
>> - "when CPUs are powered" => "when the GIC redistributors are..."
>> - Changed "Fixes" tag.
>> - Moved from mt8183.dtsi to mt8183-kukui.dtsi
>> - Moved from mt8192.dtsi to mt8192-asurada.dtsi
>> - Moved from mt8195.dtsi to mt8195-cherry.dtsi
>> - mediatek,gicr-save-quirk => mediatek,broken-save-restore-fw
>>
>> Douglas Anderson (5):
>>    dt-bindings: interrupt-controller: arm,gic-v3: Add quirk for Mediatek
>>      SoCs w/ broken FW
>>    irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/ firmware
>>      issues
>>    arm64: dts: mediatek: mt8183: Add mediatek,broken-save-restore-fw to
>>      kukui
>>    arm64: dts: mediatek: mt8192: Add mediatek,broken-save-restore-fw to
>>      asurada
>>    arm64: dts: mediatek: mt8195: Add mediatek,broken-save-restore-fw to
>>      cherry
>>
>>   .../interrupt-controller/arm,gic-v3.yaml      |  6 ++++++
>>   .../arm64/boot/dts/mediatek/mt8183-kukui.dtsi |  4 ++++
>>   .../boot/dts/mediatek/mt8192-asurada.dtsi     |  4 ++++
>>   .../boot/dts/mediatek/mt8195-cherry.dtsi      |  4 ++++
>>   drivers/irqchip/irq-gic-common.c              |  8 ++++++--
>>   drivers/irqchip/irq-gic-common.h              |  1 +
>>   drivers/irqchip/irq-gic-v3.c                  | 20 +++++++++++++++++++
>>   7 files changed, 45 insertions(+), 2 deletions(-)
> 
> I'll take the first two patches as fixes. The rest can be merged via
> the soc tree as required.
> 
> 	M.
> 

Patches 3-5 applied now. Thanks!
Geert Uytterhoeven May 30, 2023, 8:29 a.m. UTC | #9
Hi Douglas,

On Mon, May 15, 2023 at 10:16 PM Douglas Anderson <dianders@chromium.org> wrote:
> Some Chromebooks with Mediatek SoCs have a problem where the firmware
> doesn't properly save/restore certain GICR registers. Newer
> Chromebooks should fix this issue and we may be able to do firmware
> updates for old Chromebooks. At the moment, the only known issue with
> these Chromebooks is that we can't enable "pseudo NMIs" since the
> priority register can be lost. Enabling "pseudo NMIs" on Chromebooks
> with the problematic firmware causes crashes and freezes.
>
> Let's detect devices with this problem and then disable "pseudo NMIs"
> on them. We'll detect the problem by looking for the presence of the
> "mediatek,broken-save-restore-fw" property in the GIC device tree
> node. Any devices with fixed firmware will not have this property.
>
> Our detection plan works because we never bake a Chromebook's device
> tree into firmware. Instead, device trees are always bundled with the
> kernel. We'll update the device trees of all affected Chromebooks and
> then we'll never enable "pseudo NMI" on a kernel that is bundled with
> old device trees. When a firmware update is shipped that fixes this
> issue it will know to patch the device tree to remove the property.
>
> In order to make this work, the quirk detection mechanism of the GICv3
> code is extended to be able to look for properties in addition to
> looking at "compatible".
>
> Reviewed-by: Julius Werner <jwerner@chromium.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v2:
> - mediatek,gicr-save-quirk => mediatek,broken-save-restore-fw

Thanks for your patch, which is now commit 44bd78dd2b8897f5
("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/
firmware issues") in v6.4-rc4.

This causes an unrelated workaround to be enabled on R-Car V4H:

    GIC: enabling workaround for GICv3: Cavium erratum 38539

> --- a/drivers/irqchip/irq-gic-common.c
> +++ b/drivers/irqchip/irq-gic-common.c
> @@ -16,7 +16,11 @@ void gic_enable_of_quirks(const struct device_node *np,
>                           const struct gic_quirk *quirks, void *data)
>  {
>         for (; quirks->desc; quirks++) {
> -               if (!of_device_is_compatible(np, quirks->compatible))
> +               if (quirks->compatible &&
> +                   !of_device_is_compatible(np, quirks->compatible))
> +                       continue;
> +               if (quirks->property &&
> +                   !of_property_read_bool(np, quirks->property))
>                         continue;

Presumably the loop should continue if neither quirks->compatible
nor quirks->property is set?

>                 if (quirks->init(data))
>                         pr_info("GIC: enabling workaround for %s\n",
> @@ -28,7 +32,7 @@ void gic_enable_quirks(u32 iidr, const struct gic_quirk *quirks,
>                 void *data)
>  {
>         for (; quirks->desc; quirks++) {
> -               if (quirks->compatible)
> +               if (quirks->compatible || quirks->property)
>                         continue;
>                 if (quirks->iidr != (quirks->mask & iidr))
>                         continue;
> diff --git a/drivers/irqchip/irq-gic-common.h b/drivers/irqchip/irq-gic-common.h
> index 27e3d4ed4f32..3db4592cda1c 100644
> --- a/drivers/irqchip/irq-gic-common.h
> +++ b/drivers/irqchip/irq-gic-common.h
> @@ -13,6 +13,7 @@
>  struct gic_quirk {
>         const char *desc;
>         const char *compatible;
> +       const char *property;
>         bool (*init)(void *data);
>         u32 iidr;
>         u32 mask;
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 6fcee221f201..a605aa79435a 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -39,6 +39,7 @@
>
>  #define FLAGS_WORKAROUND_GICR_WAKER_MSM8996    (1ULL << 0)
>  #define FLAGS_WORKAROUND_CAVIUM_ERRATUM_38539  (1ULL << 1)
> +#define FLAGS_WORKAROUND_MTK_GICR_SAVE         (1ULL << 2)
>
>  #define GIC_IRQ_TYPE_PARTITION (GIC_IRQ_TYPE_LPI + 1)
>
> @@ -1720,6 +1721,15 @@ static bool gic_enable_quirk_msm8996(void *data)
>         return true;
>  }
>
> +static bool gic_enable_quirk_mtk_gicr(void *data)
> +{
> +       struct gic_chip_data *d = data;
> +
> +       d->flags |= FLAGS_WORKAROUND_MTK_GICR_SAVE;
> +
> +       return true;
> +}
> +
>  static bool gic_enable_quirk_cavium_38539(void *data)
>  {
>         struct gic_chip_data *d = data;
> @@ -1792,6 +1802,11 @@ static const struct gic_quirk gic_quirks[] = {
>                 .compatible = "qcom,msm8996-gic-v3",
>                 .init   = gic_enable_quirk_msm8996,
>         },
> +       {
> +               .desc   = "GICv3: Mediatek Chromebook GICR save problem",
> +               .property = "mediatek,broken-save-restore-fw",
> +               .init   = gic_enable_quirk_mtk_gicr,
> +       },
>         {
>                 .desc   = "GICv3: HIP06 erratum 161010803",
>                 .iidr   = 0x0204043b,
> @@ -1834,6 +1849,11 @@ static void gic_enable_nmi_support(void)
>         if (!gic_prio_masking_enabled())
>                 return;
>
> +       if (gic_data.flags & FLAGS_WORKAROUND_MTK_GICR_SAVE) {
> +               pr_warn("Skipping NMI enable due to firmware issues\n");
> +               return;
> +       }
> +
>         ppi_nmi_refs = kcalloc(gic_data.ppi_nr, sizeof(*ppi_nmi_refs), GFP_KERNEL);
>         if (!ppi_nmi_refs)
>                 return;
> --
> 2.40.1.606.ga4b1b128d6-goog

Gr{oetje,eeting}s,

                        Geert
Geert Uytterhoeven May 30, 2023, 9:58 a.m. UTC | #10
Hi Marc,

On Tue, May 30, 2023 at 11:46 AM Marc Zyngier <maz@kernel.org> wrote:
> On Tue, 30 May 2023 09:29:02 +0100,
> Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Mon, May 15, 2023 at 10:16 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > Some Chromebooks with Mediatek SoCs have a problem where the firmware
> > > doesn't properly save/restore certain GICR registers. Newer
> > > Chromebooks should fix this issue and we may be able to do firmware
> > > updates for old Chromebooks. At the moment, the only known issue with
> > > these Chromebooks is that we can't enable "pseudo NMIs" since the
> > > priority register can be lost. Enabling "pseudo NMIs" on Chromebooks
> > > with the problematic firmware causes crashes and freezes.
> > >
> > > Let's detect devices with this problem and then disable "pseudo NMIs"
> > > on them. We'll detect the problem by looking for the presence of the
> > > "mediatek,broken-save-restore-fw" property in the GIC device tree
> > > node. Any devices with fixed firmware will not have this property.
> > >
> > > Our detection plan works because we never bake a Chromebook's device
> > > tree into firmware. Instead, device trees are always bundled with the
> > > kernel. We'll update the device trees of all affected Chromebooks and
> > > then we'll never enable "pseudo NMI" on a kernel that is bundled with
> > > old device trees. When a firmware update is shipped that fixes this
> > > issue it will know to patch the device tree to remove the property.
> > >
> > > In order to make this work, the quirk detection mechanism of the GICv3
> > > code is extended to be able to look for properties in addition to
> > > looking at "compatible".
> > >
> > > Reviewed-by: Julius Werner <jwerner@chromium.org>
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > >
> > > Changes in v2:
> > > - mediatek,gicr-save-quirk => mediatek,broken-save-restore-fw
> >
> > Thanks for your patch, which is now commit 44bd78dd2b8897f5
> > ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/
> > firmware issues") in v6.4-rc4.
> >
> > This causes enabling an unrelated workaround on R-Car V4H:
> >
> >     GIC: enabling workaround for GICv3: Cavium erratum 38539
> >
> > > --- a/drivers/irqchip/irq-gic-common.c
> > > +++ b/drivers/irqchip/irq-gic-common.c
> > > @@ -16,7 +16,11 @@ void gic_enable_of_quirks(const struct device_node *np,
> > >                           const struct gic_quirk *quirks, void *data)
> > >  {
> > >         for (; quirks->desc; quirks++) {
> > > -               if (!of_device_is_compatible(np, quirks->compatible))
> > > +               if (quirks->compatible &&
> > > +                   !of_device_is_compatible(np, quirks->compatible))
> > > +                       continue;
> > > +               if (quirks->property &&
> > > +                   !of_property_read_bool(np, quirks->property))
> > >                         continue;
> >
> > Presumably the loop should continue if neither quirks->compatible
> > nor quirks->property is set?
>
> Indeed, thanks for pointing that out. Can you give the following hack
> a go (compile tested only)?
>
> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
> index de47b51cdadb..7b591736ab58 100644
> --- a/drivers/irqchip/irq-gic-common.c
> +++ b/drivers/irqchip/irq-gic-common.c
> @@ -16,6 +16,8 @@ void gic_enable_of_quirks(const struct device_node *np,
>                           const struct gic_quirk *quirks, void *data)
>  {
>         for (; quirks->desc; quirks++) {
> +               if (!quirks->compatible && !quirks->property)
> +                       continue;
>                 if (quirks->compatible &&
>                     !of_device_is_compatible(np, quirks->compatible))
>                         continue;
>
> If that works for you, I'll queue it ASAP.

Thanks, that fixes the issue for me on Renesas White-Hawk (R-Car V4H).
No regressions on Koelsch (R-Car M2-W) and Salvator-XS (R-Car H3 ES2.0).
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert
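
The regression and its fix can be reproduced with a simplified user-space model of the OF matching loop. This is a sketch under stated assumptions rather than the kernel implementation: `struct quirk`, `struct fake_node`, `of_match()` and `count_of_matches()` are hypothetical names, each node carries at most one property, and the `iidr` value is a placeholder, not the real Cavium identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model; not the kernel's struct gic_quirk. */
struct quirk {
	const char *desc;		/* NULL terminates the table */
	const char *compatible;		/* match on the node's compatible string */
	const char *property;		/* match on a boolean property */
	unsigned int iidr;		/* IIDR-matched entries set neither string */
};

/* Hypothetical node: one compatible string, at most one property. */
struct fake_node {
	const char *compatible;
	const char *property;		/* may be NULL */
};

static const struct quirk quirks[] = {
	{ .desc = "msm8996", .compatible = "qcom,msm8996-gic-v3" },
	{ .desc = "mtk-gicr", .property = "mediatek,broken-save-restore-fw" },
	{ .desc = "cavium-38539", .iidr = 0x1 },	/* placeholder IIDR */
	{ .desc = NULL }
};

/* One iteration of the OF loop; with_fix adds the early skip from the fix. */
static bool of_match(const struct quirk *q, const struct fake_node *np,
		     bool with_fix)
{
	if (with_fix && !q->compatible && !q->property)
		return false;	/* IIDR-only entry, not meant for the OF path */
	if (q->compatible && strcmp(np->compatible, q->compatible) != 0)
		return false;
	if (q->property &&
	    (!np->property || strcmp(np->property, q->property) != 0))
		return false;
	return true;
}

/* Counts how many table entries the OF path would enable for a node. */
static int count_of_matches(const struct fake_node *np, bool with_fix)
{
	int n = 0;

	assert(np && np->compatible);
	for (const struct quirk *q = quirks; q->desc; q++)
		if (of_match(q, np, with_fix))
			n++;
	return n;
}

/* A GIC node with no quirk property, as on a non-Mediatek board. */
static const struct fake_node plain_gic = { .compatible = "arm,gic-v3" };

/* A GIC node carrying the Mediatek quirk property. */
static const struct fake_node mtk_gic = {
	.compatible = "arm,gic-v3",
	.property = "mediatek,broken-save-restore-fw",
};
```

Without the early skip, the IIDR-only entry passes both checks vacuously, so `count_of_matches(&plain_gic, false)` returns 1 in this model; with the skip it returns 0, while the Mediatek node still matches its own quirk.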
Doug Anderson May 30, 2023, 4:36 p.m. UTC | #11
Hi,

On Tue, May 30, 2023 at 2:46 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 30 May 2023 09:29:02 +0100,
> Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > Hi Douglas,
> >
> > On Mon, May 15, 2023 at 10:16 PM Douglas Anderson <dianders@chromium.org> wrote:
> > > Some Chromebooks with Mediatek SoCs have a problem where the firmware
> > > doesn't properly save/restore certain GICR registers. Newer
> > > Chromebooks should fix this issue and we may be able to do firmware
> > > updates for old Chromebooks. At the moment, the only known issue with
> > > these Chromebooks is that we can't enable "pseudo NMIs" since the
> > > priority register can be lost. Enabling "pseudo NMIs" on Chromebooks
> > > with the problematic firmware causes crashes and freezes.
> > >
> > > Let's detect devices with this problem and then disable "pseudo NMIs"
> > > on them. We'll detect the problem by looking for the presence of the
> > > "mediatek,broken-save-restore-fw" property in the GIC device tree
> > > node. Any devices with fixed firmware will not have this property.
> > >
> > > Our detection plan works because we never bake a Chromebook's device
> > > tree into firmware. Instead, device trees are always bundled with the
> > > kernel. We'll update the device trees of all affected Chromebooks and
> > > then we'll never enable "pseudo NMI" on a kernel that is bundled with
> > > old device trees. When a firmware update is shipped that fixes this
> > > issue it will know to patch the device tree to remove the property.
> > >
> > > In order to make this work, the quirk detection mechanism of the GICv3
> > > code is extended to be able to look for properties in addition to
> > > looking at "compatible".
> > >
> > > Reviewed-by: Julius Werner <jwerner@chromium.org>
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > >
> > > Changes in v2:
> > > - mediatek,gicr-save-quirk => mediatek,broken-save-restore-fw
> >
> > Thanks for your patch, which is now commit 44bd78dd2b8897f5
> > ("irqchip/gic-v3: Disable pseudo NMIs on Mediatek devices w/
> > firmware issues") in v6.4-rc4.
> >
> > This causes enabling an unrelated workaround on R-Car V4H:
> >
> >     GIC: enabling workaround for GICv3: Cavium erratum 38539
> >
> > > --- a/drivers/irqchip/irq-gic-common.c
> > > +++ b/drivers/irqchip/irq-gic-common.c
> > > @@ -16,7 +16,11 @@ void gic_enable_of_quirks(const struct device_node *np,
> > >                           const struct gic_quirk *quirks, void *data)
> > >  {
> > >         for (; quirks->desc; quirks++) {
> > > -               if (!of_device_is_compatible(np, quirks->compatible))
> > > +               if (quirks->compatible &&
> > > +                   !of_device_is_compatible(np, quirks->compatible))
> > > +                       continue;
> > > +               if (quirks->property &&
> > > +                   !of_property_read_bool(np, quirks->property))
> > >                         continue;
> >
> > Presumably the loop should continue if neither quirks->compatible
> > nor quirks->property is set?
>
> Indeed, thanks for pointing that out. Can you give the following hack
> a go (compile tested only)?
>
> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
> index de47b51cdadb..7b591736ab58 100644
> --- a/drivers/irqchip/irq-gic-common.c
> +++ b/drivers/irqchip/irq-gic-common.c
> @@ -16,6 +16,8 @@ void gic_enable_of_quirks(const struct device_node *np,
>                           const struct gic_quirk *quirks, void *data)
>  {
>         for (; quirks->desc; quirks++) {
> +               if (!quirks->compatible && !quirks->property)
> +                       continue;

Sorry for missing this and thanks for the fix. Looks like this is
already committed, but in case it matters:

Reviewed-by: Douglas Anderson <dianders@chromium.org>