Message ID | 20190526222536.10917-1-peron.clem@gmail.com |
---|---|
Series | Allwinner A64/H6 IR support |
Hi,

On Mon, May 27, 2019 at 12:25:28AM +0200, Clément Péron wrote:
> Allwiner A31 has a different memory mapping so add the compatible
> we will need it later.
>
> Signed-off-by: Clément Péron <peron.clem@gmail.com>
> ---
>  drivers/media/rc/sunxi-cir.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/media/rc/sunxi-cir.c b/drivers/media/rc/sunxi-cir.c
> index 307e44714ea0..29ac33b68596 100644
> --- a/drivers/media/rc/sunxi-cir.c
> +++ b/drivers/media/rc/sunxi-cir.c
> @@ -319,6 +319,7 @@ static int sunxi_ir_remove(struct platform_device *pdev)
>  static const struct of_device_id sunxi_ir_match[] = {
>  	{ .compatible = "allwinner,sun4i-a10-ir", },
>  	{ .compatible = "allwinner,sun5i-a13-ir", },
> +	{ .compatible = "allwinner,sun6i-a31-ir", },

We should also move from reset_get_optional to the non optional
variant for the A31, and ignore it otherwise.

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
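Maxime's suggestion (a mandatory reset line for A31, no reset handling at all for the older A10/A13 blocks) is commonly expressed by attaching per-compatible match data to the of_device_id table and branching on it at probe time. The sketch below is a userspace model of that decision only, not driver code: the names ir_quirks, ir_match_quirks and ir_reset_policy are hypothetical, and in a real driver the lookup would come from of_device_get_match_data() and the reset would be requested with devm_reset_control_get_exclusive() instead of being modelled as a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-SoC quirks; only the reset policy is modelled. */
struct ir_quirks {
	bool has_reset; /* true: the reset line is mandatory */
};

static const struct ir_quirks sun4i_a10_ir_quirks = { .has_reset = false };
static const struct ir_quirks sun5i_a13_ir_quirks = { .has_reset = false };
static const struct ir_quirks sun6i_a31_ir_quirks = { .has_reset = true };

/* Userspace stand-in for an of_device_id table carrying .data pointers. */
struct ir_match_entry {
	const char *compatible;
	const struct ir_quirks *quirks;
};

static const struct ir_match_entry ir_match[] = {
	{ "allwinner,sun4i-a10-ir", &sun4i_a10_ir_quirks },
	{ "allwinner,sun5i-a13-ir", &sun5i_a13_ir_quirks },
	{ "allwinner,sun6i-a31-ir", &sun6i_a31_ir_quirks },
	{ NULL, NULL } /* sentinel, like the empty entry ending the kernel table */
};

/* Return the quirks for a compatible string, or NULL if it is unknown. */
static const struct ir_quirks *ir_match_quirks(const char *compatible)
{
	assert(compatible != NULL);
	for (size_t i = 0; ir_match[i].compatible != NULL; i++)
		if (strcmp(ir_match[i].compatible, compatible) == 0)
			return ir_match[i].quirks;
	return NULL;
}

enum ir_reset_result {
	IR_RESET_IGNORED,  /* older SoC: any reset line is left alone */
	IR_RESET_ACQUIRED, /* A31 and later: reset line found and used */
	IR_RESET_MISSING,  /* A31 and later: required but absent, probe fails */
};

/* Decide what probe does with the reset line described in the DT. */
static enum ir_reset_result ir_reset_policy(const struct ir_quirks *q,
					    bool reset_in_dt)
{
	if (!q->has_reset)
		return IR_RESET_IGNORED;
	return reset_in_dt ? IR_RESET_ACQUIRED : IR_RESET_MISSING;
}
```

Keeping the policy in match data means a later compatible only needs a new table entry rather than another branch in probe.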
On Mon, May 27, 2019 at 12:25:29AM +0200, Clément Péron wrote:
> Since A31, memory mapping of the IR driver has changed.
>
> Prefer the A31 bindings instead of A13.
>
> Signed-off-by: Clément Péron <peron.clem@gmail.com>
> ---
>  arch/arm/boot/dts/sun6i-a31.dtsi   | 2 +-
>  arch/arm/boot/dts/sun8i-a83t.dtsi  | 2 +-
>  arch/arm/boot/dts/sun9i-a80.dtsi   | 2 +-
>  arch/arm/boot/dts/sunxi-h3-h5.dtsi | 2 +-

Can you split the H3 in a separate patch? This will go through a
separate branch.

Thanks!
Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Hi Maxime,

On Mon, 27 May 2019 at 09:47, Maxime Ripard <maxime.ripard@bootlin.com> wrote:
>
> On Mon, May 27, 2019 at 12:25:29AM +0200, Clément Péron wrote:
> > Since A31, memory mapping of the IR driver has changed.
> >
> > Prefer the A31 bindings instead of A13.
[...]
> Can you split the H3 in a separate patch? This will go through a
> separate branch.

OK, I will.

Thanks,
Clément
Hi Maxime,

On Mon, 27 May 2019 at 09:47, Maxime Ripard <maxime.ripard@bootlin.com> wrote:
>
> Hi,
>
> On Mon, May 27, 2019 at 12:25:28AM +0200, Clément Péron wrote:
> > Allwiner A31 has a different memory mapping so add the compatible
> > we will need it later.
[...]
> > +	{ .compatible = "allwinner,sun6i-a31-ir", },
>
> We should also move from reset_get_optional to the non optional
> variant for the A31, and ignore it otherwise.

Should this be done in this series?

Thanks,
Clément
On Mon, May 27, 2019 at 10:20:05AM +0200, Clément Péron wrote:
> Hi Maxime,
>
> On Mon, 27 May 2019 at 09:47, Maxime Ripard <maxime.ripard@bootlin.com> wrote:
[...]
> > We should also move from reset_get_optional to the non optional
> > variant for the A31, and ignore it otherwise.
>
> Should this be done in this series?

Yep, please

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Hi Clément,

On Mon, May 27, 2019 at 12:25:26AM +0200, Clément Péron wrote:
> Hi,
>
> A64 IR support series[1] pointed out that an A31 bindings should be
> introduced.
>
> This series introduce the A31 compatible bindings, then switch it on
> the already existing board.
>
> Finally introduce A64 and H6 support.

Does H6 support actually work? I don't see any driver changes and last time
I tried with the exact same bindings, I got RCU stalls shortly after boot.

Enabling/disabling the IR node was enough to trigger/stop the RCU stalls on H6.

regards,
	o.

> Regards,
> Clément
>
> [1] https://lore.kernel.org/patchwork/patch/1031390/#1221464
>
> Changes since v1:
>  - Document reset lines as required since A31
>  - Explain the memory mapping difference in commit log
>  - Fix misspelling "Allwiner" to "Allwinner"
>
> Clément Péron (8):
>   dt-bindings: media: sunxi-ir: add A31 compatible
>   media: rc: sunxi: Add A31 compatible
>   ARM: dts: sunxi: prefer A31 instead of A13 for ir
>   dt-bindings: media: sunxi-ir: Add A64 compatible
>   dt-bindings: media: sunxi-ir: Add H6 compatible
>   arm64: dts: allwinner: h6: Add IR receiver node
>   arm64: dts: allwinner: h6: Enable IR on H6 boards
>   arm64: defconfig: enable IR SUNXI option
>
> Igors Makejevs (1):
>   arm64: dts: allwinner: a64: Add IR node
>
> Jernej Skrabec (1):
>   arm64: dts: allwinner: a64: Enable IR on Orange Pi Win
>
>  .../devicetree/bindings/media/sunxi-ir.txt    | 11 +++++++++--
>  arch/arm/boot/dts/sun6i-a31.dtsi              |  2 +-
>  arch/arm/boot/dts/sun8i-a83t.dtsi             |  2 +-
>  arch/arm/boot/dts/sun9i-a80.dtsi              |  2 +-
>  arch/arm/boot/dts/sunxi-h3-h5.dtsi            |  2 +-
>  .../dts/allwinner/sun50i-a64-orangepi-win.dts |  4 ++++
>  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 18 ++++++++++++++++++
>  .../dts/allwinner/sun50i-h6-beelink-gs1.dts   |  4 ++++
>  .../dts/allwinner/sun50i-h6-orangepi.dtsi     |  4 ++++
>  .../boot/dts/allwinner/sun50i-h6-pine-h64.dts |  4 ++++
>  arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi  | 19 +++++++++++++++++++
>  arch/arm64/configs/defconfig                  |  1 +
>  drivers/media/rc/sunxi-cir.c                  |  1 +
>  13 files changed, 68 insertions(+), 6 deletions(-)
>
> --
> 2.20.1
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Hi Ondřej,

On Mon, 27 May 2019 at 15:48, Ondřej Jirman <megous@megous.com> wrote:
>
> Hi Clément,
>
> On Mon, May 27, 2019 at 12:25:26AM +0200, Clément Péron wrote:
[...]
> > Finally introduce A64 and H6 support.
>
> Does H6 support actually work? I don't see any driver changes and last time
> I tried with the exact same bindings, I got RCU stalls shortly after boot.

Actually, I have only tested H6 on my Beelink GS1, with a "NEC" remote.

I manually toggled the protocols and did a simple cat of /dev/input/event0:

# echo nec > /sys/class/rc/rc0/protocols
# cat /dev/input/event0 | hexdump
0000000 0093 0000 0000 0000 8bfb 0009 0000 0000
0000010 0004 0004 8028 0000 0093 0000 0000 0000
0000020 8bfb 0009 0000 0000 0000 0000 0000 0000
0000030 0093 0000 0000 0000 55be 000a 0000 0000
0000040 0004 0004 8028 0000 0093 0000 0000 0000
0000050 55be 000a 0000 0000 0000 0000 0000 0000
0000060 0093 0000 0000 0000 fa42 000d 0000 0000
0000070 0004 0004 8028 0000 0093 0000 0000 0000
0000080 fa42 000d 0000 0000 0000 0000 0000 0000
0000090 0093 0000 0000 0000 c41a 000e 0000 0000
00000a0 0004 0004 8028 0000 0093 0000 0000 0000
00000b0 c41a 000e 0000 0000 0000 0000 0000 0000

Which kernel did you test with? Do you have any logs?

Thanks,
Clément
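The hexdump above is a stream of struct input_event records. Assuming a 64-bit little-endian userspace (as on the arm64 H6), each record is 24 bytes: a struct timeval (two 64-bit fields), then a 16-bit type, a 16-bit code and a 32-bit signed value. hexdump's default output prints 16-bit words in host (little-endian) order, so one record spans 12 words and the 0xc0-byte dump holds 8 records. The sketch below, with hypothetical helpers decode_event and join_words, decodes the first two records: an EV_MSC/MSC_SCAN event carrying scancode 0x8028, followed by an EV_SYN/SYN_REPORT. If 0x8028 is a plain 16-bit NEC scancode, it corresponds to address 0x80 and command 0x28. This only shows how to read such a dump by hand; a program using struct input_event from <linux/input.h> is the more robust way to consume the device.

```c
#include <assert.h>
#include <stdint.h>

/* Event type/code values, as defined in <linux/input-event-codes.h>. */
#define EV_SYN     0x00
#define EV_MSC     0x04
#define SYN_REPORT 0x00
#define MSC_SCAN   0x04

/* Hypothetical decoded form of one 64-bit struct input_event. */
struct decoded_event {
	uint64_t sec;
	uint64_t usec;
	uint16_t type;
	uint16_t code;
	int32_t value;
};

/* Join n little-endian 16-bit hexdump words (lowest word first). */
static uint64_t join_words(const uint16_t *w, int n)
{
	assert(n >= 1 && n <= 4);
	uint64_t v = 0;
	for (int i = n - 1; i >= 0; i--)
		v = (v << 16) | w[i];
	return v;
}

/* Decode one 24-byte record given as its 12 hexdump words. */
static struct decoded_event decode_event(const uint16_t w[12])
{
	struct decoded_event ev = {
		.sec = join_words(&w[0], 4),   /* time.tv_sec  */
		.usec = join_words(&w[4], 4),  /* time.tv_usec */
		.type = w[8],
		.code = w[9],
		.value = (int32_t)(uint32_t)join_words(&w[10], 2),
	};
	return ev;
}

/* First two records of the dump above, word by word. */
static const uint16_t dump[2][12] = {
	{ 0x0093, 0x0000, 0x0000, 0x0000, 0x8bfb, 0x0009, 0x0000, 0x0000,
	  0x0004, 0x0004, 0x8028, 0x0000 },
	{ 0x0093, 0x0000, 0x0000, 0x0000, 0x8bfb, 0x0009, 0x0000, 0x0000,
	  0x0000, 0x0000, 0x0000, 0x0000 },
};
```

Decoded this way, the first record has a timestamp of 147.625659 s. The dump contains no EV_KEY events, only scancodes, which would be consistent with no keymap entry matching 0x8028.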
Hi Clément,

On Mon, May 27, 2019 at 04:59:35PM +0200, Clément Péron wrote:
> Hi Ondřej,
>
> On Mon, 27 May 2019 at 15:48, Ondřej Jirman <megous@megous.com> wrote:
[...]
> Which kernel did you test with? Do you have any logs?

I tested with my kernel (https://megous.com/git/linux/log/?h=opi3-5.2). I also
tried with 5.1 and the same kernel build on H5, to exclude some early 5.2-rc
bugs and to see if this is H6-specific.

I'll try testing again with your patches, and get you some logs. But last time
they were not very informative.

regards,
	o.
Hi Clément,

On Mon, May 27, 2019 at 06:31:17PM +0200, verejna wrote:
> Hi Clément,
[...]
> I'll try testing again with your patches, and get you some logs. But last time
> they were not very informative.

I'm testing on Orange Pi 3.

With your patches, I get a kernel lockup after ~1 minute of use (ssh stops
responding/serial console stops responding). I don't have an RC controller to
test the CIR. But just enabling the CIR causes the kernel to hang shortly after
boot.

I tried booting multiple times. Other results:

boot 2:

- ssh hangs even before connecting (ethernet crashes/is reset)

INFO: rcu_sched detected stalls on CPUs/tasks:
rcu:     0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=2437
dwmac-sun8i 5020000.ethernet eth0: Reset adapter.
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 5696 jiffies s: 81 root: 0x1/.
rcu: blocking rcu_node structures:
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu:     0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=9714
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 21568 jiffies s: 81 root: 0x1/.
rcu: blocking rcu_node structures:
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu:     0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=17203

The above messages appear regularly.

boot 3:

rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu:     0-....: (9 GPs behind) idle=992/0/0x3 softirq=6123/6123 fqs=2600

Sometimes the serial console keeps working. Sometimes it locks up too (but not
frequently). Storage always locks up (any program that was not run before the
crash can't be started and locks up the kernel hard; programs that were
executed before the crash can be run again).

Exactly the same kernel build on H5 seems to work (or at least I was not able
to trigger the crash). So this seems to be limited to H6 for now.

I suspect that the crash occurs sooner if I vary the light (turn the table
lamp on/off).

Without your patches, everything works fine on H6, and I never see
crashes/lockups.

I tried physically covering the IR receiver, and that helps prevent the
crash. As soon as I uncover it, the crash happens again in 1s or so:

rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu:     0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=2444
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu:     0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=9777

This time I got the hung task and reboot (probably not directly related):

INFO: task find:560 blocked for more than 120 seconds.
      Not tainted 5.2.0-rc2+ #7
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
find            D    0   560    551 0x00000000
Call trace:
 __switch_to+0x6c/0x90
 __schedule+0x1f4/0x578
 schedule+0x28/0xa8
 io_schedule+0x18/0x38
 __lock_page+0x12c/0x208
 pagecache_get_page+0x238/0x2e8
 __get_node_page+0x6c/0x310
 f2fs_get_node_page+0x14/0x20
 f2fs_iget+0x70/0xc60
 f2fs_lookup+0xcc/0x218
 __lookup_slow+0x78/0x160
 lookup_slow+0x3c/0x60
 walk_component+0x1e4/0x2e0
 path_lookupat.isra.13+0x5c/0x1e0
 filename_lookup.part.23+0x6c/0xe8
 user_path_at_empty+0x4c/0x60
 vfs_statx+0x78/0xd8
 __se_sys_newfstatat+0x24/0x48
 __arm64_sys_newfstatat+0x18/0x20
 el0_svc_handler+0x9c/0x170
 el0_svc+0x8/0xc
Kernel panic - not syncing: hung_task: blocked tasks
CPU: 1 PID: 34 Comm: khungtaskd Not tainted 5.2.0-rc2+ #7
Hardware name: OrangePi 3 (DT)
Call trace:
 dump_backtrace+0x0/0xf8
 show_stack+0x14/0x20
 dump_stack+0xa8/0xcc
 panic+0x124/0x2dc
 proc_dohung_task_timeout_secs+0x0/0x40
 kthread+0x120/0x128
 ret_from_fork+0x10/0x18
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x0002,20002000
Memory Limit: none
Rebooting in 3 seconds..

Meanwhile, the H5-based board has now been running for 15 minutes without
issues.

So to sum up:

- these crashes are definitely H6 IR related
- the same kernel, on H5 works
- covering the sensor prevents the crashes on H6

So we should probably hold off on the series until this is figured out.

I have tried searching for differences between the H3 and H6 BSPs, and there
are some:

 		break;
 	case IR_IRQ_FIFO_SIZE:
-		irq_reg = sunxi_smc_readl(IR_BASE+IR_RXINTE_REG);
-		irq_reg |= IR_FIFO_32;
+		irq_reg = readl(reg_base + IR_RXINTE_REG);
+		irq_reg |= IR_FIFO_20;
 		break;
 	}

 	case IR_CLK_SAMPLE:
-#ifdef FPGA_SIM_CONFIG
-		sample_reg |= 0x3<<0; /* Fsample = 24MHz/512 = 46875Hz (21.33us) */
-#else
-		sample_reg |= IR_SAMPLE_128;
-#endif
+		sample_reg |= IR_SAMPLE_DEV;
 		break;

+	case IR_BOTH_PULSE_MODE:
+		ctrl_reg = readl(reg_base + IR_CTRL_REG);
+		ctrl_reg |= IR_BOTH_PULSE;
+		break;
+	case IR_LOW_PULSE_MODE:
+		ctrl_reg = readl(reg_base + IR_CTRL_REG);
+		ctrl_reg |= IR_LOW_PULSE;
+		break;
+	case IR_HIGH_PULSE_MODE:
+		ctrl_reg = readl(reg_base + IR_CTRL_REG);
+		ctrl_reg |= IR_HIGH_PULSE;
+		break;

CIR_CTL (0x0000):

  new bit 8 - CGPO
  General Program Output (GPO) Control in CIR mode for TX Pin
  0: Low level
  1: High level

CIR_RXSTA (0x0030):

  RAC is just 13:8 instead of 14:8

I haven't looked deeper, because I have no use for IR on H6. But I hope this
helps. I can help with testing patches if you like.

thank you and regards,
	o.
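To make the register notes above concrete: the RX FIFO available counter (RAC) in CIR_RXSTA is bits 14:8 in the older BSP but only 13:8 on H6, so the field has to be extracted with a per-SoC mask. The sketch below models that with a generic field_get helper, plus the sample-frequency arithmetic from the BSP comment (24 MHz / 512 = 46875 Hz, about 21.33 µs per sample). The helper names and the RAC_HI_* constants are hypothetical; this illustrates why the narrower field matters and is not a diagnosis of the H6 lockup.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits hi..lo set (similar to the kernel's GENMASK, 32-bit). */
static uint32_t mask_bits(unsigned int hi, unsigned int lo)
{
	assert(hi < 32 && lo <= hi);
	return (uint32_t)(((uint64_t)1 << (hi + 1)) - ((uint64_t)1 << lo));
}

/* Extract bits hi..lo of reg, shifted down to bit 0. */
static uint32_t field_get(uint32_t reg, unsigned int hi, unsigned int lo)
{
	return (reg & mask_bits(hi, lo)) >> lo;
}

/* Hypothetical RAC field limits, taken from the BSP notes above. */
#define RAC_HI_OLD 14 /* RAC = CIR_RXSTA[14:8] */
#define RAC_HI_H6  13 /* RAC = CIR_RXSTA[13:8] */
#define RAC_LO      8

/* Number of entries available in the RX FIFO, per the given field width. */
static uint32_t rac_count(uint32_t rxsta, unsigned int rac_hi)
{
	return field_get(rxsta, rac_hi, RAC_LO);
}

/* Sample frequency in Hz for a clock of clk_hz divided by div. */
static uint32_t sample_hz(uint32_t clk_hz, uint32_t div)
{
	return clk_hz / div;
}

/* Sample period in nanoseconds, rounded down. */
static uint32_t sample_period_ns(uint32_t clk_hz, uint32_t div)
{
	return (uint32_t)((uint64_t)div * 1000000000u / clk_hz);
}
```

For example, a status value of 0x4500 reads as 0x45 (69) entries with the 14:8 mask but 0x05 with the 13:8 mask. If bit 14 ever read back as 1 on H6 (an assumption, not something established in this thread), the wider mask would report 64 entries too many, which is the kind of mismatch worth ruling out when an IRQ handler drains the FIFO in a loop.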
Hi Ondrej, On Mon, 27 May 2019 at 19:23, Ondřej Jirman <megous@megous.com> wrote: > > Hi Clément, > > On Mon, May 27, 2019 at 06:31:17PM +0200, verejna wrote: > > Hi Clément, > > > > On Mon, May 27, 2019 at 04:59:35PM +0200, Clément Péron wrote: > > > Hi Ondřej, > > > > > > On Mon, 27 May 2019 at 15:48, Ondřej Jirman <megous@megous.com> wrote: > > > > > > > > Hi Clément, > > > > > > > > On Mon, May 27, 2019 at 12:25:26AM +0200, Clément Péron wrote: > > > > > Hi, > > > > > > > > > > A64 IR support series[1] pointed out that an A31 bindings should be > > > > > introduced. > > > > > > > > > > This series introduce the A31 compatible bindings, then switch it on > > > > > the already existing board. > > > > > > > > > > Finally introduce A64 and H6 support. > > > > > > > > Does H6 support actually work? I don't see any driver changes and last time > > > > I tried with the exact same bindings, I got RCU stalls shortly after boot. > > > > > > Actually, I have tested only on H6 on my Beelink GS1 with a "NEC" remote. > > > > > > I have manually toggle the protocols and do a simple cat in /dev/input/event0 > > > # echo nec > /sys/class/rc/rc0/protocols > > > # cat /dev/input/event0 | hexdump > > > 0000000 0093 0000 0000 0000 8bfb 0009 0000 0000 > > > 0000010 0004 0004 8028 0000 0093 0000 0000 0000 > > > 0000020 8bfb 0009 0000 0000 0000 0000 0000 0000 > > > 0000030 0093 0000 0000 0000 55be 000a 0000 0000 > > > 0000040 0004 0004 8028 0000 0093 0000 0000 0000 > > > 0000050 55be 000a 0000 0000 0000 0000 0000 0000 > > > 0000060 0093 0000 0000 0000 fa42 000d 0000 0000 > > > 0000070 0004 0004 8028 0000 0093 0000 0000 0000 > > > 0000080 fa42 000d 0000 0000 0000 0000 0000 0000 > > > 0000090 0093 0000 0000 0000 c41a 000e 0000 0000 > > > 00000a0 0004 0004 8028 0000 0093 0000 0000 0000 > > > 00000b0 c41a 000e 0000 0000 0000 0000 0000 0000 > > > > > > > > > Which kernel did you test with? Do you have any log? > > > > I tested with my kernel (https://megous.com/git/linux/log/?h=opi3-5.2). 
I also > > tried with 5.1 and the same kernel build on H5, to exclude some early 5.2-rc > > bugs and to see if this is H6 specific. > > > > I'll try testing again with your patches, and get you some logs. But last time > > they were not very informative. > > I'm testing on Orange Pi 3. > > With your patches, I get kernel lockup after ~1 minute of use (ssh stops > responding/serial console stops responding). I don't have RC controller to test > the CIR. But just enabling the CIR causes kernel to hang shortly after boot. > > I tried booting multiple times. Other results: > > boot 2: > > - ssh hangs even before connecting (ethernet crashes/is reset) > > INFO: rcu_sched detected stalls on CPUs/tasks: > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=2437 > dwmac-sun8i 5020000.ethernet eth0: Reset adapter. > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 5696 jiffies s: 81 root: 0x1/. > rcu: blocking rcu_node structures: > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=9714 > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 21568 jiffies s: 81 root: 0x1/. > rcu: blocking rcu_node structures: > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=17203 > > above messages appear regularly. > > boot 3: > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > rcu: 0-....: (9 GPs behind) idle=992/0/0x3 softirq=6123/6123 fqs=2600 > > > Sometimes serial console keeps working. Sometimes it locks up too (but not > frequently). Storage locks up always (any program that was not run before > the crash can't be started and lock up the kernel hard, programs that > were executed prior, can be run again). > > > Exactly the same kernel build on H5 seems to work (or at least I was not able to > trigger the crash). So this seems to be limited to H6 for now. 
> > I suspect that the crash occurs sooner if I vary the light (turn on/off the table > lamp light). > > Without your patches, everything works fine on H6, and I never see > crashes/lockups. > > I tired physically covering the IR receiver, and that helps preventing the > crash. As soon as I uncover it, the crash happens again in 1s or so: > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=2444 > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=9777 > > This time I got the hung task and reboot: (probably not directly related) > > INFO: task find:560 blocked for more than 120 seconds. > Not tainted 5.2.0-rc2+ #7 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > find D 0 560 551 0x00000000 > Call trace: > __switch_to+0x6c/0x90 > __schedule+0x1f4/0x578 > schedule+0x28/0xa8 > io_schedule+0x18/0x38 > __lock_page+0x12c/0x208 > pagecache_get_page+0x238/0x2e8 > __get_node_page+0x6c/0x310 > f2fs_get_node_page+0x14/0x20 > f2fs_iget+0x70/0xc60 > f2fs_lookup+0xcc/0x218 > __lookup_slow+0x78/0x160 > lookup_slow+0x3c/0x60 > walk_component+0x1e4/0x2e0 > path_lookupat.isra.13+0x5c/0x1e0 > filename_lookup.part.23+0x6c/0xe8 > user_path_at_empty+0x4c/0x60 > vfs_statx+0x78/0xd8 > __se_sys_newfstatat+0x24/0x48 > __arm64_sys_newfstatat+0x18/0x20 > el0_svc_handler+0x9c/0x170 > el0_svc+0x8/0xc > Kernel panic - not syncing: hung_task: blocked tasks > CPU: 1 PID: 34 Comm: khungtaskd Not tainted 5.2.0-rc2+ #7 > Hardware name: OrangePi 3 (DT) > Call trace: > dump_backtrace+0x0/0xf8 > show_stack+0x14/0x20 > dump_stack+0xa8/0xcc > panic+0x124/0x2dc > proc_dohung_task_timeout_secs+0x0/0x40 > kthread+0x120/0x128 > ret_from_fork+0x10/0x18 > SMP: stopping secondary CPUs > Kernel Offset: disabled > CPU features: 0x0002,20002000 > Memory Limit: none > Rebooting in 3 seconds.. 
> > > Meanwhile H5 based board now runs for 15 minutes without issues. > > So to sum up: > > - these crashes are definitely H6 IR related > - the same kernel, on H5 works > - covering the sensor prevents the crashes on H6 > > So we should probably hold on with the series, until this is figured out. Thanks for testing, but I think it's more hardware related. It seems that your IR is flooded or misconfigured for your board. Could you add a simple print in the "sunxi_ir_irq" If it's confirmed, maybe tweak the threshold configuration or implement the new active_threshold will help. With my hardware Beelink GS1 and on Jernej's board (A64) there is no issue. I will disable all the other H6 boards until someone test it. Regards, Clément > > I have tried searching for differences between H3 and H6 BSPs. And there are some: > > break; > case IR_IRQ_FIFO_SIZE: > - irq_reg = sunxi_smc_readl(IR_BASE+IR_RXINTE_REG); > - irq_reg |= IR_FIFO_32; > + irq_reg = readl(reg_base + IR_RXINTE_REG); > + irq_reg |= IR_FIFO_20; > break; > } > > case IR_CLK_SAMPLE: > -#ifdef FPGA_SIM_CONFIG > - sample_reg |= 0x3<<0; /* Fsample = 24MHz/512 = 46875Hz (21.33us) */ > -#else > - sample_reg |= IR_SAMPLE_128; > -#endif > + sample_reg |= IR_SAMPLE_DEV; > break; > > + case IR_BOTH_PULSE_MODE: > + ctrl_reg = readl(reg_base + IR_CTRL_REG); > + ctrl_reg |= IR_BOTH_PULSE; > + break; > + case IR_LOW_PULSE_MODE: > + ctrl_reg = readl(reg_base + IR_CTRL_REG); > + ctrl_reg |= IR_LOW_PULSE; > + break; > + case IR_HIGH_PULSE_MODE: > + ctrl_reg = readl(reg_base + IR_CTRL_REG); > + ctrl_reg |= IR_HIGH_PULSE; > + break; > > > 0x0000 CIR_CTL > > new bit 8 - CGPO > General Program Output (GPO) Control in CIR mode for TX Pin > 0: Low level > 1: High level > > CIR_RXSTA 0x0030 > > RAC is just 13:8 instead of 14:8 > > > I haven't looked deeper, because I have no use for IR on H6. But I hope this > helps. I can help testing patches if you like. > > thank you and regards, > o. > > > regards, > > o. 
> > > > > Thanks, > > > Clément > > > > > > > > > > > Enabling/disabling ir node was enough to trigger/stop the RCU stalls on H6. > > > > > > > > regards, > > > > o. > > > > > > > > > Regards, > > > > > Clément > > > > > > > > > > [1] https://lore.kernel.org/patchwork/patch/1031390/#1221464 > > > > > > > > > > Changes since v1: > > > > > - Document reset lines as required since A31 > > > > > - Explain the memory mapping difference in commit log > > > > > - Fix misspelling "Allwiner" to "Allwinner" > > > > > > > > > > Clément Péron (8): > > > > > dt-bindings: media: sunxi-ir: add A31 compatible > > > > > media: rc: sunxi: Add A31 compatible > > > > > ARM: dts: sunxi: prefer A31 instead of A13 for ir > > > > > dt-bindings: media: sunxi-ir: Add A64 compatible > > > > > dt-bindings: media: sunxi-ir: Add H6 compatible > > > > > arm64: dts: allwinner: h6: Add IR receiver node > > > > > arm64: dts: allwinner: h6: Enable IR on H6 boards > > > > > arm64: defconfig: enable IR SUNXI option > > > > > > > > > > Igors Makejevs (1): > > > > > arm64: dts: allwinner: a64: Add IR node > > > > > > > > > > Jernej Skrabec (1): > > > > > arm64: dts: allwinner: a64: Enable IR on Orange Pi Win > > > > > > > > > > .../devicetree/bindings/media/sunxi-ir.txt | 11 +++++++++-- > > > > > arch/arm/boot/dts/sun6i-a31.dtsi | 2 +- > > > > > arch/arm/boot/dts/sun8i-a83t.dtsi | 2 +- > > > > > arch/arm/boot/dts/sun9i-a80.dtsi | 2 +- > > > > > arch/arm/boot/dts/sunxi-h3-h5.dtsi | 2 +- > > > > > .../dts/allwinner/sun50i-a64-orangepi-win.dts | 4 ++++ > > > > > arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 18 ++++++++++++++++++ > > > > > .../dts/allwinner/sun50i-h6-beelink-gs1.dts | 4 ++++ > > > > > .../dts/allwinner/sun50i-h6-orangepi.dtsi | 4 ++++ > > > > > .../boot/dts/allwinner/sun50i-h6-pine-h64.dts | 4 ++++ > > > > > arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 19 +++++++++++++++++++ > > > > > arch/arm64/configs/defconfig | 1 + > > > > > drivers/media/rc/sunxi-cir.c | 1 + > > > > > 13 files 
changed, 68 insertions(+), 6 deletions(-) > > > > > > > > > > -- > > > > > 2.20.1
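The register note in the report (RAC in CIR_RXSTA spans bits 13:8 on H6 instead of 14:8) matters whenever a driver masks the counter with a width that assumes the older layout. Below is a minimal userspace sketch, assuming only the two bit ranges quoted in the thread; `rac_field()` and the macro names are hypothetical, not taken from sunxi-cir.c.

```c
#include <stdint.h>

/* RAC (RX FIFO available counter) is assumed to start at bit 8 of CIR_RXSTA. */
#define CIR_RXSTA_RAC_SHIFT	8u
/* Field widths assumed from the notes: bits 14:8 (older SoCs) vs 13:8 (H6). */
#define CIR_RAC_WIDTH_OLD	7u
#define CIR_RAC_WIDTH_H6	6u

/* Hypothetical helper: extract a RAC field of the given width (width < 32). */
static uint32_t rac_field(uint32_t rxsta, unsigned int width)
{
	uint32_t mask = (1u << width) - 1u;

	return (rxsta >> CIR_RXSTA_RAC_SHIFT) & mask;
}
```

For example, with rxsta = 0x4030 the 7-bit mask yields 64 while the 6-bit mask yields 0, because bit 14 lies outside the H6 field. Decoding H6 values with the older width could therefore misread that bit if it is not part of RAC on H6.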
Hi Clément, On Mon, May 27, 2019 at 08:49:59PM +0200, Clément Péron wrote: > Hi Ondrej, > > > > > I'm testing on Orange Pi 3. > > > > With your patches, I get kernel lockup after ~1 minute of use (ssh stops > > responding/serial console stops responding). I don't have RC controller to test > > the CIR. But just enabling the CIR causes kernel to hang shortly after boot. > > > > I tried booting multiple times. Other results: > > > > boot 2: > > > > - ssh hangs even before connecting (ethernet crashes/is reset) > > > > INFO: rcu_sched detected stalls on CPUs/tasks: > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=2437 > > dwmac-sun8i 5020000.ethernet eth0: Reset adapter. > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 5696 jiffies s: 81 root: 0x1/. > > rcu: blocking rcu_node structures: > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=9714 > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 21568 jiffies s: 81 root: 0x1/. > > rcu: blocking rcu_node structures: > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=17203 > > > > above messages appear regularly. > > > > boot 3: > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > rcu: 0-....: (9 GPs behind) idle=992/0/0x3 softirq=6123/6123 fqs=2600 > > > > > > Sometimes serial console keeps working. Sometimes it locks up too (but not > > frequently). Storage locks up always (any program that was not run before > > the crash can't be started and lock up the kernel hard, programs that > > were executed prior, can be run again). > > > > > > Exactly the same kernel build on H5 seems to work (or at least I was not able to > > trigger the crash). So this seems to be limited to H6 for now. > > > > I suspect that the crash occurs sooner if I vary the light (turn on/off the table > > lamp light). 
> > > > Without your patches, everything works fine on H6, and I never see > > crashes/lockups. > > > > I tired physically covering the IR receiver, and that helps preventing the > > crash. As soon as I uncover it, the crash happens again in 1s or so: > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=2444 > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=9777 > > > > This time I got the hung task and reboot: (probably not directly related) > > > > INFO: task find:560 blocked for more than 120 seconds. > > Not tainted 5.2.0-rc2+ #7 > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > find D 0 560 551 0x00000000 > > Call trace: > > __switch_to+0x6c/0x90 > > __schedule+0x1f4/0x578 > > schedule+0x28/0xa8 > > io_schedule+0x18/0x38 > > __lock_page+0x12c/0x208 > > pagecache_get_page+0x238/0x2e8 > > __get_node_page+0x6c/0x310 > > f2fs_get_node_page+0x14/0x20 > > f2fs_iget+0x70/0xc60 > > f2fs_lookup+0xcc/0x218 > > __lookup_slow+0x78/0x160 > > lookup_slow+0x3c/0x60 > > walk_component+0x1e4/0x2e0 > > path_lookupat.isra.13+0x5c/0x1e0 > > filename_lookup.part.23+0x6c/0xe8 > > user_path_at_empty+0x4c/0x60 > > vfs_statx+0x78/0xd8 > > __se_sys_newfstatat+0x24/0x48 > > __arm64_sys_newfstatat+0x18/0x20 > > el0_svc_handler+0x9c/0x170 > > el0_svc+0x8/0xc > > Kernel panic - not syncing: hung_task: blocked tasks > > CPU: 1 PID: 34 Comm: khungtaskd Not tainted 5.2.0-rc2+ #7 > > Hardware name: OrangePi 3 (DT) > > Call trace: > > dump_backtrace+0x0/0xf8 > > show_stack+0x14/0x20 > > dump_stack+0xa8/0xcc > > panic+0x124/0x2dc > > proc_dohung_task_timeout_secs+0x0/0x40 > > kthread+0x120/0x128 > > ret_from_fork+0x10/0x18 > > SMP: stopping secondary CPUs > > Kernel Offset: disabled > > CPU features: 0x0002,20002000 > > Memory Limit: none > > Rebooting in 3 seconds.. 
> > > > > > Meanwhile H5 based board now runs for 15 minutes without issues. > > > > So to sum up: > > > > - these crashes are definitely H6 IR related > > - the same kernel, on H5 works > > - covering the sensor prevents the crashes on H6 > > > > So we should probably hold on with the series, until this is figured out. > > Thanks for testing, but I think it's more hardware related. > It seems that your IR is flooded or misconfigured for your board. > Could you add a simple print in the "sunxi_ir_irq" Yes, I get a flood of IRQs with status = 0x30 (after I turn on the lamp, but it persists even after I turn it off and cover the IR sensor). That's weird, because on H6 bit 5 of CIR_RXSTA is undefined, but the corresponding bit in CIR_RXINT is DRQ_EN (RX FIFO DMA Enable). So I'm not sure what it could be flooded with, or why IRQs keep firing even with no sensor input after the FIFO is read. regards, o. > If it's confirmed, maybe tweak the threshold configuration or > implement the new active_threshold will help. > > With my hardware Beelink GS1 and on Jernej's board (A64) there is no issue. > > I will disable all the other H6 boards until someone test it. > > Regards, > Clément
Hi Clément, On Mon, May 27, 2019 at 09:30:16PM +0200, verejna wrote: > Hi Clément, > > On Mon, May 27, 2019 at 08:49:59PM +0200, Clément Péron wrote: > > Hi Ondrej, > > > > > > > > I'm testing on Orange Pi 3. > > > > > > With your patches, I get kernel lockup after ~1 minute of use (ssh stops > > > responding/serial console stops responding). I don't have RC controller to test > > > the CIR. But just enabling the CIR causes kernel to hang shortly after boot. > > > > > > I tried booting multiple times. Other results: > > > > > > boot 2: > > > > > > - ssh hangs even before connecting (ethernet crashes/is reset) > > > > > > INFO: rcu_sched detected stalls on CPUs/tasks: > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=2437 > > > dwmac-sun8i 5020000.ethernet eth0: Reset adapter. > > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 5696 jiffies s: 81 root: 0x1/. > > > rcu: blocking rcu_node structures: > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=9714 > > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 21568 jiffies s: 81 root: 0x1/. > > > rcu: blocking rcu_node structures: > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=17203 > > > > > > above messages appear regularly. > > > > > > boot 3: > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > rcu: 0-....: (9 GPs behind) idle=992/0/0x3 softirq=6123/6123 fqs=2600 > > > > > > > > > Sometimes serial console keeps working. Sometimes it locks up too (but not > > > frequently). Storage locks up always (any program that was not run before > > > the crash can't be started and lock up the kernel hard, programs that > > > were executed prior, can be run again). 
> > > > > > > > > Exactly the same kernel build on H5 seems to work (or at least I was not able to > > > trigger the crash). So this seems to be limited to H6 for now. > > > > > > I suspect that the crash occurs sooner if I vary the light (turn on/off the table > > > lamp light). > > > > > > Without your patches, everything works fine on H6, and I never see > > > crashes/lockups. > > > > > > I tired physically covering the IR receiver, and that helps preventing the > > > crash. As soon as I uncover it, the crash happens again in 1s or so: > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=2444 > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=9777 > > > > > > This time I got the hung task and reboot: (probably not directly related) > > > > > > INFO: task find:560 blocked for more than 120 seconds. > > > Not tainted 5.2.0-rc2+ #7 > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> > > find D 0 560 551 0x00000000 > > > Call trace: > > > __switch_to+0x6c/0x90 > > > __schedule+0x1f4/0x578 > > > schedule+0x28/0xa8 > > > io_schedule+0x18/0x38 > > > __lock_page+0x12c/0x208 > > > pagecache_get_page+0x238/0x2e8 > > > __get_node_page+0x6c/0x310 > > > f2fs_get_node_page+0x14/0x20 > > > f2fs_iget+0x70/0xc60 > > > f2fs_lookup+0xcc/0x218 > > > __lookup_slow+0x78/0x160 > > > lookup_slow+0x3c/0x60 > > > walk_component+0x1e4/0x2e0 > > > path_lookupat.isra.13+0x5c/0x1e0 > > > filename_lookup.part.23+0x6c/0xe8 > > > user_path_at_empty+0x4c/0x60 > > > vfs_statx+0x78/0xd8 > > > __se_sys_newfstatat+0x24/0x48 > > > __arm64_sys_newfstatat+0x18/0x20 > > > el0_svc_handler+0x9c/0x170 > > > el0_svc+0x8/0xc > > > Kernel panic - not syncing: hung_task: blocked tasks > > > CPU: 1 PID: 34 Comm: khungtaskd Not tainted 5.2.0-rc2+ #7 > > > Hardware name: OrangePi 3 (DT) > > > Call trace: > > > dump_backtrace+0x0/0xf8 > > > show_stack+0x14/0x20 > > > dump_stack+0xa8/0xcc > > > panic+0x124/0x2dc > > > proc_dohung_task_timeout_secs+0x0/0x40 > > > kthread+0x120/0x128 > > > ret_from_fork+0x10/0x18 > > > SMP: stopping secondary CPUs > > > Kernel Offset: disabled > > > CPU features: 0x0002,20002000 > > > Memory Limit: none > > > Rebooting in 3 seconds.. > > > > > > > > > Meanwhile H5 based board now runs for 15 minutes without issues. > > > > > > So to sum up: > > > > > > - these crashes are definitely H6 IR related > > > - the same kernel, on H5 works > > > - covering the sensor prevents the crashes on H6 > > > > > > So we should probably hold on with the series, until this is figured out. > > > > Thanks for testing, but I think it's more hardware related. > > It seems that your IR is flooded or misconfigured for your board. > > Could you add a simple print in the "sunxi_ir_irq" > > Yes, I get flood of IRQs with status = 0x30. (after I turn on the lamp, > but it persists even after I turn it off and cover the IR sensor). 
Interestingly, status also contains RAC, and it's 0 in this case. So the interrupt is firing with "No available data in RX FIFO" repeatedly, regardless of input. So there's something else up. regards, o. > That's weird, because on H6 in CIR_RXSTA, bit 5 is undefined but corresponding > bit in CIR_RXINT is DRQ_EN (RX FIFO DMA Enable) > > So I'm not sure what it could be flooded with and why IRQs keep being > fired, even with no sensor input after the FIFO is read. > > regards, > o. > > > If it's confirmed, maybe tweak the threshold configuration or > > implement the new active_threshold will help. > > > > With my hardware Beelink GS1 and on Jernej's board (A64) there is no issue. > > > > I will disable all the other H6 boards until someone test it. > > > > Regards, > > Clément
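To make the reported value concrete: status = 0x30 sets bits 4 and 5 and leaves the RAC field at zero. The sketch below decodes it under these assumptions: bit 0 = RX FIFO overflow, bit 1 = RX packet end, and bit 4 = RX FIFO data available (my reading of the existing sunxi-cir.c register definitions; please check them against the H6 manual), bit 5 undefined on H6 as reported above, and a 6-bit H6 RAC field. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CIR_RXSTA layout; verify against the SoC user manual. */
#define RXSTA_ROI	(1u << 0)			/* RX FIFO overflow */
#define RXSTA_RPE	(1u << 1)			/* RX packet end */
#define RXSTA_RA	(1u << 4)			/* RX FIFO data available */
#define RXSTA_BIT5	(1u << 5)			/* undefined on H6, per the report */
#define RXSTA_RAC(v)	(((v) >> 8) & 0x3fu)		/* RAC, bits 13:8 on H6 */

/* Hypothetical decoded view of one CIR_RXSTA read. */
struct rxsta_decoded {
	bool overflow;
	bool packet_end;
	bool data_available;
	bool undefined_bit5;
	unsigned int rac;
};

static struct rxsta_decoded rxsta_decode(uint32_t status)
{
	struct rxsta_decoded d = {
		.overflow = (status & RXSTA_ROI) != 0,
		.packet_end = (status & RXSTA_RPE) != 0,
		.data_available = (status & RXSTA_RA) != 0,
		.undefined_bit5 = (status & RXSTA_BIT5) != 0,
		.rac = RXSTA_RAC(status),
	};
	return d;
}

/* "Data available" is flagged while the FIFO counter reads zero. */
static bool rxsta_is_inconsistent(uint32_t status)
{
	struct rxsta_decoded d = rxsta_decode(status);

	return d.data_available && d.rac == 0;
}
```

Under these assumptions, 0x30 decodes as "data available" plus the undefined bit 5 with RAC = 0, which is exactly the contradictory state described above. A value such as 0x0310 (data available, RAC = 3) is not flagged.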
Hi Ondřej, On Mon, 27 May 2019 at 21:53, 'Ondřej Jirman' via linux-sunxi <linux-sunxi@googlegroups.com> wrote: > > Hi Clément, > > On Mon, May 27, 2019 at 09:30:16PM +0200, verejna wrote: > > Hi Clément, > > > > On Mon, May 27, 2019 at 08:49:59PM +0200, Clément Péron wrote: > > > Hi Ondrej, > > > > > > > > > > > I'm testing on Orange Pi 3. > > > > > > > > With your patches, I get kernel lockup after ~1 minute of use (ssh stops > > > > responding/serial console stops responding). I don't have RC controller to test > > > > the CIR. But just enabling the CIR causes kernel to hang shortly after boot. > > > > > > > > I tried booting multiple times. Other results: > > > > > > > > boot 2: > > > > > > > > - ssh hangs even before connecting (ethernet crashes/is reset) > > > > > > > > INFO: rcu_sched detected stalls on CPUs/tasks: > > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=2437 > > > > dwmac-sun8i 5020000.ethernet eth0: Reset adapter. > > > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 5696 jiffies s: 81 root: 0x1/. > > > > rcu: blocking rcu_node structures: > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=9714 > > > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 21568 jiffies s: 81 root: 0x1/. > > > > rcu: blocking rcu_node structures: > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=17203 > > > > > > > > above messages appear regularly. > > > > > > > > boot 3: > > > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > rcu: 0-....: (9 GPs behind) idle=992/0/0x3 softirq=6123/6123 fqs=2600 > > > > > > > > > > > > Sometimes serial console keeps working. Sometimes it locks up too (but not > > > > frequently). 
Storage locks up always (any program that was not run before > > > > the crash can't be started and lock up the kernel hard, programs that > > > > were executed prior, can be run again). > > > > > > > > > > > > Exactly the same kernel build on H5 seems to work (or at least I was not able to > > > > trigger the crash). So this seems to be limited to H6 for now. > > > > > > > > I suspect that the crash occurs sooner if I vary the light (turn on/off the table > > > > lamp light). > > > > > > > > Without your patches, everything works fine on H6, and I never see > > > > crashes/lockups. > > > > > > > > I tired physically covering the IR receiver, and that helps preventing the > > > > crash. As soon as I uncover it, the crash happens again in 1s or so: > > > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=2444 > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=9777 > > > > > > > > This time I got the hung task and reboot: (probably not directly related) > > > > > > > > INFO: task find:560 blocked for more than 120 seconds. > > > > Not tainted 5.2.0-rc2+ #7 > > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> > > > find D 0 560 551 0x00000000 > > > > Call trace: > > > > __switch_to+0x6c/0x90 > > > > __schedule+0x1f4/0x578 > > > > schedule+0x28/0xa8 > > > > io_schedule+0x18/0x38 > > > > __lock_page+0x12c/0x208 > > > > pagecache_get_page+0x238/0x2e8 > > > > __get_node_page+0x6c/0x310 > > > > f2fs_get_node_page+0x14/0x20 > > > > f2fs_iget+0x70/0xc60 > > > > f2fs_lookup+0xcc/0x218 > > > > __lookup_slow+0x78/0x160 > > > > lookup_slow+0x3c/0x60 > > > > walk_component+0x1e4/0x2e0 > > > > path_lookupat.isra.13+0x5c/0x1e0 > > > > filename_lookup.part.23+0x6c/0xe8 > > > > user_path_at_empty+0x4c/0x60 > > > > vfs_statx+0x78/0xd8 > > > > __se_sys_newfstatat+0x24/0x48 > > > > __arm64_sys_newfstatat+0x18/0x20 > > > > el0_svc_handler+0x9c/0x170 > > > > el0_svc+0x8/0xc > > > > Kernel panic - not syncing: hung_task: blocked tasks > > > > CPU: 1 PID: 34 Comm: khungtaskd Not tainted 5.2.0-rc2+ #7 > > > > Hardware name: OrangePi 3 (DT) > > > > Call trace: > > > > dump_backtrace+0x0/0xf8 > > > > show_stack+0x14/0x20 > > > > dump_stack+0xa8/0xcc > > > > panic+0x124/0x2dc > > > > proc_dohung_task_timeout_secs+0x0/0x40 > > > > kthread+0x120/0x128 > > > > ret_from_fork+0x10/0x18 > > > > SMP: stopping secondary CPUs > > > > Kernel Offset: disabled > > > > CPU features: 0x0002,20002000 > > > > Memory Limit: none > > > > Rebooting in 3 seconds.. > > > > > > > > > > > > Meanwhile H5 based board now runs for 15 minutes without issues. > > > > > > > > So to sum up: > > > > > > > > - these crashes are definitely H6 IR related > > > > - the same kernel, on H5 works > > > > - covering the sensor prevents the crashes on H6 > > > > > > > > So we should probably hold on with the series, until this is figured out. > > > > > > Thanks for testing, but I think it's more hardware related. > > > It seems that your IR is flooded or misconfigured for your board. > > > Could you add a simple print in the "sunxi_ir_irq" > > > > Yes, I get flood of IRQs with status = 0x30. 
(after I turn on the lamp, > > but it persists even after I turn it off and cover the IR sensor). > > Interestingly, status also contains RAC, and it's 0 in this case. So the > interrupt if firing with "No available data in RX FIFO" repeatedly. Regardless > of input. > > So there's something else up. Really weird indeed... I have pushed a new version in which I didn't enable support for the other H6 boards, and the cover letter includes a link to this thread. It would be great if other sunxi users could test this series, to check whether this issue is present on other OPi3 / Pine H64 boards. Regards, Clément > > regards, > o. > > > That's weird, because on H6 in CIR_RXSTA, bit 5 is undefined but corresponding > > bit in CIR_RXINT is DRQ_EN (RX FIFO DMA Enable) > > > > So I'm not sure what it could be flooded with and why IRQs keep being > > fired, even with no sensor input after the FIFO is read. > > > > regards, > > o. > > > > > If it's confirmed, maybe tweak the threshold configuration or > > > implement the new active_threshold will help. > > > > > > With my hardware Beelink GS1 and on Jernej's board (A64) there is no issue. > > > > > > I will disable all the other H6 boards until someone test it. > > > > > > Regards, > > > Clément
Hello Clément, On Tue, May 28, 2019 at 06:21:19PM +0200, Clément Péron wrote: > Hi Ondřej, > > On Mon, 27 May 2019 at 21:53, 'Ondřej Jirman' via linux-sunxi > <linux-sunxi@googlegroups.com> wrote: > > > > Hi Clément, > > > > On Mon, May 27, 2019 at 09:30:16PM +0200, verejna wrote: > > > Hi Clément, > > > > > > On Mon, May 27, 2019 at 08:49:59PM +0200, Clément Péron wrote: > > > > Hi Ondrej, > > > > > > > > > > > > > > I'm testing on Orange Pi 3. > > > > > > > > > > With your patches, I get kernel lockup after ~1 minute of use (ssh stops > > > > > responding/serial console stops responding). I don't have RC controller to test > > > > > the CIR. But just enabling the CIR causes kernel to hang shortly after boot. > > > > > > > > > > I tried booting multiple times. Other results: > > > > > > > > > > boot 2: > > > > > > > > > > - ssh hangs even before connecting (ethernet crashes/is reset) > > > > > > > > > > INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=2437 > > > > > dwmac-sun8i 5020000.ethernet eth0: Reset adapter. > > > > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 5696 jiffies s: 81 root: 0x1/. > > > > > rcu: blocking rcu_node structures: > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=9714 > > > > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 21568 jiffies s: 81 root: 0x1/. > > > > > rcu: blocking rcu_node structures: > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=17203 > > > > > > > > > > above messages appear regularly. 
> > > > > > > > > > boot 3: > > > > > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > rcu: 0-....: (9 GPs behind) idle=992/0/0x3 softirq=6123/6123 fqs=2600 > > > > > > > > > > > > > > > Sometimes serial console keeps working. Sometimes it locks up too (but not > > > > > frequently). Storage locks up always (any program that was not run before > > > > > the crash can't be started and lock up the kernel hard, programs that > > > > > were executed prior, can be run again). > > > > > > > > > > > > > > > Exactly the same kernel build on H5 seems to work (or at least I was not able to > > > > > trigger the crash). So this seems to be limited to H6 for now. > > > > > > > > > > I suspect that the crash occurs sooner if I vary the light (turn on/off the table > > > > > lamp light). > > > > > > > > > > Without your patches, everything works fine on H6, and I never see > > > > > crashes/lockups. > > > > > > > > > > I tired physically covering the IR receiver, and that helps preventing the > > > > > crash. As soon as I uncover it, the crash happens again in 1s or so: > > > > > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=2444 > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=9777 > > > > > > > > > > This time I got the hung task and reboot: (probably not directly related) > > > > > > > > > > INFO: task find:560 blocked for more than 120 seconds. > > > > > Not tainted 5.2.0-rc2+ #7 > > > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> > > > > find D 0 560 551 0x00000000 > > > > > Call trace: > > > > > __switch_to+0x6c/0x90 > > > > > __schedule+0x1f4/0x578 > > > > > schedule+0x28/0xa8 > > > > > io_schedule+0x18/0x38 > > > > > __lock_page+0x12c/0x208 > > > > > pagecache_get_page+0x238/0x2e8 > > > > > __get_node_page+0x6c/0x310 > > > > > f2fs_get_node_page+0x14/0x20 > > > > > f2fs_iget+0x70/0xc60 > > > > > f2fs_lookup+0xcc/0x218 > > > > > __lookup_slow+0x78/0x160 > > > > > lookup_slow+0x3c/0x60 > > > > > walk_component+0x1e4/0x2e0 > > > > > path_lookupat.isra.13+0x5c/0x1e0 > > > > > filename_lookup.part.23+0x6c/0xe8 > > > > > user_path_at_empty+0x4c/0x60 > > > > > vfs_statx+0x78/0xd8 > > > > > __se_sys_newfstatat+0x24/0x48 > > > > > __arm64_sys_newfstatat+0x18/0x20 > > > > > el0_svc_handler+0x9c/0x170 > > > > > el0_svc+0x8/0xc > > > > > Kernel panic - not syncing: hung_task: blocked tasks > > > > > CPU: 1 PID: 34 Comm: khungtaskd Not tainted 5.2.0-rc2+ #7 > > > > > Hardware name: OrangePi 3 (DT) > > > > > Call trace: > > > > > dump_backtrace+0x0/0xf8 > > > > > show_stack+0x14/0x20 > > > > > dump_stack+0xa8/0xcc > > > > > panic+0x124/0x2dc > > > > > proc_dohung_task_timeout_secs+0x0/0x40 > > > > > kthread+0x120/0x128 > > > > > ret_from_fork+0x10/0x18 > > > > > SMP: stopping secondary CPUs > > > > > Kernel Offset: disabled > > > > > CPU features: 0x0002,20002000 > > > > > Memory Limit: none > > > > > Rebooting in 3 seconds.. > > > > > > > > > > > > > > > Meanwhile H5 based board now runs for 15 minutes without issues. > > > > > > > > > > So to sum up: > > > > > > > > > > - these crashes are definitely H6 IR related > > > > > - the same kernel, on H5 works > > > > > - covering the sensor prevents the crashes on H6 > > > > > > > > > > So we should probably hold on with the series, until this is figured out. > > > > > > > > Thanks for testing, but I think it's more hardware related. > > > > It seems that your IR is flooded or misconfigured for your board. 
> > > > Could you add a simple print in the "sunxi_ir_irq" > > > > > > Yes, I get flood of IRQs with status = 0x30. (after I turn on the lamp, > > > but it persists even after I turn it off and cover the IR sensor). > > > > Interestingly, status also contains RAC, and it's 0 in this case. So the > > interrupt if firing with "No available data in RX FIFO" repeatedly. Regardless > > of input. > > > > So there's something else up. > > Really weird indeed... > > I have pushed a new version, where I didn't enabled the support for > others H6 board and the cover letter include a link to this thread. > > It would be great if other sunxi users could test this series, to > check if this issue in present in other OPi3 / Pine H64. I don't know if this is enough. I'd prefer that the driver have a way of detecting this situation and, at the very least, shutting the module down, instead of taking down the entire system with an IRQ flood. It may be detectable by checking for RAC == 0 when the RX FIFO data available interrupt flag is set. Otherwise, this will eventually be forgotten (cover letters are not even stored in git), and someone will fall into the trap again after enabling r_ir on their board, and end up chasing their tail for a day. I only found out this was an IR driver issue after a long, unpleasant debugging session chasing other, more obvious ideas (when this happens, there's absolutely nothing in the log indicating an IR issue). regards, o. > Regards, > Clément > > > > regards, > > o. > > > > > That's weird, because on H6 in CIR_RXSTA, bit 5 is undefined but corresponding > > > bit in CIR_RXINT is DRQ_EN (RX FIFO DMA Enable) > > > > > > So I'm not sure what it could be flooded with and why IRQs keep being > > > fired, even with no sensor input after the FIFO is read. > > > > > > regards, > > > o. > > > > > > > If it's confirmed, maybe tweak the threshold configuration or > > > > implement the new active_threshold will help.
> > > > > > > > With my hardware Beelink GS1 and on Jernej's board (A64) there is no issue. > > > > > > > > I will disable all the other H6 boards until someone test it. > > > > > > > > Regards, > > > > Clément
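One way to act on the suggestion above (detect RAC == 0 while the data-available flag is set, and shut the receiver down instead of letting the flood stall the system) is sketched below as plain C, so the decision logic can be tested on its own. The threshold of 1000 consecutive empty interrupts, the enum, the struct, and the function names are all hypothetical. The bit positions repeat the assumptions used earlier in the thread (RA = bit 4, RAC = bits 13:8 on H6). In a real driver, the shutdown branch might log an error and call something like `disable_irq_nosync()`. That wiring is an assumption, not existing sunxi-cir.c code.

```c
#include <stdbool.h>
#include <stdint.h>

#define RXSTA_RA	(1u << 4)		/* RX FIFO data available (assumed) */
#define RXSTA_RAC(v)	(((v) >> 8) & 0x3fu)	/* RAC, bits 13:8 on H6 (assumed) */
#define EMPTY_IRQ_LIMIT	1000u			/* hypothetical give-up threshold */

enum cir_irq_action {
	CIR_IRQ_PROCESS,	/* normal path: drain the FIFO */
	CIR_IRQ_IGNORE,		/* "data available" reported with an empty FIFO */
	CIR_IRQ_SHUTDOWN,	/* too many empty interrupts in a row: stop */
};

/* Hypothetical per-device state for the guard. */
struct cir_guard {
	unsigned int empty_in_a_row;
};

static enum cir_irq_action cir_guard_check(struct cir_guard *g, uint32_t status)
{
	if ((status & RXSTA_RA) && RXSTA_RAC(status) == 0) {
		/* Count consecutive contradictory interrupts. */
		if (++g->empty_in_a_row >= EMPTY_IRQ_LIMIT)
			return CIR_IRQ_SHUTDOWN;
		return CIR_IRQ_IGNORE;
	}
	/* Any sane status resets the streak. */
	g->empty_in_a_row = 0;
	return CIR_IRQ_PROCESS;
}
```

Counting consecutive empty interrupts, rather than the total, is meant to tolerate an occasional glitch while still stopping a sustained flood quickly. The right threshold would need measuring on affected boards.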
On Tue, May 28, 2019 at 08:04:47PM +0200, Ondřej Jirman wrote: > Hello Clément, > > On Tue, May 28, 2019 at 06:21:19PM +0200, Clément Péron wrote: > > Hi Ondřej, > > > > On Mon, 27 May 2019 at 21:53, 'Ondřej Jirman' via linux-sunxi > > <linux-sunxi@googlegroups.com> wrote: > > > > > > Hi Clément, > > > > > > On Mon, May 27, 2019 at 09:30:16PM +0200, verejna wrote: > > > > Hi Clément, > > > > > > > > On Mon, May 27, 2019 at 08:49:59PM +0200, Clément Péron wrote: > > > > > Hi Ondrej, > > > > > > > > > > > > > > > > > I'm testing on Orange Pi 3. > > > > > > > > > > > > With your patches, I get kernel lockup after ~1 minute of use (ssh stops > > > > > > responding/serial console stops responding). I don't have RC controller to test > > > > > > the CIR. But just enabling the CIR causes kernel to hang shortly after boot. > > > > > > > > > > > > I tried booting multiple times. Other results: > > > > > > > > > > > > boot 2: > > > > > > > > > > > > - ssh hangs even before connecting (ethernet crashes/is reset) > > > > > > > > > > > > INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=2437 > > > > > > dwmac-sun8i 5020000.ethernet eth0: Reset adapter. > > > > > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 5696 jiffies s: 81 root: 0x1/. > > > > > > rcu: blocking rcu_node structures: > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=9714 > > > > > > rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 21568 jiffies s: 81 root: 0x1/. > > > > > > rcu: blocking rcu_node structures: > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > > rcu: 0-....: (1 GPs behind) idle=64a/0/0x3 softirq=4091/4091 fqs=17203 > > > > > > > > > > > > above messages appear regularly. 
> > > > > > > > > > > > boot 3: > > > > > > > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > > rcu: 0-....: (9 GPs behind) idle=992/0/0x3 softirq=6123/6123 fqs=2600 > > > > > > > > > > > > > > > > > > Sometimes serial console keeps working. Sometimes it locks up too (but not > > > > > > frequently). Storage locks up always (any program that was not run before > > > > > > the crash can't be started and lock up the kernel hard, programs that > > > > > > were executed prior, can be run again). > > > > > > > > > > > > > > > > > > Exactly the same kernel build on H5 seems to work (or at least I was not able to > > > > > > trigger the crash). So this seems to be limited to H6 for now. > > > > > > > > > > > > I suspect that the crash occurs sooner if I vary the light (turn on/off the table > > > > > > lamp light). > > > > > > > > > > > > Without your patches, everything works fine on H6, and I never see > > > > > > crashes/lockups. > > > > > > > > > > > > I tired physically covering the IR receiver, and that helps preventing the > > > > > > crash. As soon as I uncover it, the crash happens again in 1s or so: > > > > > > > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=2444 > > > > > > rcu: INFO: rcu_sched detected stalls on CPUs/tasks: > > > > > > rcu: 0-....: (1 GPs behind) idle=4ea/0/0x3 softirq=4483/4484 fqs=9777 > > > > > > > > > > > > This time I got the hung task and reboot: (probably not directly related) > > > > > > > > > > > > INFO: task find:560 blocked for more than 120 seconds. > > > > > > Not tainted 5.2.0-rc2+ #7 > > > > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> > > > > > find D 0 560 551 0x00000000
> > > > > > Call trace:
> > > > > > __switch_to+0x6c/0x90
> > > > > > __schedule+0x1f4/0x578
> > > > > > schedule+0x28/0xa8
> > > > > > io_schedule+0x18/0x38
> > > > > > __lock_page+0x12c/0x208
> > > > > > pagecache_get_page+0x238/0x2e8
> > > > > > __get_node_page+0x6c/0x310
> > > > > > f2fs_get_node_page+0x14/0x20
> > > > > > f2fs_iget+0x70/0xc60
> > > > > > f2fs_lookup+0xcc/0x218
> > > > > > __lookup_slow+0x78/0x160
> > > > > > lookup_slow+0x3c/0x60
> > > > > > walk_component+0x1e4/0x2e0
> > > > > > path_lookupat.isra.13+0x5c/0x1e0
> > > > > > filename_lookup.part.23+0x6c/0xe8
> > > > > > user_path_at_empty+0x4c/0x60
> > > > > > vfs_statx+0x78/0xd8
> > > > > > __se_sys_newfstatat+0x24/0x48
> > > > > > __arm64_sys_newfstatat+0x18/0x20
> > > > > > el0_svc_handler+0x9c/0x170
> > > > > > el0_svc+0x8/0xc
> > > > > > Kernel panic - not syncing: hung_task: blocked tasks
> > > > > > CPU: 1 PID: 34 Comm: khungtaskd Not tainted 5.2.0-rc2+ #7
> > > > > > Hardware name: OrangePi 3 (DT)
> > > > > > Call trace:
> > > > > > dump_backtrace+0x0/0xf8
> > > > > > show_stack+0x14/0x20
> > > > > > dump_stack+0xa8/0xcc
> > > > > > panic+0x124/0x2dc
> > > > > > proc_dohung_task_timeout_secs+0x0/0x40
> > > > > > kthread+0x120/0x128
> > > > > > ret_from_fork+0x10/0x18
> > > > > > SMP: stopping secondary CPUs
> > > > > > Kernel Offset: disabled
> > > > > > CPU features: 0x0002,20002000
> > > > > > Memory Limit: none
> > > > > > Rebooting in 3 seconds..
> > > > > >
> > > > > > Meanwhile H5 based board now runs for 15 minutes without issues.
> > > > > >
> > > > > > So to sum up:
> > > > > >
> > > > > > - these crashes are definitely H6 IR related
> > > > > > - the same kernel, on H5 works
> > > > > > - covering the sensor prevents the crashes on H6
> > > > > >
> > > > > > So we should probably hold on with the series, until this is figured out.
> > > > > >
> > > > > Thanks for testing, but I think it's more hardware related.
> > > > > It seems that your IR is flooded or misconfigured for your board.
> > > > > Could you add a simple print in the "sunxi_ir_irq"
> > > >
> > > > Yes, I get flood of IRQs with status = 0x30. (after I turn on the lamp,
> > > > but it persists even after I turn it off and cover the IR sensor).
> > >
> > > Interestingly, status also contains RAC, and it's 0 in this case. So the
> > > interrupt is firing with "No available data in RX FIFO" repeatedly. Regardless
> > > of input.
> > >
> > > So there's something else up.
> >
> > Really weird indeed...
> >
> > I have pushed a new version, where I didn't enable support for the
> > other H6 boards, and the cover letter includes a link to this thread.
> >
> > It would be great if other sunxi users could test this series, to
> > check if this issue is present in other OPi3 / Pine H64.
>
> I don't know if this is enough. I'd rather prefer if the driver has a way
> of detecting this situation and shutting the module down, at the very least,
> instead of taking down the entire system with IRQ flood.
>
> It may be detectable by checking RAC == 0 when RX FIFO available interrupt
> flag is set.
>
> Otherwise, this will eventually be forgotten (cover letters are not even stored
> in git), and someone will fall into the trap again, after enabling r_ir on
> their board, and end up chasing their tail for a day. I've initially only found
> this is IR driver issue after a long unpleasant debugging session, chasing other
> more obvious ideas (as when this happens there's absolutely nothing in the log
> indicating this is IR issue).

Returning IRQ_NONE in the handler will disable the interrupt line
after 100,000 (I think?) occurrences. That might be a good workaround,
but we definitely want to have a comment there :)

Maxime
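Maxime's "(I think?)" refers to the kernel's spurious-interrupt detection. As we read kernel/irq/spurious.c (note_interrupt()), the core counts interrupts on a line in windows of 100,000 and disables the line if more than 99,900 of them in a window returned IRQ_NONE. The sketch below is a simplified userspace model of that accounting under that assumption, not kernel code: struct irq_line_model, note_irq() and flood() are hypothetical names, and the model ignores the timing rules and irqpoll handling of the real implementation.

```c
#include <stdbool.h>

/* Thresholds as we read them in kernel/irq/spurious.c (assumption). */
#define IRQ_WINDOW          100000u
#define IRQ_UNHANDLED_LIMIT  99900u

/* Hypothetical, simplified per-line state; not the kernel's irq_desc. */
struct irq_line_model {
	unsigned int count;     /* interrupts seen in the current window */
	unsigned int unhandled; /* of those, how many returned IRQ_NONE */
	bool disabled;          /* line shut off as "stuck" */
};

/* Account one interrupt; 'handled' stands for IRQ_HANDLED vs IRQ_NONE. */
void note_irq(struct irq_line_model *l, bool handled)
{
	if (l->disabled)
		return;
	if (!handled)
		l->unhandled++;
	if (++l->count < IRQ_WINDOW)
		return;

	/* End of a window: a line that almost never got handled is stuck. */
	if (l->unhandled > IRQ_UNHANDLED_LIMIT)
		l->disabled = true;
	l->count = 0;
	l->unhandled = 0;
}

/*
 * Deliver up to n interrupts with the same outcome. Returns how many were
 * delivered before the line was disabled (n if it never was).
 */
unsigned int flood(struct irq_line_model *l, unsigned int n, bool handled)
{
	unsigned int i;

	for (i = 0; i < n && !l->disabled; i++)
		note_irq(l, handled);
	return i;
}
```

Under this model, a handler that returns IRQ_NONE throughout a flood gets its line disabled at the end of the first full window of 100,000 interrupts, while a handler that always reports the interrupt as handled never trips the check, which is consistent with the flood in Ondřej's report never stopping on its own.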
Hi,

On Wed, 29 May 2019 at 09:19, Maxime Ripard <maxime.ripard@bootlin.com> wrote:
>
> On Tue, May 28, 2019 at 08:04:47PM +0200, Ondřej Jirman wrote:
> > [...]
> > I don't know if this is enough. I'd rather prefer if the driver has a way
> > of detecting this situation and shutting the module down, at the very least,
> > instead of taking down the entire system with IRQ flood.
> >
> > It may be detectable by checking RAC == 0 when RX FIFO available interrupt
> > flag is set.
> >
> > [...]
>
> Returning IRQ_NONE in the handler will disable the interrupt line
> after 100,000 (I think?) occurrences. That might be a good workaround,
> but we definitely want to have a comment there :)
>

Thanks for the suggestion. I will propose a patch that returns IRQ_NONE
when RA is set but the FIFO is empty.

Just a comment: in the IRQ handler we actually read the RXSTA register
but test it with the RXINT bits. Is there any reason for doing that?

Thanks,
Clément