Message ID | 20241105-rk3399-dram-init-v1-0-1e29acdf0966@cherry.de |
---|---|
Headers | show |
Series | rockchip: panic() when DRAM init fails | expand |
Hi Quentin, On 2024/11/6 01:21, Quentin Schulz wrote: > I am the unfortunate owner of an RK3399 Puma that is failing DRAM init > every now and then with the following cryptic error message: > """ > pctl_start: Failed to init pctl channel 0 > """ > > When it fails, the current logic is implemented in such a way that the > board enters an infinite while loop. This is not ideal for embedded > systems that are not easily accessible and also an issue when having > devices in a CI lab for example. It's fine for this patch to make the choice in panic to hang or to reset the system. But I don't understand how this help you, does it may success to init the dram after the reset in your case? I think in most case the board will keep reset because this is more likely hardware issue and not able to became success after reset. Thanks, - Kever > > Therefore, let's simply panic if the DRAM fails to init. This was tested > only on RK3399 Puma, similar changes may be required for other SoCs to > properly propagate the errors. > > While this is all but a work-around instead of fixing the DRAM init > sequence, I believe it is important for resilience of devices in the > field. > > Marking this as RFC because: > 1) panic()s for all Rockchip SoCs but only tested on RK3399 > 2) unsure about error return values (ENODEV and the likes), not sure > what would be better than that though :/ > > Note that one can still hang() if they want by setting PANIC_HANG > symbol. > > Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> > --- > Quentin Schulz (5): > ram: rk3399: allow to fail DRAM init if pctl_start fails > ram: rk3399: fail probe if DRAM init failed > rockchip: spl/tpl: panic when DRAM init failed > ram: rk3399: merge two consecutive ifs with same condition > ram: rk3399: fail DRAM init when pctl channel init fails instead of hanging > > arch/arm/mach-rockchip/spl.c | 7 +++---- > arch/arm/mach-rockchip/tpl.c | 6 ++---- > drivers/ram/rockchip/sdram_rk3399.c | 24 +++++++++++------------- > 3 files changed, 16 insertions(+), 21 deletions(-) > --- > base-commit: 56accc56b9aab87ef4809ccc588e1257969cd271 > change-id: 20241105-rk3399-dram-init-207325c2d9ca > > Best regards,
Hi Kever, On 11/7/24 4:35 AM, Kever Yang wrote: > Hi Quentin, > > On 2024/11/6 01:21, Quentin Schulz wrote: >> I am the unfortunate owner of an RK3399 Puma that is failing DRAM init >> every now and then with the following cryptic error message: >> """ >> pctl_start: Failed to init pctl channel 0 >> """ >> >> When it fails, the current logic is implemented in such a way that the >> board enters an infinite while loop. This is not ideal for embedded >> systems that are not easily accessible and also an issue when having >> devices in a CI lab for example. > > It's fine for this patch to make the choice in panic to hang or to reset > the system. > > But I don't understand how this help you, does it may success to init > the dram after > > the reset in your case? I think in most case the board will keep reset > because this is > > more likely hardware issue and not able to became success after reset. > Exactly. After a reset (manual by pressing the reset button, or by panic()) the board works again. I don't have exact numbers for the probability for this to happen, but I had a board doing boot-into-Linux tests in a loop and it failed multiple times in the same day. Gut feeling is around ~10-20 times for ~2200 reboots. I am NOT saying this is not related to a HW issue though, but I've been using that RK3399 Puma for a while, and sometimes I have this issue. The rest of the time it seems to "work fine" (haven't run memtest or extensive testing on that unit though). I am NOT sure BUT I think applying https://github.com/rockchip-linux/u-boot/commit/e1652d3918b182e107addd5e6f451655f239efbc on upstream U-Boot helped lower the probability (it still happened though!). Could be totally unrelated though and just flukes. Cheers, Quentin
Hi Kever, On 11/5/24 6:21 PM, Quentin Schulz wrote: > I am the unfortunate owner of an RK3399 Puma that is failing DRAM init > every now and then with the following cryptic error message: > """ > pctl_start: Failed to init pctl channel 0 > """ > > When it fails, the current logic is implemented in such a way that the > board enters an infinite while loop. This is not ideal for embedded > systems that are not easily accessible and also an issue when having > devices in a CI lab for example. > > Therefore, let's simply panic if the DRAM fails to init. This was tested > only on RK3399 Puma, similar changes may be required for other SoCs to > properly propagate the errors. > > While this is all but a work-around instead of fixing the DRAM init > sequence, I believe it is important for resilience of devices in the > field. > > Marking this as RFC because: > 1) panic()s for all Rockchip SoCs but only tested on RK3399 > 2) unsure about error return values (ENODEV and the likes), not sure > what would be better than that though :/ > > Note that one can still hang() if they want by setting PANIC_HANG > symbol. > > Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> What should we do with those patches? Additionally, is there something I could do to help fix the DDR init on RK3399? Increasing the timeout didn't help sadly :/ Cheers, Quentin
I am the unfortunate owner of an RK3399 Puma that is failing DRAM init every now and then with the following cryptic error message: """ pctl_start: Failed to init pctl channel 0 """ When it fails, the current logic is implemented in such a way that the board enters an infinite while loop. This is not ideal for embedded systems that are not easily accessible and also an issue when having devices in a CI lab for example. Therefore, let's simply panic if the DRAM fails to init. This was tested only on RK3399 Puma, similar changes may be required for other SoCs to properly propagate the errors. While this is all but a work-around instead of fixing the DRAM init sequence, I believe it is important for resilience of devices in the field. Marking this as RFC because: 1) panic()s for all Rockchip SoCs but only tested on RK3399 2) unsure about error return values (ENODEV and the likes), not sure what would be better than that though :/ Note that one can still hang() if they want by setting PANIC_HANG symbol. Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de> --- Quentin Schulz (5): ram: rk3399: allow to fail DRAM init if pctl_start fails ram: rk3399: fail probe if DRAM init failed rockchip: spl/tpl: panic when DRAM init failed ram: rk3399: merge two consecutive ifs with same condition ram: rk3399: fail DRAM init when pctl channel init fails instead of hanging arch/arm/mach-rockchip/spl.c | 7 +++---- arch/arm/mach-rockchip/tpl.c | 6 ++---- drivers/ram/rockchip/sdram_rk3399.c | 24 +++++++++++------------- 3 files changed, 16 insertions(+), 21 deletions(-) --- base-commit: 56accc56b9aab87ef4809ccc588e1257969cd271 change-id: 20241105-rk3399-dram-init-207325c2d9ca Best regards,