Message ID | ZH39SVA1vbZR3+Gk@xdrudis.tinet.cat |
---|---|
State | Superseded |
Delegated to: | Marek Vasut |
Headers | show |
Series | cmd: usb: Prevent reset in usb tree/info command | expand |
On 6/5/23 17:20, Xavier Drudis Ferran wrote: > Add a check to avoid dommed (by null pointer dereference) recursive > call, not only for UCLASS_BLK. > > When booting my Rock Pi 4B+ with a USB mass storage stick plugged > into one of the USB 2 ports (EHCI), when it is plugged before power > on, when issuing a > > usb tree > > or > > usb info I cannot reproduce the problem. Do you perform any other interaction with the USB stack, like e.g. 'usb start' or 'usb reset' before issuing the aforementioned commands ? What kind of USB stick is used here, please share model, VID, PID. > command I get a "Synchronous Error" and a reset just after printing the > mass storage device in the usb tree or usb info. It might depend on the > contents of the USB stick too, I'm not sure. > > It seems like I have two devices as children of the mass storage > device. When there's only a UCLASS_BLK it works fine, but when there's > a UCLASS_BLK and a UCLASS_BOOTDEV, it recurses with a null udev as > first parameter and fails. > > Likewise for usb_show_info(). > > Not sure if this should be a patch, an RFC or a bug report. There may > be a better way to solve this, I haven't researched commit > 201417d700a2ab09 ("bootstd: Add the bootdev uclass") and bootdev flow > properly, or thought about cases where udev is not null but the > recursive call might need preventing too. Feel free to think it over > before merging (or after). It is unclear what the real issue is, so until that is sorted out, no merging will occur, sorry. > But this at least fixes a reset at an > innocent looking usb tree or usb info command. Maybe we can improve it > later? NAK, please let's not add ad-hoc poorly understood changes into core code.
El Thu, Jun 08, 2023 at 12:05:18AM +0200, Marek Vasut deia: > On 6/5/23 17:20, Xavier Drudis Ferran wrote: > > Add a check to avoid dommed (by null pointer dereference) recursive > > call, not only for UCLASS_BLK. > > > > When booting my Rock Pi 4B+ with a USB mass storage stick plugged > > into one of the USB 2 ports (EHCI), when it is plugged before power > > on, when issuing a > > > > usb tree > > > > or > > > > usb info > > I cannot reproduce the problem. Do you perform any other interaction with > the USB stack, like e.g. 'usb start' or 'usb reset' before issuing the > aforementioned commands ? > No. Well, in some tests yes and some no, but I got the error in all cases. Btw, I was testing on the next branch. I had those two commits on top https://patchwork.ozlabs.org/project/uboot/patch/202013db5a47ecbac4a53c360ed1ca91ca663996.1685974993.git.xdrudis@tinet.cat/ https://patchwork.ozlabs.org/project/uboot/patch/464111fca83008503022e8ada5305e69ffd1afbd.1685974993.git.xdrudis@tinet.cat/ And minor configuration changes (but I had bootstage active, might or might not be related) U-Boot was in a microSD card. > What kind of USB stick is used here, please share model, VID, PID. > It may only happen with ext4 partitions or something... Or it might have to do with other media not being bootable (they used to be for some old and customized version of U-boot, but not the current one) This is the lsusb -v output of the same USB stick in another computer Bus 007 Device 008: ID 0718:070a Imation Corp. TF10 Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 2.00 bDeviceClass 0 bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 64 idVendor 0x0718 Imation Corp. idProduct 0x070a bcdDevice 1.00 iManufacturer 1 TDK LoR iProduct 2 TF10 iSerial 3 07032B6B1D0ACB96 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 0x0020 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0x80 (Bus Powered) MaxPower 200mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 8 Mass Storage bInterfaceSubClass 6 SCSI bInterfaceProtocol 80 Bulk-Only iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0200 1x 512 bytes bInterval 0 Device Qualifier (for other device speed): bLength 10 bDescriptorType 6 bcdUSB 2.00 bDeviceClass 0 bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 64 bNumConfigurations 1 can't get debug descriptor: Resource temporarily unavailable Device Status: 0x0000 (Bus Powered) Partitions: Welcome to GNU Parted! Type 'help' to view a list of commands. (parted) p Model: TDK LoR TF10 (scsi) Disk /dev/sda: 32,0GB Sector size (logical/physical): 512B/512B Partition Table: loop Disk Flags: Number Start End Size File system Flags 1 0,00B 32,0GB 32,0GB ext4 The content in the ext4 partition is just data, 3 files, lost+found and another directory Let me get to my logs... (I added a printf here uclass_id=22 is UCLASS_BLK uclass_id=25 is UCLASS_BOOTDEV ) U-Boot TPL 2023.07-rc2-00089-gab17b3d648-dirty (Jun 05 2023 - 10:42:52) lpddr4_set_rate: change freq to 400MHz 0, 1 Channel 0: LPDDR4, 400MHz BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB Channel 1: LPDDR4, 400MHz BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB 256B stride lpddr4_set_rate: change freq to 800MHz 1, 0 Trying to boot from BOOTROM Returning to boot ROM... U-Boot SPL 2023.07-rc2-00089-gab17b3d648-dirty (Jun 05 2023 - 10:42:52 +0200) Trying to boot from MMC1 NOTICE: BL31: v2.1(release):v2.1-728-ged01e0c4-dirty NOTICE: BL31: Built : 18:29:11, Mar 22 2022 U-Boot 2023.07-rc2-00089-gab17b3d648-dirty (Jun 05 2023 - 10:42:52 +0200) SoC: Rockchip rk3399 Reset cause: POR Model: Radxa ROCK Pi 4B DRAM: 4 GiB (effective 3.9 GiB) PMIC: RK808 Core: 284 devices, 29 uclasses, devicetree: separate MMC: mmc@fe310000: 2, mmc@fe320000: 1, mmc@fe330000: 0 Loading Environment from MMC... *** Warning - bad CRC, using default environment In: serial Out: serial Err: serial Model: Radxa ROCK Pi 4B Net: eth0: ethernet@fe300000 Hit any key to stop autoboot: 2 1 0 rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout! Bus usb@fe380000: USB EHCI 1.00 Bus usb@fe3c0000: USB EHCI 1.00 Bus usb@fe800000: Register 2000140 NbrPorts 2 Starting the controller USB XHCI 1.10 Bus usb@fe900000: Register 2000140 NbrPorts 2 Starting the controller USB XHCI 1.10 scanning bus usb@fe380000 for devices... 1 USB Device(s) found scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found scanning bus usb@fe800000 for devices... 1 USB Device(s) found scanning bus usb@fe900000 for devices... 1 USB Device(s) found rockchip_pcie pcie@f8000000: failed to find ep-gpios property ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! Could not initialize PHY ethernet@fe300000 rockchip_pcie pcie@f8000000: failed to find ep-gpios property ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! Could not initialize PHY ethernet@fe300000 => usb tree USB device tree: 1 Hub (480 Mb/s, 0mA) u-boot EHCI Host Controller 1 Hub (480 Mb/s, 0mA) | u-boot EHCI Host Controller | uclass_id=64 |+-2 Mass Storage (480 Mb/s, 200mA) TDK LoR TF10 07032B6B1D0ACB96 uclass_id=22 uclass_id=25 "Synchronous Abort" handler, esr 0x96000010, far 0x948 elr: 00000000002157d4 lr : 00000000002157d4 (reloc) elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4 x0 : 0000000000000005 x1 : 0000000000000000 x2 : 0000000000000020 x3 : 00000000ff1a0000 x4 : 00000000ff1a0000 x5 : 0000000000000000 x6 : 00000000f1f1c000 x7 : 0000000000000001 x8 : 0000000000000001 x9 : 00000000ffffffd0 x10: 0000000000000006 x11: 000000000001869f x12: 0000000000000200 x13: 0000000000000000 x14: 00000000ffffffff x15: 00000000f1f1bbc9 x16: 000000007e4f2113 x17: 000000009a11f13e x18: 00000000f1f31d80 x19: 0000000000000000 x20: 00000000f1f1c110 x21: 0000000000000004 x22: 00000000f1f1c114 x23: 00000000f1f65398 x24: 00000000f1f653b8 x25: 0000000000000000 x26: 0000000000000000 x27: 0000000000000000 x28: 0000000000000000 x29: 00000000f1f1c000 Code: aa0003f5 900003c0 9123b800 940191f5 (f944a660) Resetting CPU ... resetting ... > > command I get a "Synchronous Error" and a reset just after printing the > > mass storage device in the usb tree or usb info. It might depend on the > > contents of the USB stick too, I'm not sure. > > > > It seems like I have two devices as children of the mass storage > > device. When there's only a UCLASS_BLK it works fine, but when there's > > a UCLASS_BLK and a UCLASS_BOOTDEV, it recurses with a null udev as > > first parameter and fails. > > > > Likewise for usb_show_info(). > > > > Not sure if this should be a patch, an RFC or a bug report. There may > > be a better way to solve this, I haven't researched commit > > 201417d700a2ab09 ("bootstd: Add the bootdev uclass") and bootdev flow > > properly, or thought about cases where udev is not null but the > > recursive call might need preventing too. Feel free to think it over > > before merging (or after). > > It is unclear what the real issue is, so until that is sorted out, no > merging will occur, sorry. > That's ok. I'm guessing the issue may be some mismatched assumption on what is under USB devices between the bootdev code and the cmd/usb.c code. > > But this at least fixes a reset at an > > innocent looking usb tree or usb info command. Maybe we can improve it > > later? > > NAK, please let's not add ad-hoc poorly understood changes into core code. Ok, sorry. I'll reply back if/when I get a clearer theory. Thanks.
On 6/8/23 09:39, Xavier Drudis Ferran wrote: > El Thu, Jun 08, 2023 at 12:05:18AM +0200, Marek Vasut deia: >> On 6/5/23 17:20, Xavier Drudis Ferran wrote: >>> Add a check to avoid dommed (by null pointer dereference) recursive >>> call, not only for UCLASS_BLK. >>> >>> When booting my Rock Pi 4B+ with a USB mass storage stick plugged >>> into one of the USB 2 ports (EHCI), when it is plugged before power >>> on, when issuing a >>> >>> usb tree >>> >>> or >>> >>> usb info >> >> I cannot reproduce the problem. Do you perform any other interaction with >> the USB stack, like e.g. 'usb start' or 'usb reset' before issuing the >> aforementioned commands ? >> > > No. Well, in some tests yes and some no, but I got the error in all cases. This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before you would get any meaningful result out of 'usb info'. Without either, you would get 'USB is stopped.' message. Could it be there are some extra scripts in your environment that manipulate the USB ? > Btw, I was testing on the next branch. I had those two commits on top > https://patchwork.ozlabs.org/project/uboot/patch/202013db5a47ecbac4a53c360ed1ca91ca663996.1685974993.git.xdrudis@tinet.cat/ > https://patchwork.ozlabs.org/project/uboot/patch/464111fca83008503022e8ada5305e69ffd1afbd.1685974993.git.xdrudis@tinet.cat/ > > And minor configuration changes (but I had bootstage active, might or might not be related) > > U-Boot was in a microSD card. Can you test with stock U-Boot ? Can you test with another USB stick, i.e. is the issue specific to this USB stick ? Is the issue specific to this partition layout of this USB stick, i.e. if you clone (dd if=... of=...) the content of the USB stick to another USB stick, does the error still occur. [...] > Model: Radxa ROCK Pi 4B > Net: eth0: ethernet@fe300000 > Hit any key to stop autoboot: 2 1 0 > rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout! > Bus usb@fe380000: USB EHCI 1.00 > Bus usb@fe3c0000: USB EHCI 1.00 > Bus usb@fe800000: Register 2000140 NbrPorts 2 > Starting the controller > USB XHCI 1.10 > Bus usb@fe900000: Register 2000140 NbrPorts 2 > Starting the controller > USB XHCI 1.10 > scanning bus usb@fe380000 for devices... 1 USB Device(s) found > scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found > scanning bus usb@fe800000 for devices... 1 USB Device(s) found > scanning bus usb@fe900000 for devices... 1 USB Device(s) found > rockchip_pcie pcie@f8000000: failed to find ep-gpios property > ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! > Could not initialize PHY ethernet@fe300000 > rockchip_pcie pcie@f8000000: failed to find ep-gpios property > ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! > Could not initialize PHY ethernet@fe300000 Is this some $preboot script which initializes your hardware ? => printenv preboot > => usb tree > USB device tree: > 1 Hub (480 Mb/s, 0mA) > u-boot EHCI Host Controller > > 1 Hub (480 Mb/s, 0mA) > | u-boot EHCI Host Controller > | > uclass_id=64 > |+-2 Mass Storage (480 Mb/s, 200mA) > TDK LoR TF10 07032B6B1D0ACB96 > > uclass_id=22 > uclass_id=25 > "Synchronous Abort" handler, esr 0x96000010, far 0x948 > elr: 00000000002157d4 lr : 00000000002157d4 (reloc) > elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4 Take the u-boot (unstripped elf) which matches this binary, and run aarch64-...objdump -lSD on it, then search for the $lr value, see doc/develop/crash_dumps.rst for details. That should tell you where exactly the crash occurred. Where did it occur ? > x0 : 0000000000000005 x1 : 0000000000000000 > x2 : 0000000000000020 x3 : 00000000ff1a0000 > x4 : 00000000ff1a0000 x5 : 0000000000000000 > x6 : 00000000f1f1c000 x7 : 0000000000000001 > x8 : 0000000000000001 x9 : 00000000ffffffd0 > x10: 0000000000000006 x11: 000000000001869f > x12: 0000000000000200 x13: 0000000000000000 > x14: 00000000ffffffff x15: 00000000f1f1bbc9 > x16: 000000007e4f2113 x17: 000000009a11f13e > x18: 00000000f1f31d80 x19: 0000000000000000 > x20: 00000000f1f1c110 x21: 0000000000000004 > x22: 00000000f1f1c114 x23: 00000000f1f65398 > x24: 00000000f1f653b8 x25: 0000000000000000 > x26: 0000000000000000 x27: 0000000000000000 > x28: 0000000000000000 x29: 00000000f1f1c000 > > Code: aa0003f5 900003c0 9123b800 940191f5 (f944a660) > Resetting CPU ... > > resetting ... > > >>> command I get a "Synchronous Error" and a reset just after printing the >>> mass storage device in the usb tree or usb info. It might depend on the >>> contents of the USB stick too, I'm not sure. >>> >>> It seems like I have two devices as children of the mass storage >>> device. When there's only a UCLASS_BLK it works fine, but when there's >>> a UCLASS_BLK and a UCLASS_BOOTDEV, it recurses with a null udev as >>> first parameter and fails. >>> >>> Likewise for usb_show_info(). >>> >>> Not sure if this should be a patch, an RFC or a bug report. There may >>> be a better way to solve this, I haven't researched commit >>> 201417d700a2ab09 ("bootstd: Add the bootdev uclass") and bootdev flow >>> properly, or thought about cases where udev is not null but the >>> recursive call might need preventing too. Feel free to think it over >>> before merging (or after). >> >> It is unclear what the real issue is, so until that is sorted out, no >> merging will occur, sorry. >> > > That's ok. > I'm guessing the issue may be some mismatched assumption on what is > under USB devices between the bootdev code and the cmd/usb.c code. > > >>> But this at least fixes a reset at an >>> innocent looking usb tree or usb info command. Maybe we can improve it >>> later? >> >> NAK, please let's not add ad-hoc poorly understood changes into core code. > > Ok, sorry. > > I'll reply back if/when I get a clearer theory. The NAK is really only to prevent this from accidentally going on. Please see above, maybe it could be narrowed down ?
Sorry, I replied to Marek only but meant to reply to all. El Fri, Jun 09, 2023 at 03:20:33AM +0200, Marek Vasut deia: > > No. Well, in some tests yes and some no, but I got the error in all cases. > > This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before > you would get any meaningful result out of 'usb info'. Without either, you > would get 'USB is stopped.' message. Could it be there are some extra > scripts in your environment that manipulate the USB ? > I saw usb_boootdev_hunt() calls usb_init() in drivers/usb/host/usb_bootdev.c But maybe I don't get that called and it's really something silly in my setup as you say later... Maybe it doesn't get called unless it finds nothing else useful to boot. > > Can you test with stock U-Boot ? > I don't know. I'll see if I have time ... I'd rather read the code to understand what's the condition for finding bootdevices... > Can you test with another USB stick, i.e. is the issue specific to this USB > stick ? > I could test this, yes. > Is the issue specific to this partition layout of this USB stick, i.e. if > you clone (dd if=... of=...) the content of the USB stick to another USB > stick, does the error still occur. > I'll try to partition and flash a new USB. > [...] > > > Model: Radxa ROCK Pi 4B > > Net: eth0: ethernet@fe300000 > > Hit any key to stop autoboot: 2 1 0 > > rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout! > > Bus usb@fe380000: USB EHCI 1.00 > > Bus usb@fe3c0000: USB EHCI 1.00 > > Bus usb@fe800000: Register 2000140 NbrPorts 2 > > Starting the controller > > USB XHCI 1.10 > > Bus usb@fe900000: Register 2000140 NbrPorts 2 > > Starting the controller > > USB XHCI 1.10 > > scanning bus usb@fe380000 for devices... 1 USB Device(s) found > > scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found > > scanning bus usb@fe800000 for devices... 1 USB Device(s) found > > scanning bus usb@fe900000 for devices... 1 USB Device(s) found > > rockchip_pcie pcie@f8000000: failed to find ep-gpios property > > ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! > > Could not initialize PHY ethernet@fe300000 > > rockchip_pcie pcie@f8000000: failed to find ep-gpios property > > ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! > > Could not initialize PHY ethernet@fe300000 > > Is this some $preboot script which initializes your hardware ? > Mmm... yes, I used to have it... I thought not in this test, but I'd better recheck Anyway, one should be allowed to stop the boot, call usb start and usb tree and don't get a reset, shouldn't one? > => printenv preboot > I'll send this later when I repeat the test. I'd like to find a minimal test case or something... > > => usb tree > > USB device tree: > > 1 Hub (480 Mb/s, 0mA) > > u-boot EHCI Host Controller > > 1 Hub (480 Mb/s, 0mA) > > | u-boot EHCI Host Controller > > | > > uclass_id=64 > > |+-2 Mass Storage (480 Mb/s, 200mA) > > TDK LoR TF10 07032B6B1D0ACB96 > > uclass_id=22 > > uclass_id=25 > > "Synchronous Abort" handler, esr 0x96000010, far 0x948 > > elr: 00000000002157d4 lr : 00000000002157d4 (reloc) > > elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4 > > Take the u-boot (unstripped elf) which matches this binary, and run > aarch64-...objdump -lSD on it, then search for the $lr value, see > doc/develop/crash_dumps.rst for details. That should tell you where exactly > the crash occurred. Where did it occur ? > I didn't do it exactly so, but from u-boot.map I gathered that it was in cmd/usb.c and the fact that my patch fixed it implies the problem is the functions usb_show_tree_graph() or usb_show_info() get called recursively with null as a first parameter. Now I don't have that u-boot.map anymore and would have to repeat the experiment, to find out exactly as you say, so I won't do it right now. But thanks, understood. The reason usb_show_tree_graph() gets called with a null usb_device * is that the code in cmd/usb.c for usb info and usb tree assumes everything a UCLASS_MASS_STORAGE device can have as children are devices with some usb_device in their dev_get_parent_priv(). It carves out exceptions to this general rule for UCLASS_USB_EMUL and UCLASS_BLK, but not for UCLASS_BOOTDEV. When it finds a child that is UCLASS_BOOTDEV it happily recurses on it passing its parent_priv as usb_device, but the bootdev code did not put any usb_device there, it's null. So the first access causes a null pointer dereference. I would have to wrap my mind around more code to start understanding if it's better to give that UCLASS_BOOTDEV some usb_device as parent priv data, or it is better to give USB devices that can be enumerated for listing (usb tree or usb info) some RECURSIBLE flag that indicates their priv parent data is reliably a usb_device. So checking that the alledged usb_device at least isn't null as in my patch is possibly a partial solution. I'm sure if it's null we shouldn't call, but if it points to something other than a usb_device we shouldn't either, and it doesn't address why it is null (well, because it's not really a USB internal node, not even a proper leaf, so it shouldn't be recursed anyway). In usb_show_info() it is similar, usb_display_desc() gets called with a null udev because that's what came in. Recursion is avoided for UCLASS_USB_EMUL or UCLASS_BLK but not for UCLASS_BOOTDEV. A different solution could be to expand the exception to UCLASS_BOOTDEV, but it still seems a wrong strategy to expect to know everything you can find so you can list all the exceptions, instead of checking that you have something expected that you can recurse on. Not completely sure, just smells so to me. At least checking for null is more general. Maybe the solution is to fix common/usb_storage.c when it calls bootdev_setup_sibling_blk(), to ensure there's some useful usb_device there as parent priv. Or something needs fixing in drivers/usb/host/usb_bootdev.c ??? But I'm not sure. We really shouldn't call recursively for usb info or tree anyway. For now I was trying to understand when that UCLASS_BOOTDEV is added and why, and whether it should be removed at some time. This should hint me at what is the minimum scenario to reproduce the issue. > > The NAK is really only to prevent this from accidentally going on. > The NAK it's OK. I like people to want to understand stuff. Don't worry. > Please see above, maybe it could be narrowed down ? I'll see if I can test better and send more useful reports. Not sure when. I'm not sure I'll have the time to learn all I need. I just hoped someone else had it in mind...
On 6/9/23 20:52, Xavier Drudis Ferran wrote: > Sorry, I replied to Marek only but meant to reply to all. > > El Fri, Jun 09, 2023 at 03:20:33AM +0200, Marek Vasut deia: > >>> No. Well, in some tests yes and some no, but I got the error in all cases. >> >> This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before >> you would get any meaningful result out of 'usb info'. Without either, you >> would get 'USB is stopped.' message. Could it be there are some extra >> scripts in your environment that manipulate the USB ? >> > > I saw usb_boootdev_hunt() calls usb_init() in drivers/usb/host/usb_bootdev.c > > But maybe I don't get that called and it's really something silly in > my setup as you say later... Maybe it doesn't get called unless it > finds nothing else useful to boot. > >> >> Can you test with stock U-Boot ? >> > > I don't know. I'll see if I have time ... > I'd rather read the code to understand what's the condition for finding bootdevices... > >> Can you test with another USB stick, i.e. is the issue specific to this USB >> stick ? >> > > I could test this, yes. > >> Is the issue specific to this partition layout of this USB stick, i.e. if >> you clone (dd if=... of=...) the content of the USB stick to another USB >> stick, does the error still occur. >> > > I'll try to partition and flash a new USB. > >> [...] >> >>> Model: Radxa ROCK Pi 4B >>> Net: eth0: ethernet@fe300000 >>> Hit any key to stop autoboot: 2 1 0 >>> rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout! >>> Bus usb@fe380000: USB EHCI 1.00 >>> Bus usb@fe3c0000: USB EHCI 1.00 >>> Bus usb@fe800000: Register 2000140 NbrPorts 2 >>> Starting the controller >>> USB XHCI 1.10 >>> Bus usb@fe900000: Register 2000140 NbrPorts 2 >>> Starting the controller >>> USB XHCI 1.10 >>> scanning bus usb@fe380000 for devices... 1 USB Device(s) found >>> scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found >>> scanning bus usb@fe800000 for devices... 1 USB Device(s) found >>> scanning bus usb@fe900000 for devices... 1 USB Device(s) found >>> rockchip_pcie pcie@f8000000: failed to find ep-gpios property >>> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! >>> Could not initialize PHY ethernet@fe300000 >>> rockchip_pcie pcie@f8000000: failed to find ep-gpios property >>> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! >>> Could not initialize PHY ethernet@fe300000 >> >> Is this some $preboot script which initializes your hardware ? >> > > Mmm... yes, I used to have it... I thought not in this test, but I'd better recheck > > Anyway, one should be allowed to stop the boot, call usb start and usb tree > and don't get a reset, shouldn't one? > >> => printenv preboot >> > > I'll send this later when I repeat the test. I'd like to find a > minimal test case or something... Thank you >>> => usb tree >>> USB device tree: >>> 1 Hub (480 Mb/s, 0mA) >>> u-boot EHCI Host Controller >>> 1 Hub (480 Mb/s, 0mA) >>> | u-boot EHCI Host Controller >>> | >>> uclass_id=64 >>> |+-2 Mass Storage (480 Mb/s, 200mA) >>> TDK LoR TF10 07032B6B1D0ACB96 >>> uclass_id=22 >>> uclass_id=25 >>> "Synchronous Abort" handler, esr 0x96000010, far 0x948 >>> elr: 00000000002157d4 lr : 00000000002157d4 (reloc) >>> elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4 >> >> Take the u-boot (unstripped elf) which matches this binary, and run >> aarch64-...objdump -lSD on it, then search for the $lr value, see >> doc/develop/crash_dumps.rst for details. That should tell you where exactly >> the crash occurred. Where did it occur ? >> > > I didn't do it exactly so, but from u-boot.map I gathered that it was > in cmd/usb.c and the fact that my patch fixed it implies the problem > is the functions usb_show_tree_graph() or usb_show_info() get called > recursively with null as a first parameter. > > Now I don't have that u-boot.map anymore and would have to repeat the > experiment, to find out exactly as you say, so I won't do it right > now. But thanks, understood. > > The reason usb_show_tree_graph() gets called with a null usb_device * > is that the code in cmd/usb.c for usb info and usb tree assumes > everything a UCLASS_MASS_STORAGE device can have as children are > devices with some usb_device in their dev_get_parent_priv(). It > carves out exceptions to this general rule for UCLASS_USB_EMUL and > UCLASS_BLK, but not for UCLASS_BOOTDEV. When it finds a child that is > UCLASS_BOOTDEV it happily recurses on it passing its parent_priv as > usb_device, but the bootdev code did not put any usb_device there, > it's null. So the first access causes a null pointer dereference. > > I would have to wrap my mind around more code to start understanding > if it's better to give that UCLASS_BOOTDEV some usb_device as parent > priv data, or it is better to give USB devices that can be enumerated > for listing (usb tree or usb info) some RECURSIBLE flag that indicates > their priv parent data is reliably a usb_device. > > So checking that the alledged usb_device at least isn't null as in my > patch is possibly a partial solution. I'm sure if it's null we > shouldn't call, but if it points to something other than a usb_device we > shouldn't either, and it doesn't address why it is null (well, because > it's not really a USB internal node, not even a proper leaf, so it > shouldn't be recursed anyway). > > In usb_show_info() it is similar, usb_display_desc() gets called with a > null udev because that's what came in. Recursion is avoided for > UCLASS_USB_EMUL or UCLASS_BLK but not for UCLASS_BOOTDEV. > > A different solution could be to expand the exception to > UCLASS_BOOTDEV, but it still seems a wrong strategy to expect to know > everything you can find so you can list all the exceptions, instead of > checking that you have something expected that you can recurse on. Not > completely sure, just smells so to me. At least checking for null is > more general. > > Maybe the solution is to fix common/usb_storage.c when it calls > bootdev_setup_sibling_blk(), to ensure there's some useful usb_device > there as parent priv. Or something needs fixing in > drivers/usb/host/usb_bootdev.c ??? But I'm not sure. We really > shouldn't call recursively for usb info or tree anyway. > > For now I was trying to understand when that UCLASS_BOOTDEV is added > and why, and whether it should be removed at some time. This should > hint me at what is the minimum scenario to reproduce the issue. > >> >> The NAK is really only to prevent this from accidentally going on. >> > > The NAK it's OK. I like people to want to understand stuff. Don't worry. s@going on@going in@ ... typo. >> Please see above, maybe it could be narrowed down ? > > I'll see if I can test better and send more useful reports. > Not sure when. > > I'm not sure I'll have the time to learn all I need. I just hoped someone > else had it in mind... I reproducer would be a good starting point, i.e. something like "I run these ... commands and I get this crash" . But please make sure you use current codebase and no preboot or similar scripting .
Hi, On Sun, 11 Jun 2023 at 13:29, Marek Vasut <marex@denx.de> wrote: > > On 6/9/23 20:52, Xavier Drudis Ferran wrote: > > Sorry, I replied to Marek only but meant to reply to all. > > > > El Fri, Jun 09, 2023 at 03:20:33AM +0200, Marek Vasut deia: > > > >>> No. Well, in some tests yes and some no, but I got the error in all cases. > >> > >> This is doubtful. It is mandatory to run 'usb start' or 'usb reset' before > >> you would get any meaningful result out of 'usb info'. Without either, you > >> would get 'USB is stopped.' message. Could it be there are some extra > >> scripts in your environment that manipulate the USB ? > >> > > > > I saw usb_boootdev_hunt() calls usb_init() in drivers/usb/host/usb_bootdev.c > > > > But maybe I don't get that called and it's really something silly in > > my setup as you say later... Maybe it doesn't get called unless it > > finds nothing else useful to boot. > > > >> > >> Can you test with stock U-Boot ? > >> > > > > I don't know. I'll see if I have time ... > > I'd rather read the code to understand what's the condition for finding bootdevices... > > > >> Can you test with another USB stick, i.e. is the issue specific to this USB > >> stick ? > >> > > > > I could test this, yes. > > > >> Is the issue specific to this partition layout of this USB stick, i.e. if > >> you clone (dd if=... of=...) the content of the USB stick to another USB > >> stick, does the error still occur. > >> > > > > I'll try to partition and flash a new USB. > > > >> [...] > >> > >>> Model: Radxa ROCK Pi 4B > >>> Net: eth0: ethernet@fe300000 > >>> Hit any key to stop autoboot: 2 1 0 > >>> rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout! > >>> Bus usb@fe380000: USB EHCI 1.00 > >>> Bus usb@fe3c0000: USB EHCI 1.00 > >>> Bus usb@fe800000: Register 2000140 NbrPorts 2 > >>> Starting the controller > >>> USB XHCI 1.10 > >>> Bus usb@fe900000: Register 2000140 NbrPorts 2 > >>> Starting the controller > >>> USB XHCI 1.10 > >>> scanning bus usb@fe380000 for devices... 1 USB Device(s) found > >>> scanning bus usb@fe3c0000 for devices... 2 USB Device(s) found > >>> scanning bus usb@fe800000 for devices... 1 USB Device(s) found > >>> scanning bus usb@fe900000 for devices... 1 USB Device(s) found > >>> rockchip_pcie pcie@f8000000: failed to find ep-gpios property > >>> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! > >>> Could not initialize PHY ethernet@fe300000 > >>> rockchip_pcie pcie@f8000000: failed to find ep-gpios property > >>> ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! > >>> Could not initialize PHY ethernet@fe300000 > >> > >> Is this some $preboot script which initializes your hardware ? > >> > > > > Mmm... yes, I used to have it... I thought not in this test, but I'd better recheck > > > > Anyway, one should be allowed to stop the boot, call usb start and usb tree > > and don't get a reset, shouldn't one? > > > >> => printenv preboot > >> > > > > I'll send this later when I repeat the test. I'd like to find a > > minimal test case or something... > > Thank you > > >>> => usb tree > >>> USB device tree: > >>> 1 Hub (480 Mb/s, 0mA) > >>> u-boot EHCI Host Controller > >>> 1 Hub (480 Mb/s, 0mA) > >>> | u-boot EHCI Host Controller > >>> | > >>> uclass_id=64 > >>> | +-2 Mass Storage (480 Mb/s, 200mA) > >>> TDK LoR TF10 07032B6B1D0ACB96 > >>> uclass_id=22 > >>> uclass_id=25 > >>> "Synchronous Abort" handler, esr 0x96000010, far 0x948 > >>> elr: 00000000002157d4 lr : 00000000002157d4 (reloc) > >>> elr: 00000000f3f4f7d4 lr : 00000000f3f4f7d4 > >> > >> Take the u-boot (unstripped elf) which matches this binary, and run > >> aarch64-...objdump -lSD on it, then search for the $lr value, see > >> doc/develop/crash_dumps.rst for details. That should tell you where exactly > >> the crash occurred. Where did it occur ? > >> > > > > I didn't do it exactly so, but from u-boot.map I gathered that it was > > in cmd/usb.c and the fact that my patch fixed it implies the problem > > is the functions usb_show_tree_graph() or usb_show_info() get called > > recursively with null as a first parameter. > > > > Now I don't have that u-boot.map anymore and would have to repeat the > > experiment, to find out exactly as you say, so I won't do it right > > now. But thanks, understood. > > > > The reason usb_show_tree_graph() gets called with a null usb_device * > > is that the code in cmd/usb.c for usb info and usb tree assumes > > everything a UCLASS_MASS_STORAGE device can have as children are > > devices with some usb_device in their dev_get_parent_priv(). It > > carves out exceptions to this general rule for UCLASS_USB_EMUL and > > UCLASS_BLK, but not for UCLASS_BOOTDEV. When it finds a child that is > > UCLASS_BOOTDEV it happily recurses on it passing its parent_priv as > > usb_device, but the bootdev code did not put any usb_device there, > > it's null. So the first access causes a null pointer dereference. > > > > I would have to wrap my mind around more code to start understanding > > if it's better to give that UCLASS_BOOTDEV some usb_device as parent > > priv data, or it is better to give USB devices that can be enumerated > > for listing (usb tree or usb info) some RECURSIBLE flag that indicates > > their priv parent data is reliably a usb_device. > > > > So checking that the alledged usb_device at least isn't null as in my > > patch is possibly a partial solution. I'm sure if it's null we > > shouldn't call, but if it points to something other than a usb_device we > > shouldn't either, and it doesn't address why it is null (well, because > > it's not really a USB internal node, not even a proper leaf, so it > > shouldn't be recursed anyway). > > > > In usb_show_info() it is similar, usb_display_desc() gets called with a > > null udev because that's what came in. Recursion is avoided for > > UCLASS_USB_EMUL or UCLASS_BLK but not for UCLASS_BOOTDEV. > > > > A different solution could be to expand the exception to > > UCLASS_BOOTDEV, but it still seems a wrong strategy to expect to know > > everything you can find so you can list all the exceptions, instead of > > checking that you have something expected that you can recurse on. Not > > completely sure, just smells so to me. At least checking for null is > > more general. > > > > Maybe the solution is to fix common/usb_storage.c when it calls > > bootdev_setup_sibling_blk(), to ensure there's some useful usb_device > > there as parent priv. Or something needs fixing in > > drivers/usb/host/usb_bootdev.c ??? But I'm not sure. We really > > shouldn't call recursively for usb info or tree anyway. > > > > For now I was trying to understand when that UCLASS_BOOTDEV is added > > and why, and whether it should be removed at some time. This should > > hint me at what is the minimum scenario to reproduce the issue. > > > >> > >> The NAK is really only to prevent this from accidentally going on. > >> > > > > The NAK it's OK. I like people to want to understand stuff. Don't worry. > > s@going on@going in@ ... typo. > > >> Please see above, maybe it could be narrowed down ? > > > > I'll see if I can test better and send more useful reports. > > Not sure when. > > > > I'm not sure I'll have the time to learn all I need. I just hoped someone > > else had it in mind... > > I reproducer would be a good starting point, i.e. something like "I run > these ... commands and I get this crash" . But please make sure you use > current codebase and no preboot or similar scripting . I'm not sure what is going on here. Which version are you testing? Do you have these two commits? 8c29b73278d6 bootstd: usb: Avoid initing USB twice 9fea3a799dde usb: Tidy up the usb_start flag Regards, Simon
El Mon, Jun 12, 2023 at 10:17:38PM +0100, Simon Glass deia: > > I'm not sure what is going on here. Which version are you testing? Do > you have these two commits? > > 8c29b73278d6 bootstd: usb: Avoid initing USB twice > 9fea3a799dde usb: Tidy up the usb_start flag > > Regards, > Simon Yes, I have both. I'm testing the next branch. I'll send shortly a reply to Marek Vasut with a cleaner test I did yesterday. Doesn't look hard to reproduce for me... I'm still trying to understand when the UCLASS_BOOTDEV device should be added under an usb mass storage device, and when (if ever) removed.
Ok. New test. This uses yesterday morning's next branch. commit 5b589e139620214f Merge: cc5a940923 32d2461e04 Merge branch 'next_net/phy_connect_dev' USB2 does not work for rk3399 in next (fixes are in master, thanks), but USB3 is enough. I compiled for rock-pi-4-rk3399_defconfig flashed to a new microSD card as per doc/board/rockchip/rockchip.rst : dd if=u-boot-rockchip.bin of=/dev/sda seek=64 sync Put this microSD card in a Rock Pi 4 B+ Put a new USB stick in the USB3 port (center blue port closer to board). (the microSD card and USB stick come from factory, I guess they were partitioned with a single FAT partition) (make sure emmc and spi are blank) Connected only serial console and power. Got this: U-Boot TPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47) lpddr4_set_rate: change freq to 400MHz 0, 1 Channel 0: LPDDR4, 400MHz BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB Channel 1: LPDDR4, 400MHz BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB 256B stride lpddr4_set_rate: change freq to 800MHz 1, 0 Trying to boot from BOOTROM Returning to boot ROM... U-Boot SPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47 +0200) Trying to boot from MMC1 NOTICE: BL31: v2.1(release):v2.1-728-ged01e0c4-dirty NOTICE: BL31: Built : 18:29:11, Mar 22 2022 U-Boot 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47 +0200) SoC: Rockchip rk3399 Reset cause: POR Model: Radxa ROCK Pi 4B DRAM: 4 GiB (effective 3.9 GiB) PMIC: RK808 Core: 283 devices, 29 uclasses, devicetree: separate MMC: mmc@fe310000: 2, mmc@fe320000: 1, mmc@fe330000: 0 Loading Environment from MMC... *** Warning - bad CRC, using default environment In: serial Out: serial Err: serial Model: Radxa ROCK Pi 4B Net: eth0: ethernet@fe300000 Hit any key to stop autoboot: 2 1 0 rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout! Bus usb@fe380000: ehci_generic usb@fe380000: Failed to get clocks (ret=-19) Port not available. Bus usb@fe3c0000: ehci_generic usb@fe3c0000: Failed to get clocks (ret=-19) Port not available. Bus usb@fe800000: Register 2000140 NbrPorts 2 Starting the controller USB XHCI 1.10 Bus usb@fe900000: Register 2000140 NbrPorts 2 Starting the controller USB XHCI 1.10 scanning bus usb@fe800000 for devices... 1 USB Device(s) found scanning bus usb@fe900000 for devices... cannot reset port 1!? 2 USB Device(s) found rockchip_pcie pcie@f8000000: failed to find ep-gpios property ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! Could not initialize PHY ethernet@fe300000 rockchip_pcie pcie@f8000000: failed to find ep-gpios property ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! Could not initialize PHY ethernet@fe300000 => printenv preboot ## Error: "preboot" not defined => printenv arch=arm baudrate=1500000 board=evb_rk3399 board_name=evb_rk3399 boot_targets=mmc1 mmc0 nvme scsi usb pxe dhcp spi bootcmd=bootflow scan bootdelay=2 cpu=armv8 cpuid#=[something] eth1addr=[:so:me:th:in:g] ethact=ethernet@fe300000 ethaddr=[:so:me:th:in:g] fdt_addr_r=0x01f00000 fdtcontroladdr=f1ef9170 fdtfile=rockchip/rk3399-rock-pi-4b.dtb fdtoverlay_addr_r=0x02000000 kernel_addr_r=0x02080000 kernel_comp_addr_r=0x08000000 kernel_comp_size=0x2000000 loadaddr=0x800800 partitions=uuid_disk=${uuid_gpt_disk};name=loader1,start=32K,size=4000K,uuid=${uuid_gpt_loader1};name=loader2,start=8MB,size=4MB,uuid=${uuid_gpt_loader2};name=trust,size=4M,uuid=${uuid_gpt_atf};name=boot,size=112M,bootable,uuid=${uuid_gpt_boot};name=rootfs,size=-,uuid=[something]; pxefile_addr_r=0x00600000 ramdisk_addr_r=0x06000000 script_offset_f=0xffe000 script_size_f=0x2000 scriptaddr=0x00500000 serial#=[something] soc=rk3399 stderr=serial,vidconsole stdin=serial,usbkbd stdout=serial,vidconsole vendor=rockchip Environment size: 1041/32764 bytes => usb info 1: Hub, USB Revision 3.0 - U-Boot XHCI Host Controller - Class: Hub - PacketSize: 512 Configurations: 1 - Vendor: 0x0000 Product 0x0000 Version 1.0 Configuration: 1 - Interfaces: 1 Self Powered 0mA Interface: 0 - Alternate Setting 0, Endpoints: 1 - Class Hub - Endpoint 1 In Interrupt MaxPacket 8 Interval 255ms 1: Hub, USB Revision 3.0 - U-Boot XHCI Host Controller - Class: Hub - PacketSize: 512 Configurations: 1 - Vendor: 0x0000 Product 0x0000 Version 1.0 Configuration: 1 - Interfaces: 1 Self Powered 0mA Interface: 0 - Alternate Setting 0, Endpoints: 1 - Class Hub - Endpoint 1 In Interrupt MaxPacket 8 Interval 255ms 2: Mass Storage, USB Revision 3.20 - USB SanDisk 3.2Gen1 05017d2e4d7b4ea0c5822c90c51e0b7 - Class: (from Interface) Mass Storage - PacketSize: 512 Configurations: 1 - Vendor: 0x0781 Product 0x5591 Version 1.0 Configuration: 1 - Interfaces: 1 Bus Powered 224mA Interface: 0 - Alternate Setting 0, Endpoints: 2 - Class Mass Storage, Transp. SCSI, Bulk only - Endpoint 1 In Bulk MaxPacket 1024 - Endpoint 2 Out Bulk MaxPacket 1024 "Synchronous Abort" handler, esr 0x96000010, far 0x101 elr: 000000000021c398 lr : 000000000021ca70 (reloc) elr: 00000000f3f32398 lr : 00000000f3f32a70 x0 : 0000000000000000 x1 : 00000000000010d1 x2 : 00000000f1f404b8 x3 : 00000000f1f41998 x4 : 00000000ff1a0000 x5 : 0000000000000034 x6 : 000000000000000a x7 : 0000000000000002 x8 : 0000000000000000 x9 : 0000000000000400 x10: 0000000000000006 x11: 000000000001869f x12: 0000000000000200 x13: 0000000000000000 x14: 00000000ffffffff x15: 00000000f1ef81c3 x16: 0000000000000000 x17: 0000000000000000 x18: 00000000f1f0dd90 x19: 0000000000000000 x20: 00000000f1ef8848 x21: 0000000000000002 x22: 00000000f1ef8848 x23: 0000000000000002 x24: 00000000f1ef8844 x25: 0000000000000000 x26: 0000000000000000 x27: 0000000000000000 x28: 0000000000000000 x29: 00000000f1ef86b0 Code: f0000460 a8c27bfd 91099000 1401e518 (39440401) Resetting CPU ... resetting ... bootcmd=bootflow scan is because DISTRO_DEFAULTS is not in configs/rock-pi-4-rk3399_defconfig I'd say the changes to Kconfig are in ef5e3891f57 and 2d653f686b6. In any case this causes bootflow scan to be called. This tries to access the media in boot_targets until it reaches usb ( do_bootflow_scan() in cmd/bootflow.c tries mmc1, mmc0 and nvme unsuccessfully). I think this adds a UCLASS_BOOTDEV device under a usb mass storage device as sibling of a UCLASS_BLK device, and this makes usb info recurse with a null usb_device pointer and a reset at this dereference. But I'm still reading the code to understand quite exactly how and which are the paths. I don't understand how much of this is intendeded and what should be prevented. Thsi is why my patch just fixed the last consequence, tthe null pointer dereference. I'll keep looking when I can.
Hi Xavier, On Tue, 13 Jun 2023 at 07:52, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote: > > Ok. New test. > > This uses yesterday morning's next branch. > commit 5b589e139620214f > Merge: cc5a940923 32d2461e04 > Merge branch 'next_net/phy_connect_dev' > > USB2 does not work for rk3399 in next (fixes are in master, thanks), > but USB3 is enough. > > I compiled for rock-pi-4-rk3399_defconfig > > flashed to a new microSD card as per doc/board/rockchip/rockchip.rst : > > dd if=u-boot-rockchip.bin of=/dev/sda seek=64 > sync > > Put this microSD card in a Rock Pi 4 B+ > Put a new USB stick in the USB3 port (center blue port closer to board). > > (the microSD card and USB stick come from factory, I guess they were > partitioned with a single FAT partition) > > (make sure emmc and spi are blank) > > Connected only serial console and power. > > Got this: > > U-Boot TPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47) > lpddr4_set_rate: change freq to 400MHz 0, 1 > Channel 0: LPDDR4, 400MHz > BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB > Channel 1: LPDDR4, 400MHz > BW=32 Col=10 Bk=8 CS0 Row=15 CS1 Row=15 CS=2 Die BW=16 Size=2048MB > 256B stride > lpddr4_set_rate: change freq to 800MHz 1, 0 > Trying to boot from BOOTROM > Returning to boot ROM... > > U-Boot SPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47 +0200) > Trying to boot from MMC1 > NOTICE: BL31: v2.1(release):v2.1-728-ged01e0c4-dirty > NOTICE: BL31: Built : 18:29:11, Mar 22 2022 > > > U-Boot 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47 +0200) > > SoC: Rockchip rk3399 > Reset cause: POR > Model: Radxa ROCK Pi 4B > DRAM: 4 GiB (effective 3.9 GiB) > PMIC: RK808 > Core: 283 devices, 29 uclasses, devicetree: separate > MMC: mmc@fe310000: 2, mmc@fe320000: 1, mmc@fe330000: 0 > Loading Environment from MMC... *** Warning - bad CRC, using default environment > > In: serial > Out: serial > Err: serial > Model: Radxa ROCK Pi 4B > Net: eth0: ethernet@fe300000 > Hit any key to stop autoboot: 2 1 0 > rockchip_pcie pcie@f8000000: PCIe link training gen1 timeout! > Bus usb@fe380000: ehci_generic usb@fe380000: Failed to get clocks (ret=-19) > Port not available. > Bus usb@fe3c0000: ehci_generic usb@fe3c0000: Failed to get clocks (ret=-19) > Port not available. > Bus usb@fe800000: Register 2000140 NbrPorts 2 > Starting the controller > USB XHCI 1.10 > Bus usb@fe900000: Register 2000140 NbrPorts 2 > Starting the controller > USB XHCI 1.10 > scanning bus usb@fe800000 for devices... 1 USB Device(s) found > scanning bus usb@fe900000 for devices... cannot reset port 1!? > 2 USB Device(s) found > rockchip_pcie pcie@f8000000: failed to find ep-gpios property > ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! > Could not initialize PHY ethernet@fe300000 > rockchip_pcie pcie@f8000000: failed to find ep-gpios property > ethernet@fe300000 Waiting for PHY auto negotiation to complete......... TIMEOUT ! > Could not initialize PHY ethernet@fe300000 > => printenv preboot > ## Error: "preboot" not defined > => printenv > arch=arm > baudrate=1500000 > board=evb_rk3399 > board_name=evb_rk3399 > boot_targets=mmc1 mmc0 nvme scsi usb pxe dhcp spi > > bootcmd=bootflow scan > > bootdelay=2 > cpu=armv8 > cpuid#=[something] > eth1addr=[:so:me:th:in:g] > ethact=ethernet@fe300000 > ethaddr=[:so:me:th:in:g] > fdt_addr_r=0x01f00000 > fdtcontroladdr=f1ef9170 > fdtfile=rockchip/rk3399-rock-pi-4b.dtb > fdtoverlay_addr_r=0x02000000 > kernel_addr_r=0x02080000 > kernel_comp_addr_r=0x08000000 > kernel_comp_size=0x2000000 > loadaddr=0x800800 > partitions=uuid_disk=${uuid_gpt_disk};name=loader1,start=32K,size=4000K,uuid=${uuid_gpt_loader1};name=loader2,start=8MB,size=4MB,uuid=${uuid_gpt_loader2};name=trust,size=4M,uuid=${uuid_gpt_atf};name=boot,size=112M,bootable,uuid=${uuid_gpt_boot};name=rootfs,size=-,uuid=[something]; > pxefile_addr_r=0x00600000 > ramdisk_addr_r=0x06000000 > script_offset_f=0xffe000 > script_size_f=0x2000 > scriptaddr=0x00500000 > serial#=[something] > soc=rk3399 > stderr=serial,vidconsole > stdin=serial,usbkbd > stdout=serial,vidconsole > vendor=rockchip > > Environment size: 1041/32764 bytes > => usb info > 1: Hub, USB Revision 3.0 > - U-Boot XHCI Host Controller > - Class: Hub > - PacketSize: 512 Configurations: 1 > - Vendor: 0x0000 Product 0x0000 Version 1.0 > Configuration: 1 > - Interfaces: 1 Self Powered 0mA > Interface: 0 > - Alternate Setting 0, Endpoints: 1 > - Class Hub > - Endpoint 1 In Interrupt MaxPacket 8 Interval 255ms > > 1: Hub, USB Revision 3.0 > - U-Boot XHCI Host Controller > - Class: Hub > - PacketSize: 512 Configurations: 1 > - Vendor: 0x0000 Product 0x0000 Version 1.0 > Configuration: 1 > - Interfaces: 1 Self Powered 0mA > Interface: 0 > - Alternate Setting 0, Endpoints: 1 > - Class Hub > - Endpoint 1 In Interrupt MaxPacket 8 Interval 255ms > > 2: Mass Storage, USB Revision 3.20 > - USB SanDisk 3.2Gen1 05017d2e4d7b4ea0c5822c90c51e0b7 > - Class: (from Interface) Mass Storage > - PacketSize: 512 Configurations: 1 > - Vendor: 0x0781 Product 0x5591 Version 1.0 > Configuration: 1 > - Interfaces: 1 Bus Powered 224mA > Interface: 0 > - Alternate Setting 0, Endpoints: 2 > - Class Mass Storage, Transp. SCSI, Bulk only > - Endpoint 1 In Bulk MaxPacket 1024 > - Endpoint 2 Out Bulk MaxPacket 1024 > > "Synchronous Abort" handler, esr 0x96000010, far 0x101 > elr: 000000000021c398 lr : 000000000021ca70 (reloc) > elr: 00000000f3f32398 lr : 00000000f3f32a70 > x0 : 0000000000000000 x1 : 00000000000010d1 > x2 : 00000000f1f404b8 x3 : 00000000f1f41998 > x4 : 00000000ff1a0000 x5 : 0000000000000034 > x6 : 000000000000000a x7 : 0000000000000002 > x8 : 0000000000000000 x9 : 0000000000000400 > x10: 0000000000000006 x11: 000000000001869f > x12: 0000000000000200 x13: 0000000000000000 > x14: 00000000ffffffff x15: 00000000f1ef81c3 > x16: 0000000000000000 x17: 0000000000000000 > x18: 00000000f1f0dd90 x19: 0000000000000000 > x20: 00000000f1ef8848 x21: 0000000000000002 > x22: 00000000f1ef8848 x23: 0000000000000002 > x24: 00000000f1ef8844 x25: 0000000000000000 > x26: 0000000000000000 x27: 0000000000000000 > x28: 0000000000000000 x29: 00000000f1ef86b0 > > Code: f0000460 a8c27bfd 91099000 1401e518 (39440401) > Resetting CPU ... > > resetting ... > > > > bootcmd=bootflow scan is because DISTRO_DEFAULTS is not > in configs/rock-pi-4-rk3399_defconfig > > I'd say the changes to Kconfig are in ef5e3891f57 and 2d653f686b6. > > In any case this causes > > bootflow scan > > to be called. This tries to access the media in boot_targets until it reaches > usb ( do_bootflow_scan() in cmd/bootflow.c tries mmc1, mmc0 and nvme unsuccessfully). > > I think this adds a UCLASS_BOOTDEV device under a usb mass storage device > as sibling of a UCLASS_BLK device, and this makes usb info recurse > with a null usb_device pointer and a reset at this dereference. Yes that's right. So 'usb info' should ignore UCLASS_BOOTDEV devices. That is better than checking for the NULL pointer. > > But I'm still reading the code to understand quite exactly how and > which are the paths. I don't understand how much of this is intendeded > and what should be prevented. Thsi is why my patch just fixed the last > consequence, tthe null pointer dereference. > > I'll keep looking when I can. Regards, Simon
El Tue, Jun 13, 2023 at 03:58:22PM +0100, Simon Glass deia: > > Yes that's right. So 'usb info' should ignore UCLASS_BOOTDEV devices. That's a possibility, yes. It should work until someone adds another device there for some other purpose. > That is better than checking for the NULL pointer. > Why? What's wrong about checking for null ? Or maybe checking for both not null and not UCLASS_BOOTDEV ? Not that I care too much, just to understand your reasoning. Or can we check for recursible devices somehow ? Maybe flag them or something ? Why is better to state all devices are recursible except some UCLASSes (meaning they can have stuff that needs listed in usb info - or likewise usb tree -) instead of stating that some closed set are recursible ?
Hi Xavier, On Tue, 13 Jun 2023 at 17:04, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote: > > El Tue, Jun 13, 2023 at 03:58:22PM +0100, Simon Glass deia: > > > > Yes that's right. So 'usb info' should ignore UCLASS_BOOTDEV devices. > > That's a possibility, yes. It should work until someone adds another > device there for some other purpose. > > > That is better than checking for the NULL pointer. > > > > Why? What's wrong about checking for null ? > Or maybe checking for both not null and not UCLASS_BOOTDEV ? > > Not that I care too much, just to understand your reasoning. Well, devices may have something attached there, so it may be non-NULL, but point to a different struct from the expected one. > > Or can we check for recursible devices somehow ? > Maybe flag them or something ? > > Why is better to state all devices are recursible except some UCLASSes > (meaning they can have stuff that needs listed in usb info - or > likewise usb tree -) instead of stating that some closed set are > recursible ? > For USB we can have lots of different types of devices as children, so it would be a pain to enumerate them all. At least that's how I see it. Regards, Simon
Thanks for explaining, Simon Glass. Can someone please stop me if I'm splitting hairs or bikeshedding or something ? I guess we agree both checking for null or UCLASS_BOOTDEV would work right now but we are looking for what's more future-proof. El Tue, Jun 13, 2023 at 09:12:30PM +0100, Simon Glass deia: > Hi Xavier, > > On Tue, 13 Jun 2023 at 17:04, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote: > > > > El Tue, Jun 13, 2023 at 03:58:22PM +0100, Simon Glass deia: > > > > > > Yes that's right. So 'usb info' should ignore UCLASS_BOOTDEV devices. > > > > That's a possibility, yes. It should work until someone adds another > > device there for some other purpose. > > > > > That is better than checking for the NULL pointer. > > > > > > > Why? What's wrong about checking for null ? > > Or maybe checking for both not null and not UCLASS_BOOTDEV ? > > > > Not that I care too much, just to understand your reasoning. > > Well, devices may have something attached there, so it may be > non-NULL, but point to a different struct from the expected one. > Yes. That is possible. How likely ? That "there" is dev_get_parent_priv(). If I understand the driver model, those devices shouldn't put things there themselves, it should be their parent who puts stuff there for them. And the parent should be an usb_device->dev (because of the recursive code). So what other struct would such a parent put in a child parent_priv_ ? Yes, whatever it wants, but I mean, isn't it likely to assume the child would be "adopted" with null as parent_priv_ or a "natural child" with a usb_device in parent_priv_ ? I did a very rough search grep -rIE 'UCLASS_(BLK|BOOTDEV|USB_EMUL)' . |grep -F .id |cut -f1 -d: |xargs -n 1 grep -l per_child_auto ./drivers/usb/emul/usb-emul-uclass.c Which seems to suggest only UCLASS_USB_EMUL would have parent_priv_, and that would be an usb_device, which the current code does not want to recurse anyway. (dev_set_parent_priv() is almost never called). It is also possible that one day a device that is not UCLASS_BLK, UCLASS_BOOTDEV or UCLASS_USB_EMUL is put as children of a usb storage device (just imagine a future system similar to bootstd for firmware updates, trust material, etc.). Is it likely to have a struct in parent_priv_ that is not a usb_device ? So which is more likely to survive future changes ? - checking for parent_priv_ not null and not UCLASS_USB_EMUL - checking for parent_priv_ not null and not UCLASS_USB_EMUL and not UCLASS_BLK (my patch, overcautious ?) - checking for not (UCLASS_BLK, UCLASS_BOOTDEV or UCLASS_USB_EMUL) (Simon Glass' idea) - checking for not UCLASS_BLK and not UCLASS_BOOTDEV and not UCLASS_USB_EMUL and parent_priv_ not null > > > > Or can we check for recursible devices somehow ? > > Maybe flag them or something ? > > > > Why is better to state all devices are recursible except some UCLASSes > > (meaning they can have stuff that needs listed in usb info - or > > likewise usb tree -) instead of stating that some closed set are > > recursible ? > > > > For USB we can have lots of different types of devices as children, so > it would be a pain to enumerate them all. At least that's how I see > it. > We can have lots of devices as children, but those do get listed before the recursive call. How many of those can have children that we have to list too, i.e. how many of those do we want to recurse into ? I can only think of usb hubs (maybe some node for multifunction devices too ???). Maybe I'm not understanding how U-Boot works with USB devices... but it looks like it would be enough to look for UCLASS_USB_HUB, then recursive call, else no recursion. I mean instead of cmd/usb.c : usb_show_tree_graph() : if ((device_get_uclass_id( child ) != UCLASS_USB_EMUL) && (device_get_uclass_id( child ) != UCLASS_BOOTDEV) && (device_get_uclass_id( child ) != UCLASS_BLK)) { usb_show_tree_graph(udev, pre); pre[index] = 0; } we could simply have if (device_get_uclass_id( dev->dev ) == UCLASS_USB_HUB) { usb_show_tree_graph(udev, pre); pre[index] = 0; } (or put the condition directly before the for loop) Or can we have an UCLASS_USB_EMUL, UCLASS_BLK or UCLASS_BOOTDEV as direct child of an UCLASS_USB_HUB ??? Am I opening any cans of worms ?
On 6/13/23 08:52, Xavier Drudis Ferran wrote: > Ok. New test. > > This uses yesterday morning's next branch. > commit 5b589e139620214f > Merge: cc5a940923 32d2461e04 > Merge branch 'next_net/phy_connect_dev' > > USB2 does not work for rk3399 in next (fixes are in master, thanks), > but USB3 is enough. > > I compiled for rock-pi-4-rk3399_defconfig > > flashed to a new microSD card as per doc/board/rockchip/rockchip.rst : > > dd if=u-boot-rockchip.bin of=/dev/sda seek=64 > sync > > Put this microSD card in a Rock Pi 4 B+ > Put a new USB stick in the USB3 port (center blue port closer to board). > > (the microSD card and USB stick come from factory, I guess they were > partitioned with a single FAT partition) > > (make sure emmc and spi are blank) > > Connected only serial console and power. > > Got this: > > U-Boot TPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47) Next is already at rc4 , what's this rc2 ?
El Tue, Jun 20, 2023 at 02:50:48AM +0200, Marek Vasut deia: > On 6/13/23 08:52, Xavier Drudis Ferran wrote: > > > > U-Boot TPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47) > > Next is already at rc4 , what's this rc2 ? It's a test I did last week (June 12th). What do you think has changed in next between rc2 and rc4 so that this shouldn't happen now ? My last patch (sent yesterday) was tested with the next branch as of yesterday morning. But you can test it yourself with any version you like. Do I have to send the logs of each test I do to the list every time ? I thought Simon Glass reply last week to the mail you quoted already showed agreement that the issue exists, and the adding of the UCLASS_BOOTDEV device is per design, so cmd/usb.c needs fixing to deal with this change.
On 6/20/23 09:03, Xavier Drudis Ferran wrote: > El Tue, Jun 20, 2023 at 02:50:48AM +0200, Marek Vasut deia: >> On 6/13/23 08:52, Xavier Drudis Ferran wrote: >>> >>> U-Boot TPL 2023.07-rc2-00144-g497967f1ee (Jun 12 2023 - 11:15:47) >> >> Next is already at rc4 , what's this rc2 ? > > It's a test I did last week (June 12th). What do you think has changed > in next between rc2 and rc4 so that this shouldn't happen now ? The beginning of the email states: " Ok. New test. This uses yesterday morning's next branch. commit 5b589e139620214f " So, that is not the case ? Also, commit 497967f1ee does not exist in upstream U-Boot, was this some patched U-Boot tree ? > My last patch (sent yesterday) was tested with the next branch as of > yesterday morning. But you can test it yourself with any version you like. > Do I have to send the logs of each test I do to the list every time ? Ideally the claims you make about what you tested and logs from the test should match, that would be a good starting point. > I thought Simon Glass reply last week to the mail you quoted already > showed agreement that the issue exists, and the adding of the > UCLASS_BOOTDEV device is per design, so cmd/usb.c needs fixing to deal > with this change. I am not disputing that. What bothers me is the still missing consistent test case, esp. if this is a fix which should be added this late in release.
Hi Xavier, On Wed, 14 Jun 2023 at 09:40, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote: > > > Thanks for explaining, Simon Glass. > > Can someone please stop me if I'm splitting hairs or bikeshedding or > something ? I guess we agree both checking for null or UCLASS_BOOTDEV > would work right now but we are looking for what's more future-proof. > > El Tue, Jun 13, 2023 at 09:12:30PM +0100, Simon Glass deia: > > Hi Xavier, > > > > On Tue, 13 Jun 2023 at 17:04, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote: > > > > > > El Tue, Jun 13, 2023 at 03:58:22PM +0100, Simon Glass deia: > > > > > > > > Yes that's right. So 'usb info' should ignore UCLASS_BOOTDEV devices. > > > > > > That's a possibility, yes. It should work until someone adds another > > > device there for some other purpose. > > > > > > > That is better than checking for the NULL pointer. > > > > > > > > > > Why? What's wrong about checking for null ? > > > Or maybe checking for both not null and not UCLASS_BOOTDEV ? > > > > > > Not that I care too much, just to understand your reasoning. > > > > Well, devices may have something attached there, so it may be > > non-NULL, but point to a different struct from the expected one. > > > > Yes. That is possible. How likely ? > > That "there" is dev_get_parent_priv(). If I understand the driver > model, those devices shouldn't put things there themselves, it should > be their parent who puts stuff there for them. And the parent should > be an usb_device->dev (because of the recursive code). So what other > struct would such a parent put in a child parent_priv_ ? Yes, whatever > it wants, but I mean, isn't it likely to assume the child would be > "adopted" with null as parent_priv_ or a "natural child" with a usb_device > in parent_priv_ ? > > I did a very rough search > > grep -rIE 'UCLASS_(BLK|BOOTDEV|USB_EMUL)' . |grep -F .id |cut -f1 -d: |xargs -n 1 grep -l per_child_auto > ./drivers/usb/emul/usb-emul-uclass.c > > Which seems to suggest only UCLASS_USB_EMUL would have parent_priv_, > and that would be an usb_device, which the current code does not want > to recurse anyway. (dev_set_parent_priv() is almost never called). > > It is also possible that one day a device that is not UCLASS_BLK, > UCLASS_BOOTDEV or UCLASS_USB_EMUL is put as children of a usb storage > device (just imagine a future system similar to bootstd for firmware > updates, trust material, etc.). Is it likely to have a struct in > parent_priv_ that is not a usb_device ? > > So which is more likely to survive future changes ? > > - checking for parent_priv_ not null and not UCLASS_USB_EMUL > > - checking for parent_priv_ not null and not UCLASS_USB_EMUL and not UCLASS_BLK > (my patch, overcautious ?) > > - checking for not (UCLASS_BLK, UCLASS_BOOTDEV or UCLASS_USB_EMUL) > (Simon Glass' idea) > > - checking for not UCLASS_BLK and not UCLASS_BOOTDEV and not UCLASS_USB_EMUL > and parent_priv_ not null Really the parent_priv thing is a separate issue, a side effect of something having a UCLASS_USB parent. The key point here is that we cannot iterate down into a bootdev device looking for USB devices. So we should use that as the check, since it is the most direct check. > > > > > > > Or can we check for recursible devices somehow ? > > > Maybe flag them or something ? > > > > > > Why is better to state all devices are recursible except some UCLASSes > > > (meaning they can have stuff that needs listed in usb info - or > > > likewise usb tree -) instead of stating that some closed set are > > > recursible ? > > > > > > > For USB we can have lots of different types of devices as children, so > > it would be a pain to enumerate them all. At least that's how I see > > it. > > > > We can have lots of devices as children, but those do get listed > before the recursive call. How many of those can have children that > we have to list too, i.e. how many of those do we want to recurse > into ? > > I can only think of usb hubs (maybe some node for multifunction > devices too ???). Maybe I'm not understanding how U-Boot works with > USB devices... but it looks like it would be enough to look for > UCLASS_USB_HUB, then recursive call, else no recursion. I mean instead of > > cmd/usb.c : usb_show_tree_graph() : > > if ((device_get_uclass_id( child ) != UCLASS_USB_EMUL) && > (device_get_uclass_id( child ) != UCLASS_BOOTDEV) && > (device_get_uclass_id( child ) != UCLASS_BLK)) { > usb_show_tree_graph(udev, pre); > pre[index] = 0; > } > > we could simply have > > if (device_get_uclass_id( dev->dev ) == UCLASS_USB_HUB) { > usb_show_tree_graph(udev, pre); > pre[index] = 0; > } > > (or put the condition directly before the for loop) > > Or can we have an UCLASS_USB_EMUL, UCLASS_BLK or UCLASS_BOOTDEV as > direct child of an UCLASS_USB_HUB ??? > > Am I opening any cans of worms ? From my memory, I think you can check for a USB hub instead, but I'm not completely sure. I suggest adding a test for the command (see test/dm/acpi.c for an example) since a test is the best way to ensure this doesn't happen again. Regards, Simon
El Tue, Jun 20, 2023 at 11:03:57AM +0100, Simon Glass deia: > Hi Xavier, > Hi Simon, > > > > It is also possible that one day a device that is not UCLASS_BLK, > > UCLASS_BOOTDEV or UCLASS_USB_EMUL is put as children of a usb storage > > device (just imagine a future system similar to bootstd for firmware > > updates, trust material, etc.). Is it likely to have a struct in > > parent_priv_ that is not a usb_device ? > > > > So which is more likely to survive future changes ? > > > > - checking for parent_priv_ not null and not UCLASS_USB_EMUL > > > > - checking for parent_priv_ not null and not UCLASS_USB_EMUL and not UCLASS_BLK > > (my patch, overcautious ?) > > > > - checking for not (UCLASS_BLK, UCLASS_BOOTDEV or UCLASS_USB_EMUL) > > (Simon Glass' idea) > > > > - checking for not UCLASS_BLK and not UCLASS_BOOTDEV and not UCLASS_USB_EMUL > > and parent_priv_ not null > > Really the parent_priv thing is a separate issue, a side effect of > something having a UCLASS_USB parent. > I don't think it's a separate issue. If parent_priv is present it could be a usb_device (most likely) or not, but if it's null there's no way the recursive call can succeed. > The key point here is that we cannot iterate down into a bootdev > device looking for USB devices. So we should use that as the check, > since it is the most direct check. > But things keep appearing that have a UCLASS_USB* parent and no parent_priv. in 2017 Suneel Garapati already fixed the issue of UCLASS_BLK being a child of a device. Now it's UCLASS_BOOTDEV, and tomorrow may be something else. The most direct check will miss future cases as the devices tend to become more abstract instead of mapping one to one to physical stuff. > > >From my memory, I think you can check for a USB hub instead, but I'm > not completely sure. > On second thoughts I didn't find it so easy. There's the root hub, UCLASS_USB, I think and a UCLASS_USB_EMUL may also be a hub, so I don't know anymore how more elegant that could be, so I left it be. > I suggest adding a test for the command (see test/dm/acpi.c for an > example) since a test is the best way to ensure this doesn't happen > again. > Makes sense. But I don't have any more time for that, sorry. I think we'll have to leave it at this unless someone else has the time. Bye.
On Tue, 20 Jun 2023 at 12:20, Xavier Drudis Ferran <xdrudis@tinet.cat> wrote: > > El Tue, Jun 20, 2023 at 11:03:57AM +0100, Simon Glass deia: > > Hi Xavier, > > > > Hi Simon, > > > > > > > It is also possible that one day a device that is not UCLASS_BLK, > > > UCLASS_BOOTDEV or UCLASS_USB_EMUL is put as children of a usb storage > > > device (just imagine a future system similar to bootstd for firmware > > > updates, trust material, etc.). Is it likely to have a struct in > > > parent_priv_ that is not a usb_device ? > > > > > > So which is more likely to survive future changes ? > > > > > > - checking for parent_priv_ not null and not UCLASS_USB_EMUL > > > > > > - checking for parent_priv_ not null and not UCLASS_USB_EMUL and not UCLASS_BLK > > > (my patch, overcautious ?) > > > > > > - checking for not (UCLASS_BLK, UCLASS_BOOTDEV or UCLASS_USB_EMUL) > > > (Simon Glass' idea) > > > > > > - checking for not UCLASS_BLK and not UCLASS_BOOTDEV and not UCLASS_USB_EMUL > > > and parent_priv_ not null > > > > Really the parent_priv thing is a separate issue, a side effect of > > something having a UCLASS_USB parent. > > > > I don't think it's a separate issue. If parent_priv is present it > could be a usb_device (most likely) or not, but if it's null there's > no way the recursive call can succeed. > > > The key point here is that we cannot iterate down into a bootdev > > device looking for USB devices. So we should use that as the check, > > since it is the most direct check. > > > > But things keep appearing that have a UCLASS_USB* parent and no > parent_priv. > in 2017 Suneel Garapati already fixed the issue of UCLASS_BLK > being a child of a device. Now it's UCLASS_BOOTDEV, and tomorrow > may be something else. > > The most direct check will miss future cases as the devices tend to > become more abstract instead of mapping one to one to physical stuff. > > > > > >From my memory, I think you can check for a USB hub instead, but I'm > > not completely sure. > > > > On second thoughts I didn't find it so easy. There's the root hub, > UCLASS_USB, I think and a UCLASS_USB_EMUL may also be a hub, so I > don't know anymore how more elegant that could be, so I left it be. > > > I suggest adding a test for the command (see test/dm/acpi.c for an > > example) since a test is the best way to ensure this doesn't happen > > again. > > > > Makes sense. But I don't have any more time for that, sorry. > > I think we'll have to leave it at this unless someone else has the time. Reviewed-by: Simon Glass <sjg@chromium.org>
diff --git a/cmd/usb.c b/cmd/usb.c index 73addb04c4..7e6065aa51 100644 --- a/cmd/usb.c +++ b/cmd/usb.c @@ -421,7 +421,8 @@ static void usb_show_tree_graph(struct usb_device *dev, char *pre) * Ignore emulators and block child devices, we only want * real devices */ - if ((device_get_uclass_id(child) != UCLASS_USB_EMUL) && + if (udev && + (device_get_uclass_id(child) != UCLASS_USB_EMUL) && (device_get_uclass_id(child) != UCLASS_BLK)) { usb_show_tree_graph(udev, pre); pre[index] = 0; @@ -607,7 +608,8 @@ static void usb_show_info(struct usb_device *udev) (device_get_uclass_id(child) != UCLASS_USB_EMUL) && (device_get_uclass_id(child) != UCLASS_BLK)) { udev = dev_get_parent_priv(child); - usb_show_info(udev); + if (udev) + usb_show_info(udev); } } }
Add a check to avoid dommed (by null pointer dereference) recursive call, not only for UCLASS_BLK. When booting my Rock Pi 4B+ with a USB mass storage stick plugged into one of the USB 2 ports (EHCI), when it is plugged before power on, when issuing a usb tree or usb info command I get a "Synchronous Error" and a reset just after printing the mass storage device in the usb tree or usb info. It might depend on the contents of the USB stick too, I'm not sure. It seems like I have two devices as children of the mass storage device. When there's only a UCLASS_BLK it works fine, but when there's a UCLASS_BLK and a UCLASS_BOOTDEV, it recurses with a null udev as first parameter and fails. Likewise for usb_show_info(). Not sure if this should be a patch, an RFC or a bug report. There may be a better way to solve this, I haven't researched commit 201417d700a2ab09 ("bootstd: Add the bootdev uclass") and bootdev flow properly, or thought about cases where udev is not null but the recursive call might need preventing too. Feel free to think it over before merging (or after). But this at least fixes a reset at an innocent looking usb tree or usb info command. Maybe we can improve it later? Cc: Simon Glass <sjg@chromium.org> Cc: Lukasz Majewski <lukma@denx.de> Cc: Marek Vasut <marex@denx.de> Signed-off-by: Xavier Drudis Ferran <xdrudis@tinet.cat> --- cmd/usb.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)