diff mbox

[U-Boot,1/1] core/uclass: uclass_get_device_tail: always set devp

Message ID 1492541089-12778-1-git-send-email-xypron.glpk@gmx.de
State Superseded
Delegated to: Simon Glass
Headers show

Commit Message

Heinrich Schuchardt April 18, 2017, 6:44 p.m. UTC
Set devp even if probing fails.

Without the patch the following problem occurs:
If the first block device is not probed successfully no block
device is passed by bootefi to the EFI executable.

The problem was reported by Andreas Färber in
https://lists.denx.de/pipermail/u-boot/2017-April/287432.html

For testing I used an odroid-c2 with a dts including
&sd_emmc_a {
	status = "okay";
}
This device does not exist on the board and cannot be initialized.

Reported-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
---
 drivers/core/uclass.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Simon Glass April 19, 2017, 12:12 a.m. UTC | #1
Hi,

On 18 April 2017 at 12:44, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
> Set devp even if probing fails.
>
> Without the patch the following problem occurs:
> If the first block device is not probed successfully no block
> device is passed by bootefi to the EFI executable.
>
> The problem was reported by Andreas Färber in
> https://lists.denx.de/pipermail/u-boot/2017-April/287432.html
>
> For testing I used an odroid-c2 with a dts including
> &sd_emmc_a {
>         status = "okay";
> }
> This device does not exist on the board and cannot be initialized.

Thanks for finding this. Which function is calling this? Can you
please explain the call sequence to hit this problem? I am worried
that something else is wrong.

>
> Reported-by: Andreas Färber <afaerber@suse.de>
> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
> ---
>  drivers/core/uclass.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/core/uclass.c b/drivers/core/uclass.c
> index 04fb45b..b647384 100644
> --- a/drivers/core/uclass.c
> +++ b/drivers/core/uclass.c
> @@ -349,12 +349,13 @@ int uclass_get_device_tail(struct udevice *dev, int ret,
>                 return ret;
>
>         assert(dev);
> +
> +       *devp = dev;
> +
>         ret = device_probe(dev);
>         if (ret)
>                 return ret;
>
> -       *devp = dev;
> -
>         return 0;
>  }
>
> --
> 2.1.4
>

Regards,
Simon
Andreas Färber April 19, 2017, 3:03 a.m. UTC | #2
Hi Simon,

Am 19.04.2017 um 02:12 schrieb Simon Glass:
> On 18 April 2017 at 12:44, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
>> Set devp even if probing fails.
>>
>> Without the patch the following problem occurs:
>> If the first block device is not probed successfully no block
>> device is passed by bootefi to the EFI executable.
>>
>> The problem was reported by Andreas Färber in
>> https://lists.denx.de/pipermail/u-boot/2017-April/287432.html
>>
>> For testing I used an odroid-c2 with a dts including
>> &sd_emmc_a {
>>         status = "okay";
>> }
>> This device does not exist on the board and cannot be initialized.
> 
> Thanks for finding this. Which function is calling this? Can you
> please explain the call sequence to hit this problem? I am worried
> that something else is wrong.

It is lib/efi_loader/efi_disk.c:efi_disk_register() that is calling
uclass_first_device() and uclass_next_device(). Based on the output we
had concluded that uclass_first_device() fails already - therefore Alex
CC'ed you.

Unfortunately I find the function interactions inside uclass.c rather
complex and unintuitive, so I am truely amazed at Heinrich's findings.
(Passing the ret code from one function to the next just to return?!)

Based on his patch we can assume that uclass_find_first_device() set dev
to a non-NULL value, otherwise we wouldn't end up in
uclass_get_device_tail().

So if you're saying this patch is wrong, then that would leave
device_probe(). The actual meson_mmc_probe() function you had given a
late Reviewed-by, and it always returns 0.

Jaehoon had asked about cfg->voltages in meson_mmc_probe(), but I don't
see how that would lead to probe failure here? Whether the values are
correct I have no idea.

The error message "mmc_init: -95, time 1807", which appears to be a
result of this iteration/probe process, comes from
drivers/mmc/mmc.c:mmc_init(), which is called from
mmc-uclass.c:mmc_blk_probe(). mmc_init() calls mmc_start_init(), which
returns this -EOPNOTSUPP if sd_send_op_cond return -ETIMEDOUT and
mmc_send_op_cond() failed. However, our output does not show "Card did
not respond to voltage select!", which it should in that case. So the
error must be coming from elsewhere. sd_send_op_cond() may return
-EOPNOTSUPP through mmc_start_init(), and so does mmc_complete_op_cond()
through mmc_complete_init(). I would expect either to fail, as in my
case it's an SDIO, and in Heinrich's case it's not connected to
anything. So I don't think MMC is at fault here.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/boot/dts/amlogic/meson-gxbb-vega-s95.dtsi

See sd_mmc_a, which uses an mmc-pwrseq that depends on a pwm-clock,
neither of which have any support in U-Boot, nor is it actually needed
for booting.

In any case, failure to probe one device should not break the iteration
of other devices. I.e., if a device fails to probe then instead of
returning from uclass_{first,next}_device() we should look at the next
device until we find one that's okay or have reached the end of the
list. How to best implement that in uclass.c I'm less sure of.

If I am right you could test this on any board with multiple, e.g., blk
devices where a first or non-last device returns an error code from its
probe, i.e. try changing return 0 in some probe function based on a
static variable not yet being initialized or something. Probably you
could even write some tiny sandbox based test, without actual hardware.

IIUC this patch changes behavior from iterating over no devices to
iterating over all devices including ones not probed successfully? And
what you intended was to instead iterate over only the probed devices?

Regards,
Andreas
Simon Glass April 19, 2017, 3:37 a.m. UTC | #3
Hi,

On 18 April 2017 at 21:03, Andreas Färber <afaerber@suse.de> wrote:
> Hi Simon,
>
> Am 19.04.2017 um 02:12 schrieb Simon Glass:
>> On 18 April 2017 at 12:44, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
>>> Set devp even if probing fails.
>>>
>>> Without the patch the following problem occurs:
>>> If the first block device is not probed successfully no block
>>> device is passed by bootefi to the EFI executable.
>>>
>>> The problem was reported by Andreas Färber in
>>> https://lists.denx.de/pipermail/u-boot/2017-April/287432.html
>>>
>>> For testing I used an odroid-c2 with a dts including
>>> &sd_emmc_a {
>>>         status = "okay";
>>> }
>>> This device does not exist on the board and cannot be initialized.
>>
>> Thanks for finding this. Which function is calling this? Can you
>> please explain the call sequence to hit this problem? I am worried
>> that something else is wrong.
>
> It is lib/efi_loader/efi_disk.c:efi_disk_register() that is calling
> uclass_first_device() and uclass_next_device(). Based on the output we
> had concluded that uclass_first_device() fails already - therefore Alex
> CC'ed you.
>
> Unfortunately I find the function interactions inside uclass.c rather
> complex and unintuitive, so I am truely amazed at Heinrich's findings.
> (Passing the ret code from one function to the next just to return?!)
>
> Based on his patch we can assume that uclass_find_first_device() set dev
> to a non-NULL value, otherwise we wouldn't end up in
> uclass_get_device_tail().
>
> So if you're saying this patch is wrong, then that would leave
> device_probe(). The actual meson_mmc_probe() function you had given a
> late Reviewed-by, and it always returns 0.
>
> Jaehoon had asked about cfg->voltages in meson_mmc_probe(), but I don't
> see how that would lead to probe failure here? Whether the values are
> correct I have no idea.
>
> The error message "mmc_init: -95, time 1807", which appears to be a
> result of this iteration/probe process, comes from
> drivers/mmc/mmc.c:mmc_init(), which is called from
> mmc-uclass.c:mmc_blk_probe(). mmc_init() calls mmc_start_init(), which
> returns this -EOPNOTSUPP if sd_send_op_cond return -ETIMEDOUT and
> mmc_send_op_cond() failed. However, our output does not show "Card did
> not respond to voltage select!", which it should in that case. So the
> error must be coming from elsewhere. sd_send_op_cond() may return
> -EOPNOTSUPP through mmc_start_init(), and so does mmc_complete_op_cond()
> through mmc_complete_init(). I would expect either to fail, as in my
> case it's an SDIO, and in Heinrich's case it's not connected to
> anything. So I don't think MMC is at fault here.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/boot/dts/amlogic/meson-gxbb-vega-s95.dtsi
>
> See sd_mmc_a, which uses an mmc-pwrseq that depends on a pwm-clock,
> neither of which have any support in U-Boot, nor is it actually needed
> for booting.
>
> In any case, failure to probe one device should not break the iteration
> of other devices. I.e., if a device fails to probe then instead of
> returning from uclass_{first,next}_device() we should look at the next
> device until we find one that's okay or have reached the end of the
> list. How to best implement that in uclass.c I'm less sure of.
>
> If I am right you could test this on any board with multiple, e.g., blk
> devices where a first or non-last device returns an error code from its
> probe, i.e. try changing return 0 in some probe function based on a
> static variable not yet being initialized or something. Probably you
> could even write some tiny sandbox based test, without actual hardware.
>
> IIUC this patch changes behavior from iterating over no devices to
> iterating over all devices including ones not probed successfully? And
> what you intended was to instead iterate over only the probed devices?

I think this is a genuine bug, but exactly where is less obvious.

The uclass function don't return a device if they get an error. So at
present you need to iterate through the uclass if the intention is to
continue after error. That would be uclass_foreach_dev(). But before
using the device, device_probe() needs to be called since the
iteration does not probe it automatically. That is why people normally
use uclass_first/next_device(), since it probes as it goes.

However in this case I think it is reasonable to change
uclass_first/next_device() to always return the device, even on error.
That can best be done by changing those functions rather than
uclass_get_device_tail(), which is shared by many functions. We should
not change uclass_get_device_tail() since what is does is correct for
all other uses (I think).

In that case, the function comments for the first/next functions
should be updated to indicate the new behaviour, and a test created,
perhaps in test/dm/core.c.

You were asking about uclass_get_device_tail(). This is common code
and is put into a function to ensure consistent behaviour across all
functions that return an error code and a device. Consistency is very
important with driver model since things can get version confusing
when something odd happens in the bowels of the system, as this bug
has shown.

Regards,
Simon
Heinrich Schuchardt April 19, 2017, 4:48 a.m. UTC | #4
On 04/19/2017 05:37 AM, Simon Glass wrote:
> Hi,
> 
> On 18 April 2017 at 21:03, Andreas Färber <afaerber@suse.de> wrote:
>> Hi Simon,
>>
>> Am 19.04.2017 um 02:12 schrieb Simon Glass:
>>> On 18 April 2017 at 12:44, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
>>>> Set devp even if probing fails.
>>>>
>>>> Without the patch the following problem occurs:
>>>> If the first block device is not probed successfully no block
>>>> device is passed by bootefi to the EFI executable.
>>>>
>>>> The problem was reported by Andreas Färber in
>>>> https://lists.denx.de/pipermail/u-boot/2017-April/287432.html
>>>>
>>>> For testing I used an odroid-c2 with a dts including
>>>> &sd_emmc_a {
>>>>         status = "okay";
>>>> }
>>>> This device does not exist on the board and cannot be initialized.
>>>
>>> Thanks for finding this. Which function is calling this? Can you
>>> please explain the call sequence to hit this problem? I am worried
>>> that something else is wrong.
>>
>> It is lib/efi_loader/efi_disk.c:efi_disk_register() that is calling
>> uclass_first_device() and uclass_next_device(). Based on the output we
>> had concluded that uclass_first_device() fails already - therefore Alex
>> CC'ed you.
>>
>> Unfortunately I find the function interactions inside uclass.c rather
>> complex and unintuitive, so I am truely amazed at Heinrich's findings.
>> (Passing the ret code from one function to the next just to return?!)
>>
>> Based on his patch we can assume that uclass_find_first_device() set dev
>> to a non-NULL value, otherwise we wouldn't end up in
>> uclass_get_device_tail().
>>
>> So if you're saying this patch is wrong, then that would leave
>> device_probe(). The actual meson_mmc_probe() function you had given a
>> late Reviewed-by, and it always returns 0.
>>
>> Jaehoon had asked about cfg->voltages in meson_mmc_probe(), but I don't
>> see how that would lead to probe failure here? Whether the values are
>> correct I have no idea.
>>
>> The error message "mmc_init: -95, time 1807", which appears to be a
>> result of this iteration/probe process, comes from
>> drivers/mmc/mmc.c:mmc_init(), which is called from
>> mmc-uclass.c:mmc_blk_probe(). mmc_init() calls mmc_start_init(), which
>> returns this -EOPNOTSUPP if sd_send_op_cond return -ETIMEDOUT and
>> mmc_send_op_cond() failed. However, our output does not show "Card did
>> not respond to voltage select!", which it should in that case. So the
>> error must be coming from elsewhere. sd_send_op_cond() may return
>> -EOPNOTSUPP through mmc_start_init(), and so does mmc_complete_op_cond()
>> through mmc_complete_init(). I would expect either to fail, as in my
>> case it's an SDIO, and in Heinrich's case it's not connected to
>> anything. So I don't think MMC is at fault here.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/boot/dts/amlogic/meson-gxbb-vega-s95.dtsi
>>
>> See sd_mmc_a, which uses an mmc-pwrseq that depends on a pwm-clock,
>> neither of which have any support in U-Boot, nor is it actually needed
>> for booting.
>>
>> In any case, failure to probe one device should not break the iteration
>> of other devices. I.e., if a device fails to probe then instead of
>> returning from uclass_{first,next}_device() we should look at the next
>> device until we find one that's okay or have reached the end of the
>> list. How to best implement that in uclass.c I'm less sure of.
>>
>> If I am right you could test this on any board with multiple, e.g., blk
>> devices where a first or non-last device returns an error code from its
>> probe, i.e. try changing return 0 in some probe function based on a
>> static variable not yet being initialized or something. Probably you
>> could even write some tiny sandbox based test, without actual hardware.
>>
>> IIUC this patch changes behavior from iterating over no devices to
>> iterating over all devices including ones not probed successfully? And
>> what you intended was to instead iterate over only the probed devices?
> 
> I think this is a genuine bug, but exactly where is less obvious.
> 
> The uclass function don't return a device if they get an error. So at
> present you need to iterate through the uclass if the intention is to
> continue after error. That would be uclass_foreach_dev(). But before
> using the device, device_probe() needs to be called since the
> iteration does not probe it automatically. That is why people normally
> use uclass_first/next_device(), since it probes as it goes.
> 
> However in this case I think it is reasonable to change
> uclass_first/next_device() to always return the device, even on error.
> That can best be done by changing those functions rather than
> uclass_get_device_tail(), which is shared by many functions. We should
> not change uclass_get_device_tail() since what is does is correct for
> all other uses (I think).
> 
> In that case, the function comments for the first/next functions
> should be updated to indicate the new behaviour, and a test created,
> perhaps in test/dm/core.c.
> 
> You were asking about uclass_get_device_tail(). This is common code
> and is put into a function to ensure consistent behaviour across all
> functions that return an error code and a device. Consistency is very
> important with driver model since things can get version confusing
> when something odd happens in the bowels of the system, as this bug
> has shown.
> 
> Regards,
> Simon
> 

Thank you Simon for reviewing. I will try to rework the patch.

For reference I append the debug output with and without the current
patch. This should make the call sequence clear.

In uclass_first_device: id 12
12 refers to UCLASS_BLK.

Best regards

Heinrich Schuchardt


Without [PATCH 1/1] core/uclass: uclass_get_device_tail: always set devp

=> bootefi hello
## Starting EFI application at 01000000 ...
WARNING: Invalid device tree, expect boot to fail
efi_add_memory_map: 0x7cf50000 0x1 2 yes
efi_disk_register
uclass_first_device: id 12
uclass_find_first_device: id 12
uclass_get_device_tail: ret 0
uclass_resolve_seq:
uclass_find_device_by_seq: uclass_id 0, seq_or_req_seq -1, find_req_seq 0
uclass_find_device_by_seq: uclass_id 0, seq_or_req_seq 0, find_req_seq 0
mmc@70000.blk - -1 -1
mmc@72000.blk - -1 -1
mmc@74000.blk - -1 -1
   - not found
uclass_get_device: id 41, index 0
uclass_get_device_tail: ret 0
set_state_simple op missing
find_mmc_device: dev_num 0
blk_get_device: if_type=6, devnum=0: mmc@70000.blk, 6, 0
mmc_blk_probe
mmc_init: -95, time 1807
Found 0 disks
uclass_first_device: id 18
uclass_find_first_device: id 18
uclass_get_device_tail: ret 0
efi_add_memory_map: 0x7cf4f000 0x1 6 yes
do_bootefi_exec:234 Jumping to 0x7cf50148
EFI: Entry efi_cout_output_string(000000007ff93ac0, 000000007cf50274)
Hello, world!
EFI: Entry efi_exit(000000007ffa47d0, 0, 0, 0000000000000000)
## Application terminated, r = 0
=>


With [PATCH 1/1] core/uclass: uclass_get_device_tail: always set devp

=> bootefi hello
## Starting EFI application at 01000000 ...
WARNING: Invalid device tree, expect boot to fail
efi_add_memory_map: 0x7cf50000 0x1 2 yes
efi_disk_register
uclass_first_device: id 12
uclass_find_first_device: id 12
uclass_get_device_tail: ret 0
uclass_resolve_seq:
uclass_find_device_by_seq: uclass_id 0, seq_or_req_seq -1, find_req_seq 0
uclass_find_device_by_seq: uclass_id 0, seq_or_req_seq 0, find_req_seq 0
mmc@70000.blk - -1 -1
mmc@72000.blk - -1 0
   - found
uclass_find_device_by_seq: uclass_id 0, seq_or_req_seq 1, find_req_seq 0
mmc@70000.blk - -1 -1
mmc@72000.blk - -1 0
mmc@74000.blk - -1 -1
   - not found
uclass_get_device: id 41, index 0
uclass_get_device_tail: ret 0
set_state_simple op missing
find_mmc_device: dev_num 0
blk_get_device: if_type=6, devnum=0: mmc@70000.blk, 6, 0
mmc_blk_probe
mmc_init: -95, time 1806
Scanning disk mmc@70000.blk...
uclass_next_device:
uclass_find_next_device:
uclass_get_device_tail: ret 0
Scanning disk mmc@72000.blk...
Adding disk device 'mmc@72000.blk'
uclass_next_device:
uclass_find_next_device:
uclass_get_device_tail: ret 0
uclass_resolve_seq:
uclass_find_device_by_seq: uclass_id 0, seq_or_req_seq -1, find_req_seq 0
uclass_find_device_by_seq: uclass_id 0, seq_or_req_seq 0, find_req_seq 0
mmc@70000.blk - -1 -1
mmc@72000.blk - -1 0
   - found
uclass_find_device_by_seq: uclass_id 0, seq_or_req_seq 1, find_req_seq 0
mmc@70000.blk - -1 -1
mmc@72000.blk - -1 0
mmc@74000.blk - -1 -1
   - not found
uclass_get_device: id 41, index 0
uclass_get_device_tail: ret 0
set_state_simple op missing
find_mmc_device: dev_num 2
blk_get_device: if_type=6, devnum=2: mmc@70000.blk, 6, 0
blk_get_device: if_type=6, devnum=2: mmc@72000.blk, 6, 1
blk_get_device: if_type=6, devnum=2: mmc@74000.blk, 6, 2
mmc_blk_probe
Card did not respond to voltage select!
mmc_init: -95, time 10
Scanning disk mmc@74000.blk...
uclass_next_device:
uclass_find_next_device:
Found 3 disks
efi_add_memory_map: 0x7cf4f000 0x1 6 yes
do_bootefi_exec:234 Jumping to 0x7cf50148
EFI: Entry efi_cout_output_string(000000007ff93ad0, 000000007cf50274)
Hello, world!
EFI: Entry efi_exit(000000007ffa47e0, 0, 0, 0000000000000000)
## Application terminated, r = 0
=>
diff mbox

Patch

diff --git a/drivers/core/uclass.c b/drivers/core/uclass.c
index 04fb45b..b647384 100644
--- a/drivers/core/uclass.c
+++ b/drivers/core/uclass.c
@@ -349,12 +349,13 @@  int uclass_get_device_tail(struct udevice *dev, int ret,
 		return ret;
 
 	assert(dev);
+
+	*devp = dev;
+
 	ret = device_probe(dev);
 	if (ret)
 		return ret;
 
-	*devp = dev;
-
 	return 0;
 }