diff mbox series

Re: [PATCH] handler: ubivol: relax accessing UBI volume

Message ID 8405030.T7Z3S40VBb@n9w6sw14
State Changes Requested
Headers show
Series Re: [PATCH] handler: ubivol: relax accessing UBI volume | expand

Commit Message

Christian Eggers July 10, 2024, 9:38 a.m. UTC
Hi Stefano,

thanks for preparing the patch.

On Tuesday, 9 July 2024, 21:24:25 CEST, Stefano Babic wrote:
> It could be that some race conditions happens when SWUpdate tries to
> lock the UBI volume for updating, for example after creating the volume
> if udev is running and it accesses for some milliseconds. This is not an
> issue inside SWUpdate, but SWUpdate can retry to get exclusive access of
> the volume before giving up, making the update itself more robust.
> 
> Signed-off-by: Stefano Babic <stefano.babic@swupdate.org>
> Reported-by: Christian Eggers <ceggers@arri.de>
> ---
>  handlers/ubivol_handler.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/handlers/ubivol_handler.c b/handlers/ubivol_handler.c
> index 0ad0321e..6d27c89f 100644
> --- a/handlers/ubivol_handler.c
> +++ b/handlers/ubivol_handler.c
> @@ -248,7 +248,22 @@ static int update_volume(libubi_t libubi, struct img_type *img,
>  		ERROR("cannot open UBI volume \"%s\"", node);
>  		return -1;
>  	}
> -	err = ubi_update_start(libubi, fdout, bytes);
> +
> +	unsigned int retries = 3;
> +	do {
> +		/*
> +		 * libubi just returns -1, no check is possible

I am quite sure that the value of errno (EBUSY) can be used in this case.

> +		 * Simply retries in case of error if the volume
> +		 * was locked by another process
s/was locked/is currently accessed/

> +		 */
> +		err = ubi_update_start(libubi, fdout, bytes);
> +		retries--;
> +		if (err) {
> +			WARN("Not possible to lock UBI, retry");
> +			sleep(1);
> +		}
> +	} while (err && retries > 0);
> +
>  	if (err) {
>  		ERROR("cannot start volume \"%s\" update", node);

Additionally here should be a close() statement.

>  		return -1;
> --
> 2.34.1
> 
> 


Based on your patch, I have developed and tested a slightly improved version
(attached). Do you like to post this as v2 (please set the *-by: tags as you like)?
Now I get the following output:

...
[TRACE] : SWUPDATE running :  [resize_volume] : Removed UBI Volume kernel1
[TRACE] : SWUPDATE running :  [resize_volume] : Created dynamic UBI volume kernel1 of 2671473 bytes (old size 2793472)
[   65.530992] ubi0 error: 0xc02587b9: 2 users for volume 1
[WARN ] : SWUPDATE running :  [update_volume] : Not possible to lock UBI, retry
[INFO ] : SWUPDATE running :  Installing image lios-fitimage-ubifs-orbiter.fitimg into volume /dev/ubi0_1(kernel1)
[TRACE] : SWUPDATE running :  [update_volume] : Updating UBI : lios-fitimage-ubifs-orbiter.fitimg 2671473
[TRACE] : SWUPDATE running :  [extract_files] : END INSTALLING STREAMING
[TRACE] : SWUPDATE running :  [extract_files] : Found file
[TRACE] : SWUPDATE running :  [extract_files] :         filename lios-rootfs-release-orbiter.ubifs
[TRACE] : SWUPDATE running :  [extract_files] :         size 33013760 required
[TRACE] : SWUPDATE running :  [extract_files] : Installing STREAM lios-rootfs-release-orbiter.ubifs, 33013760 bytes
[TRACE] : SWUPDATE running :  [install_single_image] : Found installer for stream lios-rootfs-release-orbiter.ubifs ubivol
[TRACE] : SWUPDATE running :  [resize_volume] : Removed UBI Volume rootfs1
[TRACE] : SWUPDATE running :  [resize_volume] : Created dynamic UBI volume rootfs1 of 33013760 bytes (old size 33013760)
[   67.583171] ubi0 error: 0xc02587b9: 2 users for volume 4
[WARN ] : SWUPDATE running :  [update_volume] : Not possible to lock UBI, retry
[INFO ] : SWUPDATE running :  Installing image lios-rootfs-release-orbiter.ubifs into volume /dev/ubi0_4(rootfs1)
[TRACE] : SWUPDATE running :  [update_volume] : Updating UBI : lios-rootfs-release-orbiter.ubifs 33013760
[TRACE] : SWUPDATE running :  [extract_files] : END INSTALLING STREAMING
...

At next I'll switch from v2023.12.1 to v2024.05.2.

regards,
Christian

Comments

Stefano Babic July 10, 2024, 9:54 a.m. UTC | #1
Hi Christian,

On 10.07.24 11:38, Christian Eggers wrote:
> Hi Stefano,
>
> thanks for preparing the patch.
>
> On Tuesday, 9 July 2024, 21:24:25 CEST, Stefano Babic wrote:
>> It could be that some race conditions happens when SWUpdate tries to
>> lock the UBI volume for updating, for example after creating the volume
>> if udev is running and it accesses for some milliseconds. This is not an
>> issue inside SWUpdate, but SWUpdate can retry to get exclusive access of
>> the volume before giving up, making the update itself more robust.
>>
>> Signed-off-by: Stefano Babic <stefano.babic@swupdate.org>
>> Reported-by: Christian Eggers <ceggers@arri.de>
>> ---
>>   handlers/ubivol_handler.c | 17 ++++++++++++++++-
>>   1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/handlers/ubivol_handler.c b/handlers/ubivol_handler.c
>> index 0ad0321e..6d27c89f 100644
>> --- a/handlers/ubivol_handler.c
>> +++ b/handlers/ubivol_handler.c
>> @@ -248,7 +248,22 @@ static int update_volume(libubi_t libubi, struct img_type *img,
>>   		ERROR("cannot open UBI volume \"%s\"", node);
>>   		return -1;
>>   	}
>> -	err = ubi_update_start(libubi, fdout, bytes);
>> +
>> +	unsigned int retries = 3;
>> +	do {
>> +		/*
>> +		 * libubi just returns -1, no check is possible
>
> I am quite sure that the value of errno (EBUSY) can be used in this case.
>

This is the code in mtd-utils:

1099 int ubi_update_start(libubi_t desc, int fd, long long bytes)
  1100 {
  1101         (void)desc;
  1102         if (ioctl(fd, UBI_IOCVOLUP, &bytes))
  1103                 return -1;
  1104         return 0;
  1105 }

The function rather discards the return code from kernel and I cannot
use it.


>> +		 * Simply retries in case of error if the volume
>> +		 * was locked by another process
> s/was locked/is currently accessed/

I fix in V2

>
>> +		 */
>> +		err = ubi_update_start(libubi, fdout, bytes);
>> +		retries--;
>> +		if (err) {
>> +			WARN("Not possible to lock UBI, retry");
>> +			sleep(1);
>> +		}
>> +	} while (err && retries > 0);
>> +
>>   	if (err) {
>>   		ERROR("cannot start volume \"%s\" update", node);
>
> Additionally here should be a close() statement.

Sure, thanks !

Best regards,
Stefano

>
>>   		return -1;
>> --
>> 2.34.1
>>
>>
>
>
> Based on your patch, I have developed and tested a slightly improved version
> (attached). Do you like to post this as v2 (please set the *-by: tags as you like)?
> Now I get the following output:
>
> ...
> [TRACE] : SWUPDATE running :  [resize_volume] : Removed UBI Volume kernel1
> [TRACE] : SWUPDATE running :  [resize_volume] : Created dynamic UBI volume kernel1 of 2671473 bytes (old size 2793472)
> [   65.530992] ubi0 error: 0xc02587b9: 2 users for volume 1
> [WARN ] : SWUPDATE running :  [update_volume] : Not possible to lock UBI, retry
> [INFO ] : SWUPDATE running :  Installing image lios-fitimage-ubifs-orbiter.fitimg into volume /dev/ubi0_1(kernel1)
> [TRACE] : SWUPDATE running :  [update_volume] : Updating UBI : lios-fitimage-ubifs-orbiter.fitimg 2671473
> [TRACE] : SWUPDATE running :  [extract_files] : END INSTALLING STREAMING
> [TRACE] : SWUPDATE running :  [extract_files] : Found file
> [TRACE] : SWUPDATE running :  [extract_files] :         filename lios-rootfs-release-orbiter.ubifs
> [TRACE] : SWUPDATE running :  [extract_files] :         size 33013760 required
> [TRACE] : SWUPDATE running :  [extract_files] : Installing STREAM lios-rootfs-release-orbiter.ubifs, 33013760 bytes
> [TRACE] : SWUPDATE running :  [install_single_image] : Found installer for stream lios-rootfs-release-orbiter.ubifs ubivol
> [TRACE] : SWUPDATE running :  [resize_volume] : Removed UBI Volume rootfs1
> [TRACE] : SWUPDATE running :  [resize_volume] : Created dynamic UBI volume rootfs1 of 33013760 bytes (old size 33013760)
> [   67.583171] ubi0 error: 0xc02587b9: 2 users for volume 4
> [WARN ] : SWUPDATE running :  [update_volume] : Not possible to lock UBI, retry
> [INFO ] : SWUPDATE running :  Installing image lios-rootfs-release-orbiter.ubifs into volume /dev/ubi0_4(rootfs1)
> [TRACE] : SWUPDATE running :  [update_volume] : Updating UBI : lios-rootfs-release-orbiter.ubifs 33013760
> [TRACE] : SWUPDATE running :  [extract_files] : END INSTALLING STREAMING
> ...
>
> At next I'll switch from v2023.12.1 to v2024.05.2.
>
> regards,
> Christian
>
Christian Eggers July 10, 2024, 10:14 a.m. UTC | #2
On Wednesday, 10 July 2024, 11:54:15 CEST, Stefano Babic wrote:
> Hi Christian,
> 
> ...
> 
> This is the code in mtd-utils:
> 
> 1099 int ubi_update_start(libubi_t desc, int fd, long long bytes)
>   1100 {
>   1101         (void)desc;
>   1102         if (ioctl(fd, UBI_IOCVOLUP, &bytes))
>   1103                 return -1;
>   1104         return 0;
>   1105 }
> 
> The function rather discards the return code from kernel and I cannot
> use it.

The return code of ioctl() usually only indicates success (0) or 
error (-1). In case of an error, the actual error code can be accessed via
the 'errno' variable (see the patch file attached in my previous mail).

Instead of retrying regardless of the error, I would suggest doing this only
in case of EBUSY. This is a little bit like calling read() or write() a
second time after getting EAGAIN in case of non-blocking I/O.

> 
> 
> >> +		 * Simply retries in case of error if the volume
> >> +		 * was locked by another process
> > s/was locked/is currently accessed/
> 
> I fix in V2
> 
> >
> >> +		 */
> >> +		err = ubi_update_start(libubi, fdout, bytes);
> >> +		retries--;
> >> +		if (err) {
> >> +			WARN("Not possible to lock UBI, retry");
> >> +			sleep(1);
> >> +		}
> >> +	} while (err && retries > 0);
> >> +
> >>   	if (err) {
> >>   		ERROR("cannot start volume \"%s\" update", node);
> >
> > Additionally here should be a close() statement.
> 
> Sure, thanks !
> 
> Best regards,
> Stefano
> 
regards,
Christian
Stefano Babic July 10, 2024, 10:23 a.m. UTC | #3
Hi Christian,

On 10.07.24 12:14, Christian Eggers wrote:
> On Wednesday, 10 July 2024, 11:54:15 CEST, Stefano Babic wrote:
>> Hi Christian,
>>
>> ...
>>
>> This is the code in mtd-utils:
>>
>> 1099 int ubi_update_start(libubi_t desc, int fd, long long bytes)
>>    1100 {
>>    1101         (void)desc;
>>    1102         if (ioctl(fd, UBI_IOCVOLUP, &bytes))
>>    1103                 return -1;
>>    1104         return 0;
>>    1105 }
>>
>> The function rather discards the return code from kernel and I cannot
>> use it.
>
> The return code of ioctl() usually only indicates success (0) or
> error (-1). In case of an error, the actual error code can be accessed via
> the 'errno' variable (see the patch file attached in my previous mail).
>

Yes, but this should be done inside libubi. Doing in SWUpdate is the
wrong place. SWUpdate just use this library and shouldn't know how it is
implemented. It is an API.

Your proposal is, well, we know that libubi is not checking, we know the
implementation (but well, as library user we should just know the API
reported by libubi.h)  but we know that it calls ioctl() and errno is
set. But it is not the right way.

The right way should be to check errno inside ubi_update_start() and
return errno instead of a fix -1.

The API just says:

  /**
   * ubi_update_start - start UBI volume update.
   * @desc: UBI library descriptor
   * @fd: volume character device file descriptor
   * @bytes: how many bytes will be written to the volume
   *
   * This function initiates UBI volume update and returns %0 in case of
success
   * and %-1 in case of error. The caller is assumed to write @bytes
data to the
   * volume @fd afterward.
   */


> Instead of retrying regardless of the error, I would suggest doing this only
> in case of EBUSY. This is a little bit like calling read() or write() a
> second time after getting EAGAIN in case of non-blocking I/O.

It will be better, but breaks API. I suggest to send a patch to
mtd-utils instead.

Best regards,
Stefano

>
>>
>>
>>>> +		 * Simply retries in case of error if the volume
>>>> +		 * was locked by another process
>>> s/was locked/is currently accessed/
>>
>> I fix in V2
>>
>>>
>>>> +		 */
>>>> +		err = ubi_update_start(libubi, fdout, bytes);
>>>> +		retries--;
>>>> +		if (err) {
>>>> +			WARN("Not possible to lock UBI, retry");
>>>> +			sleep(1);
>>>> +		}
>>>> +	} while (err && retries > 0);
>>>> +
>>>>    	if (err) {
>>>>    		ERROR("cannot start volume \"%s\" update", node);
>>>
>>> Additionally here should be a close() statement.
>>
>> Sure, thanks !
>>
>> Best regards,
>> Stefano
>>
> regards,
> Christian
>
>
>
diff mbox series

Patch

From fd2e8f9cc862aa33b3b819180762306ef69f0028 Mon Sep 17 00:00:00 2001
From: Christian Eggers <ceggers@arri.de>
Date: Wed, 10 Jul 2024 09:57:27 +0200
Subject: [PATCH 2/2] handler: ubivol: relax accessing UBI volume

It could be that some race conditions happens when SWUpdate tries to
lock the UBI volume for updating, for example after creating the volume
if udev is running and it accesses for some milliseconds. This is not an
issue inside SWUpdate, but SWUpdate can retry to get exclusive access of
the volume before giving up, making the update itself more robust.

Co-developed-by: Stefano Babic <stefano.babic@swupdate.org>
Signed-off-by: Stefano Babic <stefano.babic@swupdate.org>
Signed-off-by: Christian Eggers <ceggers@arri.de>
---
 handlers/ubivol_handler.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/handlers/ubivol_handler.c b/handlers/ubivol_handler.c
index 2ad0bc3c9db1..5f2aacf60ec4 100644
--- a/handlers/ubivol_handler.c
+++ b/handlers/ubivol_handler.c
@@ -225,9 +225,23 @@  static int update_volume(libubi_t libubi, struct img_type *img,
 		ERROR("cannot open UBI volume \"%s\"", node);
 		return -1;
 	}
-	err = ubi_update_start(libubi, fdout, bytes);
+
+	unsigned int retries = 3;
+	do {
+		/*
+		 * Retry only in case of error if the volume
+		 * is currently accessed by another process (e.g. udevd)
+		 */
+		err = ubi_update_start(libubi, fdout, bytes);
+		if (!err || errno != EBUSY || !--retries )
+			break;
+
+		WARN("Not possible to lock UBI, retry");
+		sleep(1);
+	} while (1);
+
 	if (err) {
-		ERROR("cannot start volume \"%s\" update", node);
+		ERROR("cannot start volume \"%s\" update: %s", node, strerror(errno));
 		close(fdout);
 		return -1;
 	}
-- 
2.44.0