diff mbox series

[V3] handler: ubivol: relax accessing UBI volume

Message ID 20240712080731.2238837-1-stefano.babic@swupdate.org
State Accepted
Headers show
Series [V3] handler: ubivol: relax accessing UBI volume | expand

Commit Message

Stefano Babic July 12, 2024, 8:07 a.m. UTC
It could be that some race conditions happens when SWUpdate tries to
lock the UBI volume for updating, for example after creating the volume
if udev is running and it accesses for some milliseconds. This is not an
issue inside SWUpdate, but SWUpdate can retry to get exclusive access of
the volume before giving up, making the update itself more robust.

Signed-off-by: Stefano Babic <stefano.babic@swupdate.org>
Reported-by: Christian Eggers <ceggers@arri.de>
---
 handlers/ubivol_handler.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

Changes since V2:
	- check for EBUSY
Changes since V1:
	- fix comment
	- add missing close() in error case


--
2.34.1

Comments

Christian Eggers July 12, 2024, 10:12 a.m. UTC | #1
Hi Stefano,

for some unknown reason, I have difficulties today to reproduce
the race condition which initially lead to the error (maybe I would
need to use the identical board, I am working from at home today).

But your patch looks fine for me and testing (without having the
race condition) was successful.

Thank you very much
Christian

On Friday, 12 July 2024, 10:07:31 CEST, Stefano Babic wrote:
> It could be that some race conditions happens when SWUpdate tries to
> lock the UBI volume for updating, for example after creating the volume
> if udev is running and it accesses for some milliseconds. This is not an
> issue inside SWUpdate, but SWUpdate can retry to get exclusive access of
> the volume before giving up, making the update itself more robust.
> 
> Signed-off-by: Stefano Babic <stefano.babic@swupdate.org>
> Reported-by: Christian Eggers <ceggers@arri.de>

Tested-by: Christian Eggers <ceggers@arri.de>

> ---
>  handlers/ubivol_handler.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> Changes since V2:
> 	- check for EBUSY
> Changes since V1:
> 	- fix comment
> 	- add missing close() in error case
> 
> 
> diff --git a/handlers/ubivol_handler.c b/handlers/ubivol_handler.c
> index 0ad0321e..bf1ec00a 100644
> --- a/handlers/ubivol_handler.c
> +++ b/handlers/ubivol_handler.c
> @@ -248,9 +248,30 @@ static int update_volume(libubi_t libubi, struct img_type *img,
>  		ERROR("cannot open UBI volume \"%s\"", node);
>  		return -1;
>  	}
> -	err = ubi_update_start(libubi, fdout, bytes);
> +
> +	unsigned int retries = 3;
> +	do {
> +		/*
> +		 * libubi just returns -1, errno can be checked
> +		 * for the source of errors.
> +		 * This should be changed in case ubi_update_start()
> +		 * will do own error recovery in future.
> +		 * Simply retries in case of error if the volume
> +		 * is accessed by another process
> +		 */
> +		err = ubi_update_start(libubi, fdout, bytes);
> +		retries--;
> +		if (err) {
> +			if (errno != EBUSY)
> +				break;
> +			WARN("Not possible to lock UBI, retry");
> +			sleep(1);
> +		}
> +	} while (err && retries > 0);
> +
>  	if (err) {
>  		ERROR("cannot start volume \"%s\" update", node);
> +		close(fdout);
>  		return -1;
>  	}
> 
> --
> 2.34.1
> 
>
Stefano Babic July 12, 2024, 10:50 a.m. UTC | #2
On 12.07.24 12:12, Christian Eggers wrote:
> Hi Stefano,
>
> for some unknown reason, I have difficulties today to reproduce
> the race condition which initially lead to the error (maybe I would
> need to use the identical board, I am working from at home today).
>
> But your patch looks fine for me and testing (without having the
> race condition) was successful.

Good, then I will merge it.

Stefano

>
> Thank you very much
> Christian
>
> On Friday, 12 July 2024, 10:07:31 CEST, Stefano Babic wrote:
>> It could be that some race conditions happens when SWUpdate tries to
>> lock the UBI volume for updating, for example after creating the volume
>> if udev is running and it accesses for some milliseconds. This is not an
>> issue inside SWUpdate, but SWUpdate can retry to get exclusive access of
>> the volume before giving up, making the update itself more robust.
>>
>> Signed-off-by: Stefano Babic <stefano.babic@swupdate.org>
>> Reported-by: Christian Eggers <ceggers@arri.de>
>
> Tested-by: Christian Eggers <ceggers@arri.de>
>
>> ---
>>   handlers/ubivol_handler.c | 23 ++++++++++++++++++++++-
>>   1 file changed, 22 insertions(+), 1 deletion(-)
>>
>> Changes since V2:
>> 	- check for EBUSY
>> Changes since V1:
>> 	- fix comment
>> 	- add missing close() in error case
>>
>>
>> diff --git a/handlers/ubivol_handler.c b/handlers/ubivol_handler.c
>> index 0ad0321e..bf1ec00a 100644
>> --- a/handlers/ubivol_handler.c
>> +++ b/handlers/ubivol_handler.c
>> @@ -248,9 +248,30 @@ static int update_volume(libubi_t libubi, struct img_type *img,
>>   		ERROR("cannot open UBI volume \"%s\"", node);
>>   		return -1;
>>   	}
>> -	err = ubi_update_start(libubi, fdout, bytes);
>> +
>> +	unsigned int retries = 3;
>> +	do {
>> +		/*
>> +		 * libubi just returns -1, errno can be checked
>> +		 * for the source of errors.
>> +		 * This should be changed in case ubi_update_start()
>> +		 * will do own error recovery in future.
>> +		 * Simply retries in case of error if the volume
>> +		 * is accessed by another process
>> +		 */
>> +		err = ubi_update_start(libubi, fdout, bytes);
>> +		retries--;
>> +		if (err) {
>> +			if (errno != EBUSY)
>> +				break;
>> +			WARN("Not possible to lock UBI, retry");
>> +			sleep(1);
>> +		}
>> +	} while (err && retries > 0);
>> +
>>   	if (err) {
>>   		ERROR("cannot start volume \"%s\" update", node);
>> +		close(fdout);
>>   		return -1;
>>   	}
>>
>> --
>> 2.34.1
>>
>>
>
>
>
>
diff mbox series

Patch

diff --git a/handlers/ubivol_handler.c b/handlers/ubivol_handler.c
index 0ad0321e..bf1ec00a 100644
--- a/handlers/ubivol_handler.c
+++ b/handlers/ubivol_handler.c
@@ -248,9 +248,30 @@  static int update_volume(libubi_t libubi, struct img_type *img,
 		ERROR("cannot open UBI volume \"%s\"", node);
 		return -1;
 	}
-	err = ubi_update_start(libubi, fdout, bytes);
+
+	unsigned int retries = 3;
+	do {
+		/*
+		 * libubi just returns -1, errno can be checked
+		 * for the source of errors.
+		 * This should be changed in case ubi_update_start()
+		 * will do own error recovery in future.
+		 * Simply retries in case of error if the volume
+		 * is accessed by another process
+		 */
+		err = ubi_update_start(libubi, fdout, bytes);
+		retries--;
+		if (err) {
+			if (errno != EBUSY)
+				break;
+			WARN("Not possible to lock UBI, retry");
+			sleep(1);
+		}
+	} while (err && retries > 0);
+
 	if (err) {
 		ERROR("cannot start volume \"%s\" update", node);
+		close(fdout);
 		return -1;
 	}