diff mbox series

Fix REQ_INSTALL/GET_STATUS synchronization

Message ID a2f31237-4a73-4431-be65-b096219270f6@witekio.com
State Changes Requested
Headers show
Series Fix REQ_INSTALL/GET_STATUS synchronization | expand

Commit Message

fhoerni.opensource July 10, 2024, 8:28 a.m. UTC
From time to time, there was an install error raised by swupdate-client when
the daemon processed GET_STATUS too quickly after a REQ_INSTALL:

- when starting, inst.last_install=0, inst.status=IDLE
- swupdate-client:
     * sends REQ_INSTALL to the daemon
     * ... followed by the payload (the image), which is buffered by the OS
     * ... and swupdate-client continues its execution whereas the daemon has
           not started receiving the payload so far
     * sends REQ_STATUS to the daemon to get the progress status
- swupdate daemon:
     * receives REQ_INSTALL and sends ACK
     * (1) receives GET_STATUS and answers with inst.last_install=0,
       inst.status=IDLE
     * (2) notifies another thread that sets inst.status=RUN and proceeds with
       the installation
- swupdate-client:
     * receives GET_STATUS with inst.last_install=0, inst.status=IDLE
     * ... and considers this as a failure, and exits with code 1 (error)
- swupdate daemon:
     * continues normally and succeeds

The fact that (1) occurs before (2) is the root cause of the issue as it gives
a wrong information to the client.

It happens when the thread "network_thread" re-acquires the mutex before
pthread_cond_wait (in the thread "network_initializer") re-acquires it himself
(the re-acquiring of the mutex is not said atomic in the man page).

This has been seen on some machines with a swu image of 20480 bytes.

Other improvements:
- remove useless stream_wkup
- move locking to before starting the other thread in order not to miss the
   first signal
- do not release the mutex between status=IDLE and pthread_cond_wait in order
   not to miss a signal

Signed-off-by: Frederic Hoerni <fhoerni@witekio.com>

---
  core/network_thread.c       |  4 +++-
  core/stream_interface.c     | 24 ++++++++++--------------
  include/network_interface.h |  1 -
  3 files changed, 13 insertions(+), 16 deletions(-)

Comments

Stefano Babic July 17, 2024, 9:51 a.m. UTC | #1
Hi Frederic,

On 10.07.24 10:28, 'Frederic Hoerni' via swupdate wrote:
>  From time to time, there was an install error raised by swupdate-client 
> when
> the daemon processed GET_STATUS too quickly after a REQ_INSTALL:
> 
> - when starting, inst.last_install=0, inst.status=IDLE
> - swupdate-client:
>      * sends REQ_INSTALL to the daemon
>      * ... followed by the payload (the image), which is buffered by the OS
>      * ... and swupdate-client continues its execution whereas the 
> daemon has
>            not started receiving the payload so far
>      * sends REQ_STATUS to the daemon to get the progress status
> - swupdate daemon:
>      * receives REQ_INSTALL and sends ACK
>      * (1) receives GET_STATUS and answers with inst.last_install=0,
>        inst.status=IDLE
>      * (2) notifies another thread that sets inst.status=RUN and 
> proceeds with
>        the installation
> - swupdate-client:
>      * receives GET_STATUS with inst.last_install=0, inst.status=IDLE
>      * ... and considers this as a failure, and exits with code 1 (error)
> - swupdate daemon:
>      * continues normally and succeeds
> 

Thanks for analyses and yes this can happen when the SWU is very small.

However, your patch is putting the status back like the first 
implementation. Later, commit 7e80ad491 introduced the condition to be 
checked ( stream_wkup ). According to several documentation about 
pthread_cond_wait(), the condition could change before the awakened 
thread locks the mutex and returns from  pthread_cond_wait(), so the 
recommended method is like the one in commit 7e80ad491, that you are 
removing.

Apart of this, you are moving the writer of inst.status to another 
thread, and this seems dangerous. The network thread has just received 
the message to trigger an install, but this is managed by the thread 
defined in network_initializer (core/stream_interface.c). This is the 
only writer for that structure., and in fact, it is in charge to start / 
reject  an update.

I would say that the issue is outside this code. The GET_STATUS message 
(polling) was the first implementation to retrieve it. Later, I 
implemented the "progress" interface, that means an event driven, whose 
events are managed by the core, and the race cannot happen. But this 
requires to listen for events to get the changes in the status machinery 
(IDLE --> RUN --> [SUCCESS | FAILURE].

Best regards,
Stefano Babic

> The fact that (1) occurs before (2) is the root cause of the issue as it 
> gives
> a wrong information to the client.
> 
> It happens when the thread "network_thread" re-acquires the mutex before
> pthread_cond_wait (in the thread "network_initializer") re-acquires it 
> himself
> (the re-acquiring of the mutex is not said atomic in the man page).
> 
> This has been seen on some machines with a swu image of 20480 bytes.
> 
> Other improvements:
> - remove useless stream_wkup
> - move locking to before starting the other thread in order not to miss the
>    first signal
> - do not release the mutex between status=IDLE and pthread_cond_wait in 
> order
>    not to miss a signal
> 
> Signed-off-by: Frederic Hoerni <fhoerni@witekio.com>
> 
> ---
>   core/network_thread.c       |  4 +++-
>   core/stream_interface.c     | 24 ++++++++++--------------
>   include/network_interface.h |  1 -
>   3 files changed, 13 insertions(+), 16 deletions(-)
> 
> diff --git a/core/network_thread.c b/core/network_thread.c
> index d7b713fb..e033f3ab 100644
> --- a/core/network_thread.c
> +++ b/core/network_thread.c
> @@ -486,8 +486,10 @@ void *network_thread (void *data)
>                                                   /* Drop all old 
> notification from last run */
>                                                   cleanum_msg_list();
> 
> +                                                /* Switch to RUN */
> +                                                instp->status = RUN;
> +
>                                                   /* Wake-up the 
> installer */
> -                                                stream_wkup = true;
>                                                   
> pthread_cond_signal(&stream_cond);
>                                           } else {
>                                                   msg.type = NACK;
> diff --git a/core/stream_interface.c b/core/stream_interface.c
> index 5f3ad2e3..59449815 100644
> --- a/core/stream_interface.c
> +++ b/core/stream_interface.c
> @@ -67,7 +67,6 @@ static pthread_t network_thread_id;
>    * reception of an install request
>    *
>    */
> -bool stream_wkup = false;
>   pthread_mutex_t stream_mutex = PTHREAD_MUTEX_INITIALIZER;
>   pthread_cond_t stream_cond = PTHREAD_COND_INITIALIZER;
> 
> @@ -533,6 +532,9 @@ void *network_initializer(void *data)
>           inst.status = IDLE;
>           inst.software = software;
> 
> +        /* Lock in order not to miss the first signal (before 
> pthread_cond_wait) */
> +        pthread_mutex_lock(&stream_mutex);
> +
>           /* fork off the local dialogs and network service */
>           network_thread_id = start_thread(network_thread, &inst);
> 
> @@ -543,12 +545,7 @@ void *network_initializer(void *data)
>                   ret = 0;
> 
>                   /* wait for someone to issue an install request */
> -                pthread_mutex_lock(&stream_mutex);
> -                while (stream_wkup != true) {
> -                        pthread_cond_wait(&stream_cond, &stream_mutex);
> -                }
> -                stream_wkup = false;
> -                inst.status = RUN;
> +                pthread_cond_wait(&stream_cond, &stream_mutex);
>                   pthread_mutex_unlock(&stream_mutex);
>                   notify(START, RECOVERY_NO_ERROR, INFOLEVEL, "Software 
> Update started !");
>                   TRACE("Software update started");
> @@ -731,13 +728,6 @@ void *network_initializer(void *data)
>                   swupdate_remove_directory(DATADST_DIR_SUFFIX);
>   #endif
> 
> -                pthread_mutex_lock(&stream_mutex);
> -                inst.status = IDLE;
> -                inst.req.source = SOURCE_UNKNOWN;
> -                pthread_mutex_unlock(&stream_mutex);
> -                TRACE("Main thread sleep again !");
> -                notify(IDLE, RECOVERY_NO_ERROR, INFOLEVEL, "Waiting for 
> requests...");
> -
>                   /*
>                    * Last step, if no restart is required,
>                    * SWUpdate can send automatically the feedback.
> @@ -768,6 +758,12 @@ void *network_initializer(void *data)
>                           ipc_send_cmd(&msg);
>                   }
> 
> +                pthread_mutex_lock(&stream_mutex);
> +                inst.status = IDLE;
> +                inst.req.source = SOURCE_UNKNOWN;
> +                /* Keep stream_mutex locked until pthread_cond_wait */
> +                TRACE("Main thread sleep again !");
> +                notify(IDLE, RECOVERY_NO_ERROR, INFOLEVEL, "Waiting for 
> requests...");
> 
>           }
> 
> diff --git a/include/network_interface.h b/include/network_interface.h
> index 0a60cad4..53e7fa44 100644
> --- a/include/network_interface.h
> +++ b/include/network_interface.h
> @@ -9,6 +9,5 @@
>   void *network_initializer(void *data);
>   void *network_thread(void *data);
> 
> -extern bool stream_wkup;
>   extern pthread_mutex_t stream_mutex;
>   extern pthread_cond_t stream_cond;
fhoerni.opensource July 18, 2024, 8:41 a.m. UTC | #2
Hi Stefano,

On 17/07/2024 11:51, Stefano Babic wrote:
> Hi Frederic,
> 
> On 10.07.24 10:28, 'Frederic Hoerni' via swupdate wrote:
>>  From time to time, there was an install error raised by swupdate-client when
>> the daemon processed GET_STATUS too quickly after a REQ_INSTALL:
>>
>> - when starting, inst.last_install=0, inst.status=IDLE
>> - swupdate-client:
>>      * sends REQ_INSTALL to the daemon
>>      * ... followed by the payload (the image), which is buffered by the OS
>>      * ... and swupdate-client continues its execution whereas the daemon has
>>            not started receiving the payload so far
>>      * sends REQ_STATUS to the daemon to get the progress status
>> - swupdate daemon:
>>      * receives REQ_INSTALL and sends ACK
>>      * (1) receives GET_STATUS and answers with inst.last_install=0,
>>        inst.status=IDLE
>>      * (2) notifies another thread that sets inst.status=RUN and proceeds with
>>        the installation
>> - swupdate-client:
>>      * receives GET_STATUS with inst.last_install=0, inst.status=IDLE
>>      * ... and considers this as a failure, and exits with code 1 (error)
>> - swupdate daemon:
>>      * continues normally and succeeds
>>
> 
> Thanks for analyses and yes this can happen when the SWU is very small.
> 
> However, your patch is putting the status back like the first implementation. Later, commit 7e80ad491 introduced the condition to be checked ( stream_wkup ). According to several documentation about pthread_cond_wait(), the condition could change before the awakened thread locks the mutex and returns from  pthread_cond_wait(), so the recommended method is like the one in commit 7e80ad491, that you are removing.

Sure, I agree with that.
> 
> Apart of this, you are moving the writer of inst.status to another thread, and this seems dangerous. The network thread has just received the message to trigger an install, but this is managed by the thread defined in network_initializer (core/stream_interface.c). This is the only writer for that structure., and in fact, it is in charge to start / reject  an update.
> 
> I would say that the issue is outside this code. The GET_STATUS message (polling) was the first implementation to retrieve it. Later, I implemented the "progress" interface, that means an event driven, whose events are managed by the core, and the race cannot happen. But this requires to listen for events to get the changes in the status machinery (IDLE --> RUN --> [SUCCESS | FAILURE].

If I correctly understand, this means that we should rather patch the client side.
I will figure out how to do this. I guess this should be done somewhere in swupdate_async_start, using progress_ipc_connect and progress_ipc_receive.

Best regards,
Frederic

> 
> Best regards,
> Stefano Babic
> 
>> The fact that (1) occurs before (2) is the root cause of the issue as it gives
>> a wrong information to the client.
>>
>> It happens when the thread "network_thread" re-acquires the mutex before
>> pthread_cond_wait (in the thread "network_initializer") re-acquires it himself
>> (the re-acquiring of the mutex is not said atomic in the man page).
>>
>> This has been seen on some machines with a swu image of 20480 bytes.
>>
>> Other improvements:
>> - remove useless stream_wkup
>> - move locking to before starting the other thread in order not to miss the
>>    first signal
>> - do not release the mutex between status=IDLE and pthread_cond_wait in order
>>    not to miss a signal
>>
>> Signed-off-by: Frederic Hoerni <fhoerni@witekio.com>
>>
>> ---
>>   core/network_thread.c       |  4 +++-
>>   core/stream_interface.c     | 24 ++++++++++--------------
>>   include/network_interface.h |  1 -
>>   3 files changed, 13 insertions(+), 16 deletions(-)
>>
>> diff --git a/core/network_thread.c b/core/network_thread.c
>> index d7b713fb..e033f3ab 100644
>> --- a/core/network_thread.c
>> +++ b/core/network_thread.c
>> @@ -486,8 +486,10 @@ void *network_thread (void *data)
>>                                                   /* Drop all old notification from last run */
>>                                                   cleanum_msg_list();
>>
>> +                                                /* Switch to RUN */
>> +                                                instp->status = RUN;
>> +
>>                                                   /* Wake-up the installer */
>> -                                                stream_wkup = true;
>> pthread_cond_signal(&stream_cond);
>>                                           } else {
>>                                                   msg.type = NACK;
>> diff --git a/core/stream_interface.c b/core/stream_interface.c
>> index 5f3ad2e3..59449815 100644
>> --- a/core/stream_interface.c
>> +++ b/core/stream_interface.c
>> @@ -67,7 +67,6 @@ static pthread_t network_thread_id;
>>    * reception of an install request
>>    *
>>    */
>> -bool stream_wkup = false;
>>   pthread_mutex_t stream_mutex = PTHREAD_MUTEX_INITIALIZER;
>>   pthread_cond_t stream_cond = PTHREAD_COND_INITIALIZER;
>>
>> @@ -533,6 +532,9 @@ void *network_initializer(void *data)
>>           inst.status = IDLE;
>>           inst.software = software;
>>
>> +        /* Lock in order not to miss the first signal (before pthread_cond_wait) */
>> +        pthread_mutex_lock(&stream_mutex);
>> +
>>           /* fork off the local dialogs and network service */
>>           network_thread_id = start_thread(network_thread, &inst);
>>
>> @@ -543,12 +545,7 @@ void *network_initializer(void *data)
>>                   ret = 0;
>>
>>                   /* wait for someone to issue an install request */
>> -                pthread_mutex_lock(&stream_mutex);
>> -                while (stream_wkup != true) {
>> -                        pthread_cond_wait(&stream_cond, &stream_mutex);
>> -                }
>> -                stream_wkup = false;
>> -                inst.status = RUN;
>> +                pthread_cond_wait(&stream_cond, &stream_mutex);
>>                   pthread_mutex_unlock(&stream_mutex);
>>                   notify(START, RECOVERY_NO_ERROR, INFOLEVEL, "Software Update started !");
>>                   TRACE("Software update started");
>> @@ -731,13 +728,6 @@ void *network_initializer(void *data)
>>                   swupdate_remove_directory(DATADST_DIR_SUFFIX);
>>   #endif
>>
>> -                pthread_mutex_lock(&stream_mutex);
>> -                inst.status = IDLE;
>> -                inst.req.source = SOURCE_UNKNOWN;
>> -                pthread_mutex_unlock(&stream_mutex);
>> -                TRACE("Main thread sleep again !");
>> -                notify(IDLE, RECOVERY_NO_ERROR, INFOLEVEL, "Waiting for requests...");
>> -
>>                   /*
>>                    * Last step, if no restart is required,
>>                    * SWUpdate can send automatically the feedback.
>> @@ -768,6 +758,12 @@ void *network_initializer(void *data)
>>                           ipc_send_cmd(&msg);
>>                   }
>>
>> +                pthread_mutex_lock(&stream_mutex);
>> +                inst.status = IDLE;
>> +                inst.req.source = SOURCE_UNKNOWN;
>> +                /* Keep stream_mutex locked until pthread_cond_wait */
>> +                TRACE("Main thread sleep again !");
>> +                notify(IDLE, RECOVERY_NO_ERROR, INFOLEVEL, "Waiting for requests...");
>>
>>           }
>>
>> diff --git a/include/network_interface.h b/include/network_interface.h
>> index 0a60cad4..53e7fa44 100644
>> --- a/include/network_interface.h
>> +++ b/include/network_interface.h
>> @@ -9,6 +9,5 @@
>>   void *network_initializer(void *data);
>>   void *network_thread(void *data);
>>
>> -extern bool stream_wkup;
>>   extern pthread_mutex_t stream_mutex;
>>   extern pthread_cond_t stream_cond;
>
Stefano Babic July 18, 2024, 8:45 a.m. UTC | #3
On 18.07.24 10:41, 'Frederic Hoerni' via swupdate wrote:
> Hi Stefano,
> 
> On 17/07/2024 11:51, Stefano Babic wrote:
>> Hi Frederic,
>>
>> On 10.07.24 10:28, 'Frederic Hoerni' via swupdate wrote:
>>>  From time to time, there was an install error raised by 
>>> swupdate-client when
>>> the daemon processed GET_STATUS too quickly after a REQ_INSTALL:
>>>
>>> - when starting, inst.last_install=0, inst.status=IDLE
>>> - swupdate-client:
>>>      * sends REQ_INSTALL to the daemon
>>>      * ... followed by the payload (the image), which is buffered by 
>>> the OS
>>>      * ... and swupdate-client continues its execution whereas the 
>>> daemon has
>>>            not started receiving the payload so far
>>>      * sends REQ_STATUS to the daemon to get the progress status
>>> - swupdate daemon:
>>>      * receives REQ_INSTALL and sends ACK
>>>      * (1) receives GET_STATUS and answers with inst.last_install=0,
>>>        inst.status=IDLE
>>>      * (2) notifies another thread that sets inst.status=RUN and 
>>> proceeds with
>>>        the installation
>>> - swupdate-client:
>>>      * receives GET_STATUS with inst.last_install=0, inst.status=IDLE
>>>      * ... and considers this as a failure, and exits with code 1 
>>> (error)
>>> - swupdate daemon:
>>>      * continues normally and succeeds
>>>
>>
>> Thanks for analyses and yes this can happen when the SWU is very small.
>>
>> However, your patch is putting the status back like the first 
>> implementation. Later, commit 7e80ad491 introduced the condition to be 
>> checked ( stream_wkup ). According to several documentation about 
>> pthread_cond_wait(), the condition could change before the awakened 
>> thread locks the mutex and returns from  pthread_cond_wait(), so the 
>> recommended method is like the one in commit 7e80ad491, that you are 
>> removing.
> 
> Sure, I agree with that.
>>
>> Apart of this, you are moving the writer of inst.status to another 
>> thread, and this seems dangerous. The network thread has just received 
>> the message to trigger an install, but this is managed by the thread 
>> defined in network_initializer (core/stream_interface.c). This is the 
>> only writer for that structure., and in fact, it is in charge to start 
>> / reject  an update.
>>
>> I would say that the issue is outside this code. The GET_STATUS 
>> message (polling) was the first implementation to retrieve it. Later, 
>> I implemented the "progress" interface, that means an event driven, 
>> whose events are managed by the core, and the race cannot happen. But 
>> this requires to listen for events to get the changes in the status 
>> machinery (IDLE --> RUN --> [SUCCESS | FAILURE].
> 
> If I correctly understand, this means that we should rather patch the 
> client side.

In the IPC code, yes.

> I will figure out how to do this. I guess this should be done somewhere 
> in swupdate_async_start, using progress_ipc_connect and 
> progress_ipc_receive.

Yes, the race is in this code, swupdate_async_start and 
ipc_wait_for_complete.

Best regards,
Stefano

> 
> Best regards,
> Frederic
> 
>>
>> Best regards,
>> Stefano Babic
>>
>>> The fact that (1) occurs before (2) is the root cause of the issue as 
>>> it gives
>>> a wrong information to the client.
>>>
>>> It happens when the thread "network_thread" re-acquires the mutex before
>>> pthread_cond_wait (in the thread "network_initializer") re-acquires 
>>> it himself
>>> (the re-acquiring of the mutex is not said atomic in the man page).
>>>
>>> This has been seen on some machines with a swu image of 20480 bytes.
>>>
>>> Other improvements:
>>> - remove useless stream_wkup
>>> - move locking to before starting the other thread in order not to 
>>> miss the
>>>    first signal
>>> - do not release the mutex between status=IDLE and pthread_cond_wait 
>>> in order
>>>    not to miss a signal
>>>
>>> Signed-off-by: Frederic Hoerni <fhoerni@witekio.com>
>>>
>>> ---
>>>   core/network_thread.c       |  4 +++-
>>>   core/stream_interface.c     | 24 ++++++++++--------------
>>>   include/network_interface.h |  1 -
>>>   3 files changed, 13 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/core/network_thread.c b/core/network_thread.c
>>> index d7b713fb..e033f3ab 100644
>>> --- a/core/network_thread.c
>>> +++ b/core/network_thread.c
>>> @@ -486,8 +486,10 @@ void *network_thread (void *data)
>>>                                                   /* Drop all old 
>>> notification from last run */
>>>                                                   cleanum_msg_list();
>>>
>>> +                                                /* Switch to RUN */
>>> +                                                instp->status = RUN;
>>> +
>>>                                                   /* Wake-up the 
>>> installer */
>>> -                                                stream_wkup = true;
>>> pthread_cond_signal(&stream_cond);
>>>                                           } else {
>>>                                                   msg.type = NACK;
>>> diff --git a/core/stream_interface.c b/core/stream_interface.c
>>> index 5f3ad2e3..59449815 100644
>>> --- a/core/stream_interface.c
>>> +++ b/core/stream_interface.c
>>> @@ -67,7 +67,6 @@ static pthread_t network_thread_id;
>>>    * reception of an install request
>>>    *
>>>    */
>>> -bool stream_wkup = false;
>>>   pthread_mutex_t stream_mutex = PTHREAD_MUTEX_INITIALIZER;
>>>   pthread_cond_t stream_cond = PTHREAD_COND_INITIALIZER;
>>>
>>> @@ -533,6 +532,9 @@ void *network_initializer(void *data)
>>>           inst.status = IDLE;
>>>           inst.software = software;
>>>
>>> +        /* Lock in order not to miss the first signal (before 
>>> pthread_cond_wait) */
>>> +        pthread_mutex_lock(&stream_mutex);
>>> +
>>>           /* fork off the local dialogs and network service */
>>>           network_thread_id = start_thread(network_thread, &inst);
>>>
>>> @@ -543,12 +545,7 @@ void *network_initializer(void *data)
>>>                   ret = 0;
>>>
>>>                   /* wait for someone to issue an install request */
>>> -                pthread_mutex_lock(&stream_mutex);
>>> -                while (stream_wkup != true) {
>>> -                        pthread_cond_wait(&stream_cond, &stream_mutex);
>>> -                }
>>> -                stream_wkup = false;
>>> -                inst.status = RUN;
>>> +                pthread_cond_wait(&stream_cond, &stream_mutex);
>>>                   pthread_mutex_unlock(&stream_mutex);
>>>                   notify(START, RECOVERY_NO_ERROR, INFOLEVEL, 
>>> "Software Update started !");
>>>                   TRACE("Software update started");
>>> @@ -731,13 +728,6 @@ void *network_initializer(void *data)
>>>                   swupdate_remove_directory(DATADST_DIR_SUFFIX);
>>>   #endif
>>>
>>> -                pthread_mutex_lock(&stream_mutex);
>>> -                inst.status = IDLE;
>>> -                inst.req.source = SOURCE_UNKNOWN;
>>> -                pthread_mutex_unlock(&stream_mutex);
>>> -                TRACE("Main thread sleep again !");
>>> -                notify(IDLE, RECOVERY_NO_ERROR, INFOLEVEL, "Waiting 
>>> for requests...");
>>> -
>>>                   /*
>>>                    * Last step, if no restart is required,
>>>                    * SWUpdate can send automatically the feedback.
>>> @@ -768,6 +758,12 @@ void *network_initializer(void *data)
>>>                           ipc_send_cmd(&msg);
>>>                   }
>>>
>>> +                pthread_mutex_lock(&stream_mutex);
>>> +                inst.status = IDLE;
>>> +                inst.req.source = SOURCE_UNKNOWN;
>>> +                /* Keep stream_mutex locked until pthread_cond_wait */
>>> +                TRACE("Main thread sleep again !");
>>> +                notify(IDLE, RECOVERY_NO_ERROR, INFOLEVEL, "Waiting 
>>> for requests...");
>>>
>>>           }
>>>
>>> diff --git a/include/network_interface.h b/include/network_interface.h
>>> index 0a60cad4..53e7fa44 100644
>>> --- a/include/network_interface.h
>>> +++ b/include/network_interface.h
>>> @@ -9,6 +9,5 @@
>>>   void *network_initializer(void *data);
>>>   void *network_thread(void *data);
>>>
>>> -extern bool stream_wkup;
>>>   extern pthread_mutex_t stream_mutex;
>>>   extern pthread_cond_t stream_cond;
>>
> 
>
fhoerni.opensource July 19, 2024, 9:22 a.m. UTC | #4
Hi Stefano,

On 18/07/2024 10:45, Stefano Babic wrote:
> 
>> I will figure out how to do this. I guess this should be done somewhere in swupdate_async_start, using progress_ipc_connect and progress_ipc_receive.
> 
> Yes, the race is in this code, swupdate_async_start and ipc_wait_for_complete.
> 


I looked at how we could use progress_ipc_receive() to have the client notified about the result of the installation and I see 2 solutions. I would like to have your point of view, and if you think of other solutions that would be simpler or more consistent.

Solution 1:

In swupdate_async_thread:
- start listening to progress events before sending the image to the server (progress_ipc_connect)
- send the image to the server by chunks, interleaved with progress_ipc_receive_nb (a non-blocking variant of progress_ipc_receive, to be created)
- wait until SUCCESS or FAILURE (progress_ipc_receive)

Example in simplified code:

         progressfd = progress_ipc_connect(0 /* no reconnect */);

         do {
                 if (!rq->wr)
                         break;

                 rq->wr(&pbuf, &size);
                 if (size) {
                         if (swupdate_image_write(pbuf, size) != size) {
                                 perror("swupdate_image_write failed");
                                 swupdate_result = FAILURE;
                                 goto out;
                         }
                 }

                 /* Consume progress events */
                 int ret = progress_ipc_receive_nb(&progressfd, &msg);
                 /* ... leave the loop on error ... */

         } while(size > 0);

         /* Wait until the end of the installation (FAILURE or SUCCESS) */
         while (1) {
                 int ret = progress_ipc_receive(progressfd, &msg);
                 /* Leave the loop on SUCCESS or FAILURE */
         }


Reasons:
- we start listening before sending the image to be sure not to miss the final result (SUCCESS or FAILURE);
- we read events interleaved with swupdate_image_write in order to consume the events and prevent having the socket full, which would lead to errors on the server side and the server closing the progress socket.


Solution 2:

In swupdate_async_start or swupdate_async_thread, create a new thread in charge of listening to progress events and notifying its parent thread when reaching SUCCESS or FAILURE. This solution is more complex as it involves 1 more thread, and more synchronization plumbing between threads.



However, with any solution using progress_ipc_receive, we won't have the messages on stdout as before:

Status: 1 message: Software Update started !
Status: 2 message: [network_initializer] : Software update started
Status: 2 message: [extract_file_to_tmp] : Found file
Status: 2 message: [extract_file_to_tmp] :  filename sw-description
Status: 2 message: [extract_file_to_tmp] :  size 352
...

The progress_ipc_receive API does not provide these messages, as far as I know.
Would it be acceptable for you to change these messages on the stdout interface? (only swupdate-client is impacted)

Best regards,
Frederic



> Best regards,
> Stefano
> 
>>
>> Best regards,
>> Frederic
fhoerni.opensource Aug. 6, 2024, 1:01 p.m. UTC | #5
Hi,
I will send another patch for addressing this issue on the client side.

Frederic

On 19/07/2024 11:22, Frederic Hoerni wrote:
> Hi Stefano,
> 
> On 18/07/2024 10:45, Stefano Babic wrote:
>>
>>> I will figure out how to do this. I guess this should be done somewhere in swupdate_async_start, using progress_ipc_connect and progress_ipc_receive.
>>
>> Yes, the race is in this code, swupdate_async_start and ipc_wait_for_complete.
>>
> 
> 
> I looked at how we could use progress_ipc_receive() to have the client notified about the result of the installation and I see 2 solutions. I would like to have your point of view, and if you think of other solutions that would be simpler or more consistent.
> 
> Solution 1:
> 
> In swupdate_async_thread:
> - start listening to progress events before sending the image to the server (progress_ipc_connect)
> - send the image to the server by chunks, interleaved with progress_ipc_receive_nb (a non-blocking variant of progress_ipc_receive, to be created)
> - wait until SUCCESS or FAILURE (progress_ipc_receive)
> 
> Example in simplified code:
> 
>          progressfd = progress_ipc_connect(0 /* no reconnect */);
> 
>          do {
>                  if (!rq->wr)
>                          break;
> 
>                  rq->wr(&pbuf, &size);
>                  if (size) {
>                          if (swupdate_image_write(pbuf, size) != size) {
>                                  perror("swupdate_image_write failed");
>                                  swupdate_result = FAILURE;
>                                  goto out;
>                          }
>                  }
> 
>                  /* Consume progress events */
>                  int ret = progress_ipc_receive_nb(&progressfd, &msg);
>                  /* ... leave the loop on error ... */
> 
>          } while(size > 0);
> 
>          /* Wait until the end of the installation (FAILURE or SUCCESS) */
>          while (1) {
>                  int ret = progress_ipc_receive(progressfd, &msg);
>                  /* Leave the loop on SUCCESS or FAILURE */
>          }
> 
> 
> Reasons:
> - we start listening before sending the image to be sure not to miss the final result (SUCCESS or FAILURE);
> - we read events interleaved with swupdate_image_write in order to consume the events and prevent having the socket full, which would lead to errors on the server side and the server closing the progress socket.
> 
> 
> Solution 2:
> 
> In swupdate_async_start or swupdate_async_thread, create a new thread in charge of listening to progress events and notifying its parent thread when reaching SUCCESS or FAILURE. This solution is more complex as it involves 1 more thread, and more synchronization plumbing between threads.
> 
> 
> 
> However, with any solution using progress_ipc_receive, we won't have the messages on stdout as before:
> 
> Status: 1 message: Software Update started !
> Status: 2 message: [network_initializer] : Software update started
> Status: 2 message: [extract_file_to_tmp] : Found file
> Status: 2 message: [extract_file_to_tmp] :  filename sw-description
> Status: 2 message: [extract_file_to_tmp] :  size 352
> ...
> 
> The progress_ipc_receive API does not provide these messages, as far as I know.
> Would it be acceptable for you to change these messages on the stdout interface? (only swupdate-client is impacted)
> 
> Best regards,
> Frederic
> 
> 
> 
>> Best regards,
>> Stefano
>>
>>>
>>> Best regards,
>>> Frederic
diff mbox series

Patch

diff --git a/core/network_thread.c b/core/network_thread.c
index d7b713fb..e033f3ab 100644
--- a/core/network_thread.c
+++ b/core/network_thread.c
@@ -486,8 +486,10 @@  void *network_thread (void *data)
                                                  /* Drop all old notification from last run */
                                                  cleanum_msg_list();
  
+                                                /* Switch to RUN */
+                                                instp->status = RUN;
+
                                                  /* Wake-up the installer */
-                                                stream_wkup = true;
                                                  pthread_cond_signal(&stream_cond);
                                          } else {
                                                  msg.type = NACK;
diff --git a/core/stream_interface.c b/core/stream_interface.c
index 5f3ad2e3..59449815 100644
--- a/core/stream_interface.c
+++ b/core/stream_interface.c
@@ -67,7 +67,6 @@  static pthread_t network_thread_id;
   * reception of an install request
   *
   */
-bool stream_wkup = false;
  pthread_mutex_t stream_mutex = PTHREAD_MUTEX_INITIALIZER;
  pthread_cond_t stream_cond = PTHREAD_COND_INITIALIZER;
  
@@ -533,6 +532,9 @@  void *network_initializer(void *data)
          inst.status = IDLE;
          inst.software = software;
  
+        /* Lock in order not to miss the first signal (before pthread_cond_wait) */
+        pthread_mutex_lock(&stream_mutex);
+
          /* fork off the local dialogs and network service */
          network_thread_id = start_thread(network_thread, &inst);
  
@@ -543,12 +545,7 @@  void *network_initializer(void *data)
                  ret = 0;
  
                  /* wait for someone to issue an install request */
-                pthread_mutex_lock(&stream_mutex);
-                while (stream_wkup != true) {
-                        pthread_cond_wait(&stream_cond, &stream_mutex);
-                }
-                stream_wkup = false;
-                inst.status = RUN;
+                pthread_cond_wait(&stream_cond, &stream_mutex);
                  pthread_mutex_unlock(&stream_mutex);
                  notify(START, RECOVERY_NO_ERROR, INFOLEVEL, "Software Update started !");
                  TRACE("Software update started");
@@ -731,13 +728,6 @@  void *network_initializer(void *data)
                  swupdate_remove_directory(DATADST_DIR_SUFFIX);
  #endif
  
-                pthread_mutex_lock(&stream_mutex);
-                inst.status = IDLE;
-                inst.req.source = SOURCE_UNKNOWN;
-                pthread_mutex_unlock(&stream_mutex);
-                TRACE("Main thread sleep again !");
-                notify(IDLE, RECOVERY_NO_ERROR, INFOLEVEL, "Waiting for requests...");
-
                  /*
                   * Last step, if no restart is required,
                   * SWUpdate can send automatically the feedback.
@@ -768,6 +758,12 @@  void *network_initializer(void *data)
                          ipc_send_cmd(&msg);
                  }
  
+                pthread_mutex_lock(&stream_mutex);
+                inst.status = IDLE;
+                inst.req.source = SOURCE_UNKNOWN;
+                /* Keep stream_mutex locked until pthread_cond_wait */
+                TRACE("Main thread sleep again !");
+                notify(IDLE, RECOVERY_NO_ERROR, INFOLEVEL, "Waiting for requests...");
  
          }
  
diff --git a/include/network_interface.h b/include/network_interface.h
index 0a60cad4..53e7fa44 100644
--- a/include/network_interface.h
+++ b/include/network_interface.h
@@ -9,6 +9,5 @@ 
  void *network_initializer(void *data);
  void *network_thread(void *data);
  
-extern bool stream_wkup;
  extern pthread_mutex_t stream_mutex;
  extern pthread_cond_t stream_cond;