
[01/12] migration: do not wait if no free thread

Message ID: 20180604095520.8563-2-xiaoguangrong@tencent.com
State: New
Series: migration: improve multithreads for compression and decompression

Commit Message

Xiao Guangrong June 4, 2018, 9:55 a.m. UTC
From: Xiao Guangrong <xiaoguangrong@tencent.com>

Instead of putting the main thread to sleep while it waits for a free
compression thread, we can directly post the page out as a normal page,
which reduces latency and uses CPUs more efficiently.
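
In outline, a short sketch of the new control flow, paraphrasing the
hunks below: if no compression thread is idle,
compress_page_with_multi_thread() now returns a non-positive value and
the caller falls back to the normal page path instead of blocking on
comp_done_cond.

    /* caller side, in ram_save_target_page() */
    if (block == rs->last_sent_block && save_page_use_compression(rs)) {
        res = compress_page_with_multi_thread(rs, block, offset);
        if (res > 0) {
            return res;              /* an idle worker took the page */
        }
        /* all compression threads are busy: fall through */
    }

    return ram_save_page(rs, pss, last_stage);   /* send it uncompressed */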

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/ram.c | 34 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 19 deletions(-)

Comments

Peter Xu June 11, 2018, 7:39 a.m. UTC | #1
On Mon, Jun 04, 2018 at 05:55:09PM +0800, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> Instead of putting the main thread to sleep state to wait for
> free compression thread, we can directly post it out as normal
> page that reduces the latency and uses CPUs more efficiently

The feature looks good, though I'm not sure whether we should add a
capability flag for it, since otherwise it will be hard to switch back
to the old full-compression behaviour, whatever the reason.  Would that
be a problem?

> 
> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> ---
>  migration/ram.c | 34 +++++++++++++++-------------------
>  1 file changed, 15 insertions(+), 19 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 5bcbf7a9f9..0caf32ab0a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1423,25 +1423,18 @@ static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
>  
>      thread_count = migrate_compress_threads();
>      qemu_mutex_lock(&comp_done_lock);

Can we drop this lock in this case?

> -    while (true) {
> -        for (idx = 0; idx < thread_count; idx++) {
> -            if (comp_param[idx].done) {
> -                comp_param[idx].done = false;
> -                bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
> -                qemu_mutex_lock(&comp_param[idx].mutex);
> -                set_compress_params(&comp_param[idx], block, offset);
> -                qemu_cond_signal(&comp_param[idx].cond);
> -                qemu_mutex_unlock(&comp_param[idx].mutex);
> -                pages = 1;
> -                ram_counters.normal++;
> -                ram_counters.transferred += bytes_xmit;
> -                break;
> -            }
> -        }
> -        if (pages > 0) {
> +    for (idx = 0; idx < thread_count; idx++) {
> +        if (comp_param[idx].done) {
> +            comp_param[idx].done = false;
> +            bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
> +            qemu_mutex_lock(&comp_param[idx].mutex);
> +            set_compress_params(&comp_param[idx], block, offset);
> +            qemu_cond_signal(&comp_param[idx].cond);
> +            qemu_mutex_unlock(&comp_param[idx].mutex);
> +            pages = 1;
> +            ram_counters.normal++;
> +            ram_counters.transferred += bytes_xmit;
>              break;
> -        } else {
> -            qemu_cond_wait(&comp_done_cond, &comp_done_lock);
>          }
>      }
>      qemu_mutex_unlock(&comp_done_lock);

Regards,
Xiao Guangrong June 12, 2018, 2:42 a.m. UTC | #2
On 06/11/2018 03:39 PM, Peter Xu wrote:
> On Mon, Jun 04, 2018 at 05:55:09PM +0800, guangrong.xiao@gmail.com wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> Instead of putting the main thread to sleep state to wait for
>> free compression thread, we can directly post it out as normal
>> page that reduces the latency and uses CPUs more efficiently
> 
> The feature looks good, though I'm not sure whether we should make a
> capability flag for this feature since otherwise it'll be hard to
> switch back to the old full-compression way no matter for what
> reason.  Would that be a problem?
> 

We assume this optimization should be a win in all cases; in
particular, we introduced compression statistics, so the user can
adjust the parameters based on those statistics if anything works
worse.

Furthermore, if this optimization hurts any case, we really should
improve the optimization itself rather than leave an option to the
user. :)

>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
>> ---
>>   migration/ram.c | 34 +++++++++++++++-------------------
>>   1 file changed, 15 insertions(+), 19 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 5bcbf7a9f9..0caf32ab0a 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1423,25 +1423,18 @@ static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
>>   
>>       thread_count = migrate_compress_threads();
>>       qemu_mutex_lock(&comp_done_lock);
> 
> Can we drop this lock in this case?

The lock is used to protect comp_param[].done...

Well, we could possibly remove it if we redesign the implementation,
e.g. by using atomic access for comp_param.done; however, I believe it
still cannot work efficiently.  Please see more in my later reply to
your comments on the cover letter.
Peter Xu June 12, 2018, 3:15 a.m. UTC | #3
On Tue, Jun 12, 2018 at 10:42:25AM +0800, Xiao Guangrong wrote:
> 
> 
> On 06/11/2018 03:39 PM, Peter Xu wrote:
> > On Mon, Jun 04, 2018 at 05:55:09PM +0800, guangrong.xiao@gmail.com wrote:
> > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > 
> > > Instead of putting the main thread to sleep state to wait for
> > > free compression thread, we can directly post it out as normal
> > > page that reduces the latency and uses CPUs more efficiently
> > 
> > The feature looks good, though I'm not sure whether we should make a
> > capability flag for this feature since otherwise it'll be hard to
> > switch back to the old full-compression way no matter for what
> > reason.  Would that be a problem?
> > 
> 
> We assume this optimization should always be optimistic for all cases,
> particularly, we introduced the statistics of compression, then the user
> should adjust its parameters based on those statistics if anything works
> worse.

Ah, that'll be good.

> 
> Furthermore, we really need to improve this optimization if it hurts
> any case rather than leaving a option to the user. :)

Yeah, even if we make it a parameter/capability we can still turn it
on by default in new versions while keeping the old behavior in old
versions. :) The major difference is that we would still _have_ a
way to compress every page. I'm just thinking that if we don't have a
switch for this, then someone who wants to measure e.g. how a new
compression algorithm could help VM migration won't be able to do so
any more, since the numbers will be meaningless when there is no
control over which pages get compressed.

Though I don't know how much use it'll bring...  But if it won't be
too hard, it still seems good.  Not a strong opinion.

> 
> > > 
> > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > ---
> > >   migration/ram.c | 34 +++++++++++++++-------------------
> > >   1 file changed, 15 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 5bcbf7a9f9..0caf32ab0a 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -1423,25 +1423,18 @@ static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
> > >       thread_count = migrate_compress_threads();
> > >       qemu_mutex_lock(&comp_done_lock);
> > 
> > Can we drop this lock in this case?
> 
> The lock is used to protect comp_param[].done...

IMHO it's okay?

It's used in this way:

  if (done) {
    done = false;
  }

So it only switches done from true->false.

And the compression thread is the only one that does the other switch
(false->true).  IMHO this special case allows going lock-free: as long
as "done" is true here, the current thread is the only one that will
modify it, so there is no race at all.
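
For illustration only, a minimal sketch of what that lock-free scan
could look like, combining this observation with the atomic-access idea
quoted below, and assuming QEMU's atomic_read()/atomic_xchg() helpers
from qemu/atomic.h; this is not part of the patch.

    /* Hypothetical lock-free variant of the scan (not in this series).
     * Only this thread flips done true->false and only the worker flips
     * it false->true, so atomically exchanging it to false is enough to
     * claim an idle worker without taking comp_done_lock here.
     */
    for (idx = 0; idx < thread_count; idx++) {
        /* atomic_xchg() returns the previous value: true means the
         * worker was idle and we have just claimed it. */
        if (atomic_read(&comp_param[idx].done) &&
            atomic_xchg(&comp_param[idx].done, false)) {
            bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
            qemu_mutex_lock(&comp_param[idx].mutex);
            set_compress_params(&comp_param[idx], block, offset);
            qemu_cond_signal(&comp_param[idx].cond);
            qemu_mutex_unlock(&comp_param[idx].mutex);
            pages = 1;
            ram_counters.normal++;
            ram_counters.transferred += bytes_xmit;
            break;
        }
    }

Paths that still need to block for a free worker (e.g. the flush at the
end of an iteration) would presumably keep using
comp_done_lock/comp_done_cond.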

> 
> Well, we are able to possibly remove it if we redesign the implementation, e.g, use atomic
> access for comp_param.done, however, it still can not work efficiently i believe. Please see
> more in the later reply to your comments in the cover-letter.

I'll read that after it arrives, though I haven't received a reply.
Did you miss clicking the "send" button? ;)

Regards,
Dr. David Alan Gilbert June 13, 2018, 3:43 p.m. UTC | #4
* Peter Xu (peterx@redhat.com) wrote:
> On Tue, Jun 12, 2018 at 10:42:25AM +0800, Xiao Guangrong wrote:
> > 
> > 
> > On 06/11/2018 03:39 PM, Peter Xu wrote:
> > > On Mon, Jun 04, 2018 at 05:55:09PM +0800, guangrong.xiao@gmail.com wrote:
> > > > From: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > > 
> > > > Instead of putting the main thread to sleep state to wait for
> > > > free compression thread, we can directly post it out as normal
> > > > page that reduces the latency and uses CPUs more efficiently
> > > 
> > > The feature looks good, though I'm not sure whether we should make a
> > > capability flag for this feature since otherwise it'll be hard to
> > > switch back to the old full-compression way no matter for what
> > > reason.  Would that be a problem?
> > > 
> > 
> > We assume this optimization should always be optimistic for all cases,
> > particularly, we introduced the statistics of compression, then the user
> > should adjust its parameters based on those statistics if anything works
> > worse.
> 
> Ah, that'll be good.
> 
> > 
> > Furthermore, we really need to improve this optimization if it hurts
> > any case rather than leaving a option to the user. :)
> 
> Yeah, even if we make it a parameter/capability we can still turn that
> on by default in new versions but keep the old behavior in old
> versions. :) The major difference is that, then we can still _have_ a
> way to compress every page. I'm just thinking if we don't have a
> switch for that then if someone wants to measure e.g.  how a new
> compression algo could help VM migration, then he/she won't be
> possible to do that again since the numbers will be meaningless if
> that bit is out of control on which page will be compressed.
> 
> Though I don't know how much use it'll bring...  But if that won't be
> too hard, it still seems good.  Not a strong opinion.

I think that is needed; some users may have really awful networking and
genuinely need the compression.  I'd expect that people who turn on
compression expect the slowdown because they need it for their network,
so changing that behaviour underneath them is a bit odd.

Dave
> > 
> > > > 
> > > > Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
> > > > ---
> > > >   migration/ram.c | 34 +++++++++++++++-------------------
> > > >   1 file changed, 15 insertions(+), 19 deletions(-)
> > > > 
> > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > index 5bcbf7a9f9..0caf32ab0a 100644
> > > > --- a/migration/ram.c
> > > > +++ b/migration/ram.c
> > > > @@ -1423,25 +1423,18 @@ static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
> > > >       thread_count = migrate_compress_threads();
> > > >       qemu_mutex_lock(&comp_done_lock);
> > > 
> > > Can we drop this lock in this case?
> > 
> > The lock is used to protect comp_param[].done...
> 
> IMHO it's okay?
> 
> It's used in this way:
> 
>   if (done) {
>     done = false;
>   }
> 
> So it only switches done from true->false.
> 
> And the compression thread is the only one that did the other switch
> (false->true).  IMHO this special case will allow no-lock since as
> long as "done" is true here then current thread will be the only one
> to modify it, then no race at all.
> 
> > 
> > Well, we are able to possibly remove it if we redesign the implementation, e.g, use atomic
> > access for comp_param.done, however, it still can not work efficiently i believe. Please see
> > more in the later reply to your comments in the cover-letter.
> 
> Will read that after it arrives; though I didn't receive a reply.
> Have you missed clicking the "send" button? ;)
> 
> Regards,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Xiao Guangrong June 14, 2018, 3:19 a.m. UTC | #5
On 06/13/2018 11:43 PM, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
>> On Tue, Jun 12, 2018 at 10:42:25AM +0800, Xiao Guangrong wrote:
>>>
>>>
>>> On 06/11/2018 03:39 PM, Peter Xu wrote:
>>>> On Mon, Jun 04, 2018 at 05:55:09PM +0800, guangrong.xiao@gmail.com wrote:
>>>>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>>>>
>>>>> Instead of putting the main thread to sleep state to wait for
>>>>> free compression thread, we can directly post it out as normal
>>>>> page that reduces the latency and uses CPUs more efficiently
>>>>
>>>> The feature looks good, though I'm not sure whether we should make a
>>>> capability flag for this feature since otherwise it'll be hard to
>>>> switch back to the old full-compression way no matter for what
>>>> reason.  Would that be a problem?
>>>>
>>>
>>> We assume this optimization should always be optimistic for all cases,
>>> particularly, we introduced the statistics of compression, then the user
>>> should adjust its parameters based on those statistics if anything works
>>> worse.
>>
>> Ah, that'll be good.
>>
>>>
>>> Furthermore, we really need to improve this optimization if it hurts
>>> any case rather than leaving a option to the user. :)
>>
>> Yeah, even if we make it a parameter/capability we can still turn that
>> on by default in new versions but keep the old behavior in old
>> versions. :) The major difference is that, then we can still _have_ a
>> way to compress every page. I'm just thinking if we don't have a
>> switch for that then if someone wants to measure e.g.  how a new
>> compression algo could help VM migration, then he/she won't be
>> possible to do that again since the numbers will be meaningless if
>> that bit is out of control on which page will be compressed.
>>
>> Though I don't know how much use it'll bring...  But if that won't be
>> too hard, it still seems good.  Not a strong opinion.
> 
> I think that is needed; it might be that some users have really awful
> networking and need the compression; I'd expect that for people who turn
> on compression they really expect the slowdown because they need it for
> their network, so changing that is a bit odd.

People should also make sure the system has enough CPU resources to do
the compression, so ideally the 'busy-rate' should stay low enough, I
think.

However, it's not a big deal; I will introduce a parameter, maybe
compress-wait-free-thread.

Thank you all, Dave and Peter! :)
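
For what it's worth, a hedged sketch of how such a knob could gate the
wait.  The parameter name compress-wait-free-thread comes from the
message above; the migrate_compress_wait_free_thread() accessor is
hypothetical, just following the naming of the existing migrate_*()
helpers, and the declarations follow the existing function.

    /* Hypothetical: keep the old blocking behaviour behind a new
     * migration parameter (accessor name is illustrative only). */
    static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
                                               ram_addr_t offset)
    {
        int idx, thread_count, bytes_xmit = -1, pages = -1;
        bool wait = migrate_compress_wait_free_thread();  /* hypothetical */

        thread_count = migrate_compress_threads();
        qemu_mutex_lock(&comp_done_lock);
    retry:
        for (idx = 0; idx < thread_count; idx++) {
            if (comp_param[idx].done) {
                comp_param[idx].done = false;
                bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
                qemu_mutex_lock(&comp_param[idx].mutex);
                set_compress_params(&comp_param[idx], block, offset);
                qemu_cond_signal(&comp_param[idx].cond);
                qemu_mutex_unlock(&comp_param[idx].mutex);
                pages = 1;
                ram_counters.normal++;
                ram_counters.transferred += bytes_xmit;
                break;
            }
        }

        /*
         * If the user asked to always compress, sleep until a worker
         * frees up (the pre-patch behaviour); otherwise return -1 so the
         * caller falls back to sending the page uncompressed.
         */
        if (pages < 0 && wait) {
            qemu_cond_wait(&comp_done_cond, &comp_done_lock);
            goto retry;
        }
        qemu_mutex_unlock(&comp_done_lock);

        return pages;
    }

With the parameter off by default the non-blocking behaviour stays the
common case, while users on very slow networks could still force every
page through compression.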

Patch

diff --git a/migration/ram.c b/migration/ram.c
index 5bcbf7a9f9..0caf32ab0a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1423,25 +1423,18 @@  static int compress_page_with_multi_thread(RAMState *rs, RAMBlock *block,
 
     thread_count = migrate_compress_threads();
     qemu_mutex_lock(&comp_done_lock);
-    while (true) {
-        for (idx = 0; idx < thread_count; idx++) {
-            if (comp_param[idx].done) {
-                comp_param[idx].done = false;
-                bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
-                qemu_mutex_lock(&comp_param[idx].mutex);
-                set_compress_params(&comp_param[idx], block, offset);
-                qemu_cond_signal(&comp_param[idx].cond);
-                qemu_mutex_unlock(&comp_param[idx].mutex);
-                pages = 1;
-                ram_counters.normal++;
-                ram_counters.transferred += bytes_xmit;
-                break;
-            }
-        }
-        if (pages > 0) {
+    for (idx = 0; idx < thread_count; idx++) {
+        if (comp_param[idx].done) {
+            comp_param[idx].done = false;
+            bytes_xmit = qemu_put_qemu_file(rs->f, comp_param[idx].file);
+            qemu_mutex_lock(&comp_param[idx].mutex);
+            set_compress_params(&comp_param[idx], block, offset);
+            qemu_cond_signal(&comp_param[idx].cond);
+            qemu_mutex_unlock(&comp_param[idx].mutex);
+            pages = 1;
+            ram_counters.normal++;
+            ram_counters.transferred += bytes_xmit;
             break;
-        } else {
-            qemu_cond_wait(&comp_done_cond, &comp_done_lock);
         }
     }
     qemu_mutex_unlock(&comp_done_lock);
@@ -1755,7 +1748,10 @@  static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
      * CPU resource.
      */
     if (block == rs->last_sent_block && save_page_use_compression(rs)) {
-        return compress_page_with_multi_thread(rs, block, offset);
+        res = compress_page_with_multi_thread(rs, block, offset);
+        if (res > 0) {
+            return res;
+        }
     }
 
     return ram_save_page(rs, pss, last_stage);