[v2,6/8] migration: move handle of zero page to the thread

Message ID	20180719121520.30026-7-xiaoguangrong@tencent.com
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> From: guangrong.xiao@gmail.com To: pbonzini@redhat.com, mst@redhat.com, mtosatti@redhat.com Date: Thu, 19 Jul 2018 20:15:18 +0800 Message-Id: <20180719121520.30026-7-xiaoguangrong@tencent.com> In-Reply-To: <20180719121520.30026-1-xiaoguangrong@tencent.com> References: <20180719121520.30026-1-xiaoguangrong@tencent.com> Subject: [Qemu-devel] [PATCH v2 6/8] migration: move handle of zero page to the thread Precedence: list Cc: kvm@vger.kernel.org, Xiao Guangrong <xiaoguangrong@tencent.com>, qemu-devel@nongnu.org, peterx@redhat.com, dgilbert@redhat.com, wei.w.wang@intel.com, jiang.biao2@zte.com.cn Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>
Series	migration: compression optimization \| expand [v2,0/8] migration: compression optimization [v2,1/8] migration: do not wait for free thread [v2,2/8] migration: fix counting normal page for compression [v2,3/8] migration: show the statistics of compression [v2,4/8] migration: introduce save_zero_page_to_file [v2,5/8] migration: drop the return value of do_compress_ram_page [v2,6/8] migration: move handle of zero page to the thread [v2,7/8] migration: hold the lock only if it is really needed [v2,8/8] migration: do not flush_compressed_data at the end of each iteration

Xiao Guangrong July 19, 2018, 12:15 p.m. UTC

From: Xiao Guangrong <xiaoguangrong@tencent.com>

Detecting zero page is not a light work, moving it to the thread to
speed the main thread up

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
---
 migration/ram.c | 112 +++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 78 insertions(+), 34 deletions(-)

Peter Xu July 23, 2018, 5:03 a.m. UTC | #1

On Thu, Jul 19, 2018 at 08:15:18PM +0800, guangrong.xiao@gmail.com wrote:

[...]

> @@ -1950,12 +1971,16 @@ retry:
>              set_compress_params(&comp_param[idx], block, offset);
>              qemu_cond_signal(&comp_param[idx].cond);
>              qemu_mutex_unlock(&comp_param[idx].mutex);
> -            pages = 1;
> -            /* 8 means a header with RAM_SAVE_FLAG_CONTINUE. */
> -            compression_counters.reduced_size += TARGET_PAGE_SIZE -
> -                                                 bytes_xmit + 8;
> -            compression_counters.pages++;
>              ram_counters.transferred += bytes_xmit;
> +            pages = 1;

(moving of this line seems irrelevant; meanwhile more duplicated codes
so even better to have a helper now)

> +            if (comp_param[idx].zero_page) {
> +                ram_counters.duplicate++;
> +            } else {
> +                /* 8 means a header with RAM_SAVE_FLAG_CONTINUE. */
> +                compression_counters.reduced_size += TARGET_PAGE_SIZE -
> +                                                     bytes_xmit + 8;
> +                compression_counters.pages++;
> +            }
>              break;
>          }
>      }

[...]

> @@ -2249,15 +2308,8 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>          return res;
>      }
>  
> -    /*
> -     * When starting the process of a new block, the first page of
> -     * the block should be sent out before other pages in the same
> -     * block, and all the pages in last block should have been sent
> -     * out, keeping this order is important, because the 'cont' flag
> -     * is used to avoid resending the block name.
> -     */
> -    if (block != rs->last_sent_block && save_page_use_compression(rs)) {
> -            flush_compressed_data(rs);
> +    if (save_compress_page(rs, block, offset)) {
> +        return 1;

It's a bit tricky (though it seems to be a good idea too) to move the
zero detect into the compression thread, though I noticed that we also
do something else for zero pages:

    res = save_zero_page(rs, block, offset);
    if (res > 0) {
        /* Must let xbzrle know, otherwise a previous (now 0'd) cached
         * page would be stale
         */
        if (!save_page_use_compression(rs)) {
            XBZRLE_cache_lock();
            xbzrle_cache_zero_page(rs, block->offset + offset);
            XBZRLE_cache_unlock();
        }
        ram_release_pages(block->idstr, offset, res);
        return res;
    }

I'd guess that the xbzrle update of the zero page is not needed for
compression since after all xbzrle is not enabled when compression is
enabled, however do we need to do ram_release_pages() somehow?

>      }
>  
>      res = save_zero_page(rs, block, offset);
> @@ -2275,18 +2327,10 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>      }
>  
>      /*
> -     * Make sure the first page is sent out before other pages.
> -     *
> -     * we post it as normal page as compression will take much
> -     * CPU resource.
> -     */
> -    if (block == rs->last_sent_block && save_page_use_compression(rs)) {
> -        res = compress_page_with_multi_thread(rs, block, offset);
> -        if (res > 0) {
> -            return res;
> -        }
> -        compression_counters.busy++;
> -    } else if (migrate_use_multifd()) {
> +    * do not use multifd for compression as the first page in the new
> +    * block should be posted out before sending the compressed page
> +    */
> +    if (!save_page_use_compression(rs) && migrate_use_multifd()) {
>          return ram_save_multifd_page(rs, block, offset);
>      }
>  
> -- 
> 2.14.4
> 

Regards,

Xiao Guangrong July 23, 2018, 7:56 a.m. UTC | #2

On 07/23/2018 01:03 PM, Peter Xu wrote:
> On Thu, Jul 19, 2018 at 08:15:18PM +0800, guangrong.xiao@gmail.com wrote:
> 
> [...]
> 
>> @@ -1950,12 +1971,16 @@ retry:
>>               set_compress_params(&comp_param[idx], block, offset);
>>               qemu_cond_signal(&comp_param[idx].cond);
>>               qemu_mutex_unlock(&comp_param[idx].mutex);
>> -            pages = 1;
>> -            /* 8 means a header with RAM_SAVE_FLAG_CONTINUE. */
>> -            compression_counters.reduced_size += TARGET_PAGE_SIZE -
>> -                                                 bytes_xmit + 8;
>> -            compression_counters.pages++;
>>               ram_counters.transferred += bytes_xmit;
>> +            pages = 1;
> 
> (moving of this line seems irrelevant; meanwhile more duplicated codes
> so even better to have a helper now)
> 

Good to me. :)

>> +            if (comp_param[idx].zero_page) {
>> +                ram_counters.duplicate++;
>> +            } else {
>> +                /* 8 means a header with RAM_SAVE_FLAG_CONTINUE. */
>> +                compression_counters.reduced_size += TARGET_PAGE_SIZE -
>> +                                                     bytes_xmit + 8;
>> +                compression_counters.pages++;
>> +            }
>>               break;
>>           }
>>       }
> 
> [...]
> 
>> @@ -2249,15 +2308,8 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>>           return res;
>>       }
>>   
>> -    /*
>> -     * When starting the process of a new block, the first page of
>> -     * the block should be sent out before other pages in the same
>> -     * block, and all the pages in last block should have been sent
>> -     * out, keeping this order is important, because the 'cont' flag
>> -     * is used to avoid resending the block name.
>> -     */
>> -    if (block != rs->last_sent_block && save_page_use_compression(rs)) {
>> -            flush_compressed_data(rs);
>> +    if (save_compress_page(rs, block, offset)) {
>> +        return 1;
> 
> It's a bit tricky (though it seems to be a good idea too) to move the
> zero detect into the compression thread, though I noticed that we also
> do something else for zero pages:
> 
>      res = save_zero_page(rs, block, offset);
>      if (res > 0) {
>          /* Must let xbzrle know, otherwise a previous (now 0'd) cached
>           * page would be stale
>           */
>          if (!save_page_use_compression(rs)) {
>              XBZRLE_cache_lock();
>              xbzrle_cache_zero_page(rs, block->offset + offset);
>              XBZRLE_cache_unlock();
>          }
>          ram_release_pages(block->idstr, offset, res);
>          return res;
>      }
> 
> I'd guess that the xbzrle update of the zero page is not needed for
> compression since after all xbzrle is not enabled when compression is

Yup. if they are both enabled, compression works only for the first
iteration (i.e, ram_bulk_stage), at that point, nothing is cached
in xbzrle's cahe, in other words, xbzrle has posted nothing to the
destination.

> enabled, however do we need to do ram_release_pages() somehow?
> 

We have done it in the thread:

+static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
                                   ram_addr_t offset, uint8_t *source_buf)
  {


+    if (save_zero_page_to_file(rs, f, block, offset)) {
+        zero_page = true;
+        goto exit;
+    }
......

+exit:
      ram_release_pages(block->idstr, offset & TARGET_PAGE_MASK, 1);
+    return zero_page;
  }

However, it is not safe to do ram_release_pages in the thread as it's
not protected it multithreads. Fortunately, compression will be disabled
if it switches to post-copy, so i preferred to keep current behavior and
deferred to fix it after this patchset has been merged.

Peter Xu July 23, 2018, 8:28 a.m. UTC | #3

On Mon, Jul 23, 2018 at 03:56:33PM +0800, Xiao Guangrong wrote:

[...]

> > > @@ -2249,15 +2308,8 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
> > >           return res;
> > >       }
> > > -    /*
> > > -     * When starting the process of a new block, the first page of
> > > -     * the block should be sent out before other pages in the same
> > > -     * block, and all the pages in last block should have been sent
> > > -     * out, keeping this order is important, because the 'cont' flag
> > > -     * is used to avoid resending the block name.
> > > -     */
> > > -    if (block != rs->last_sent_block && save_page_use_compression(rs)) {
> > > -            flush_compressed_data(rs);
> > > +    if (save_compress_page(rs, block, offset)) {
> > > +        return 1;
> > 
> > It's a bit tricky (though it seems to be a good idea too) to move the
> > zero detect into the compression thread, though I noticed that we also
> > do something else for zero pages:
> > 
> >      res = save_zero_page(rs, block, offset);
> >      if (res > 0) {
> >          /* Must let xbzrle know, otherwise a previous (now 0'd) cached
> >           * page would be stale
> >           */
> >          if (!save_page_use_compression(rs)) {
> >              XBZRLE_cache_lock();
> >              xbzrle_cache_zero_page(rs, block->offset + offset);
> >              XBZRLE_cache_unlock();
> >          }
> >          ram_release_pages(block->idstr, offset, res);
> >          return res;
> >      }
> > 
> > I'd guess that the xbzrle update of the zero page is not needed for
> > compression since after all xbzrle is not enabled when compression is
> 
> Yup. if they are both enabled, compression works only for the first
> iteration (i.e, ram_bulk_stage), at that point, nothing is cached
> in xbzrle's cahe, in other words, xbzrle has posted nothing to the
> destination.
> 
> > enabled, however do we need to do ram_release_pages() somehow?
> > 
> 
> We have done it in the thread:
> 
> +static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>                                   ram_addr_t offset, uint8_t *source_buf)
>  {
> 
> 
> +    if (save_zero_page_to_file(rs, f, block, offset)) {
> +        zero_page = true;
> +        goto exit;
> +    }
> ......
> 
> +exit:
>      ram_release_pages(block->idstr, offset & TARGET_PAGE_MASK, 1);
> +    return zero_page;
>  }

Ah, then it seems fine.  Though I'd suggest you comment these into the
commit message in case people won't get it easily.

> 
> However, it is not safe to do ram_release_pages in the thread as it's
> not protected it multithreads. Fortunately, compression will be disabled
> if it switches to post-copy, so i preferred to keep current behavior and
> deferred to fix it after this patchset has been merged.

Do you mean ram_release_pages() is not thread-safe?  Why?  I didn't
notice it before but I feel like it is safe.

Regards,

Xiao Guangrong July 23, 2018, 8:44 a.m. UTC | #4

On 07/23/2018 04:28 PM, Peter Xu wrote:
> On Mon, Jul 23, 2018 at 03:56:33PM +0800, Xiao Guangrong wrote:
> 
> [...]
> 
>>>> @@ -2249,15 +2308,8 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss,
>>>>            return res;
>>>>        }
>>>> -    /*
>>>> -     * When starting the process of a new block, the first page of
>>>> -     * the block should be sent out before other pages in the same
>>>> -     * block, and all the pages in last block should have been sent
>>>> -     * out, keeping this order is important, because the 'cont' flag
>>>> -     * is used to avoid resending the block name.
>>>> -     */
>>>> -    if (block != rs->last_sent_block && save_page_use_compression(rs)) {
>>>> -            flush_compressed_data(rs);
>>>> +    if (save_compress_page(rs, block, offset)) {
>>>> +        return 1;
>>>
>>> It's a bit tricky (though it seems to be a good idea too) to move the
>>> zero detect into the compression thread, though I noticed that we also
>>> do something else for zero pages:
>>>
>>>       res = save_zero_page(rs, block, offset);
>>>       if (res > 0) {
>>>           /* Must let xbzrle know, otherwise a previous (now 0'd) cached
>>>            * page would be stale
>>>            */
>>>           if (!save_page_use_compression(rs)) {
>>>               XBZRLE_cache_lock();
>>>               xbzrle_cache_zero_page(rs, block->offset + offset);
>>>               XBZRLE_cache_unlock();
>>>           }
>>>           ram_release_pages(block->idstr, offset, res);
>>>           return res;
>>>       }
>>>
>>> I'd guess that the xbzrle update of the zero page is not needed for
>>> compression since after all xbzrle is not enabled when compression is
>>
>> Yup. if they are both enabled, compression works only for the first
>> iteration (i.e, ram_bulk_stage), at that point, nothing is cached
>> in xbzrle's cahe, in other words, xbzrle has posted nothing to the
>> destination.
>>
>>> enabled, however do we need to do ram_release_pages() somehow?
>>>
>>
>> We have done it in the thread:
>>
>> +static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>>                                    ram_addr_t offset, uint8_t *source_buf)
>>   {
>>
>>
>> +    if (save_zero_page_to_file(rs, f, block, offset)) {
>> +        zero_page = true;
>> +        goto exit;
>> +    }
>> ......
>>
>> +exit:
>>       ram_release_pages(block->idstr, offset & TARGET_PAGE_MASK, 1);
>> +    return zero_page;
>>   }
> 
> Ah, then it seems fine.  Though I'd suggest you comment these into the
> commit message in case people won't get it easily.
> 

Okay, will update the commit log addressed your comments.

>>
>> However, it is not safe to do ram_release_pages in the thread as it's
>> not protected it multithreads. Fortunately, compression will be disabled
>> if it switches to post-copy, so i preferred to keep current behavior and
>> deferred to fix it after this patchset has been merged.
> 
> Do you mean ram_release_pages() is not thread-safe?  Why?  I didn't
> notice it before but I feel like it is safe.

bitmap_clear() called in the function is not safe.

Peter Xu July 23, 2018, 9:40 a.m. UTC | #5

On Mon, Jul 23, 2018 at 04:44:49PM +0800, Xiao Guangrong wrote:

[...]

> > > 
> > > However, it is not safe to do ram_release_pages in the thread as it's
> > > not protected it multithreads. Fortunately, compression will be disabled
> > > if it switches to post-copy, so i preferred to keep current behavior and
> > > deferred to fix it after this patchset has been merged.
> > 
> > Do you mean ram_release_pages() is not thread-safe?  Why?  I didn't
> > notice it before but I feel like it is safe.
> 
> bitmap_clear() called in the function is not safe.

Yeah, and the funny thing is that I don't think ram_release_pages()
should even touch the receivedmap...  It's possible that the
release-ram feature for postcopy is broken.

Never mind on that.  I'll post a patch to fix it, then I think the
ram_release_pages() will be thread safe.

Then this patch shouldn't be affected and it should be fine after that
fix.

Regards,

Xiao Guangrong July 24, 2018, 7:39 a.m. UTC | #6

On 07/23/2018 05:40 PM, Peter Xu wrote:
> On Mon, Jul 23, 2018 at 04:44:49PM +0800, Xiao Guangrong wrote:
> 
> [...]
> 
>>>>
>>>> However, it is not safe to do ram_release_pages in the thread as it's
>>>> not protected it multithreads. Fortunately, compression will be disabled
>>>> if it switches to post-copy, so i preferred to keep current behavior and
>>>> deferred to fix it after this patchset has been merged.
>>>
>>> Do you mean ram_release_pages() is not thread-safe?  Why?  I didn't
>>> notice it before but I feel like it is safe.
>>
>> bitmap_clear() called in the function is not safe.
> 
> Yeah, and the funny thing is that I don't think ram_release_pages()
> should even touch the receivedmap...  It's possible that the
> release-ram feature for postcopy is broken.
> 
> Never mind on that.  I'll post a patch to fix it, then I think the
> ram_release_pages() will be thread safe.
> 
> Then this patch shouldn't be affected and it should be fine after that
> fix

That would be great, thanks for your work.

[v2,6/8] migration: move handle of zero page to the thread

Commit Message

Comments

Patch