Message ID | 1443103624-17230-1-git-send-email-den@openvz.org |
---|---|
State | New |
* Denis V. Lunev (den@openvz.org) wrote:
> From: Igor Redko <redkoi@virtuozzo.com>
>
> To get this estimation we must divide pending_size by bandwidth
> according to description of expected-downtime ("qmp-commands.hx:3246"):
>     "expected-downtime": only present while migration is active
>     total amount in ms for downtime that was calculated on
>     the last bitmap round (json-int)
>
> Previous version was just wrong because dirty_bytes_rate and bandwidth
> are measured in Bytes/ms, so dividing first by second we get some
> dimensionless quantity.
> As it said in description above this value is showed during active
> migration phase and recalculated only after transferring all memory
> and if this process took more than 1 sec. So maybe just nobody noticed
> that bug.

While I agree the existing code looks wrong, I don't see how this is
any more correct.

I think 'pending_size' is an estimate of the number of bytes left
to transfer, the intention being that most of those are transferred
prior to pausing the machine; if those are transferred before pausing
then they aren't part of the downtime.

It feels that:
 * If the guest wasn't dirtying pages, then you wouldn't have to
   pause the guest; if it was just dirtying them a little then you
   wouldn't have much to transfer after the pages you'd already
   sent; so if the guest dirty pages fast then the estimate should be
   larger; so 'dirty_bytes_rate' being on top of the fraction feels right.

 * If the bandwidth is higher then the estimate should be smaller; so
   'bandwidth' being on the bottom of the fraction feels right.

Dave

> Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
> Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Amit Shah <amit.shah@redhat.com>
> ---
>  migration/migration.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 662e77e..d55d545 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -994,7 +994,7 @@ static void *migration_thread(void *opaque)
>          /* if we haven't sent anything, we don't want to recalculate
>             10000 is a small enough number for our purposes */
>          if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> -            s->expected_downtime = s->dirty_bytes_rate / bandwidth;
> +            s->expected_downtime = pending_size / bandwidth;
>          }
>
>          qemu_file_reset_rate_limit(s->file);
> --
> 2.1.4
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
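[The dimensional point made in the commit message and in the reply above can be
sanity-checked with a tiny standalone C program. This is not QEMU code: the
variable names merely mirror the quantities being discussed and all numbers are
invented; it only illustrates that the old expression is a pure ratio while the
patched one comes out in milliseconds.]

    #include <stdio.h>

    int main(void)
    {
        /* illustration only -- made-up figures, not QEMU state */
        double bandwidth        = 100.0;    /* bytes per ms actually transferred   */
        double dirty_bytes_rate = 50.0;     /* bytes per ms being dirtied          */
        double pending_size     = 200000.0; /* bytes believed still left to send   */

        /* old code: (bytes/ms) / (bytes/ms) -> dimensionless ratio, not a time */
        printf("old estimate (dimensionless): %f\n", dirty_bytes_rate / bandwidth);

        /* patch: bytes / (bytes/ms) -> milliseconds */
        printf("patched estimate: %f ms\n", pending_size / bandwidth);
        return 0;
    }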
On 28.09.2015 22:22, Dr. David Alan Gilbert wrote:
> * Denis V. Lunev (den@openvz.org) wrote:
>> From: Igor Redko <redkoi@virtuozzo.com>
>>
>> To get this estimation we must divide pending_size by bandwidth
>> according to description of expected-downtime ("qmp-commands.hx:3246"):
>>     "expected-downtime": only present while migration is active
>>     total amount in ms for downtime that was calculated on
>>     the last bitmap round (json-int)
>>
>> Previous version was just wrong because dirty_bytes_rate and bandwidth
>> are measured in Bytes/ms, so dividing first by second we get some
>> dimensionless quantity.
>> As it said in description above this value is showed during active
>> migration phase and recalculated only after transferring all memory
>> and if this process took more than 1 sec. So maybe just nobody noticed
>> that bug.
>
> While I agree the existing code looks wrong, I don't see how this is
> any more correct.

This patch is aimed at fixing the units of expected_downtime. It is
reasonable that expected_downtime should be measured in milliseconds,
while the existing implementation lacks any units.

> I think 'pending_size' is an estimate of the number of bytes left
> to transfer, the intention being that most of those are transferred
> prior to pausing the machine, if those are transferred before pausing
> then they aren't part of the downtime.
>
Yes, 'pending_size' is an estimate of the number of bytes left to transfer,
indeed. But the condition:
> if (s->dirty_bytes_rate && transferred_bytes > 10000) {
slightly modifies the meaning of pending_size correspondingly.
dirty_bytes_rate is set in migration_sync(), which is called when
pending_size < max_downtime * bandwidth. This estimation is higher than
max_downtime by design.

> It feels that:
>  * If the guest wasn't dirtying pages, then you wouldn't have to
>    pause the guest; if it was just dirtying them a little then you
>    wouldn't have much to transfer after the pages you'd already
>    sent; so if the guest dirty pages fast then the estimate should be
>    larger; so 'dirty_bytes_rate' being on top of the fraction feels right.
>
>  * If the bandwidth is higher then the estimate should be smaller; so
>    'bandwidth' being on the bottom of the fraction feels right.
>
> Dave
>
The 'expected_downtime' in the existing code takes two types of values:
 * positive - dirty_bytes_rate is higher than bandwidth. In this
   case migration doesn't complete.
 * zero - bandwidth is higher than dirty_bytes_rate. In this case
   migration is possible, but we don't have the downtime value.

This patch has some imperfections. But if we look back into history,
it seems that this patch just restores the logic that was broken.
The existing code was introduced by commit
https://github.com/qemu/qemu/commit/90f8ae724a575861f093fbdbfd49a925bcfec327
which claims that it just restores the mistakenly deleted estimation (commit
https://github.com/qemu/qemu/commit/e4ed1541ac9413eac494a03532e34beaf8a7d1c5).
Meanwhile, the estimation changed during this restore operation. The
estimation before the removal (before
e4ed1541ac9413eac494a03532e34beaf8a7d1c5) was just like the one in my patch.

So maybe we should think about improving this estimation.
I suggest using something like:
    expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth

In my opinion this is more correct than the existing approach, since the
last step of the migration process (before the pause) is transferring
max_size bytes (max_size = bandwidth * migrate_max_downtime() / 1000000).
So the bytes that were dirtied at this step will be transferred during
downtime. The transferred bytes count is dirty_bytes_rate * max_size/bandwidth,
or migrate_max_downtime * dirty_bytes_rate, and division by the bandwidth
results in the formula:
    expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth

Igor

>> Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
>> Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: Amit Shah <amit.shah@redhat.com>
>> ---
>>  migration/migration.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 662e77e..d55d545 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -994,7 +994,7 @@ static void *migration_thread(void *opaque)
>>          /* if we haven't sent anything, we don't want to recalculate
>>             10000 is a small enough number for our purposes */
>>          if (s->dirty_bytes_rate && transferred_bytes > 10000) {
>> -            s->expected_downtime = s->dirty_bytes_rate / bandwidth;
>> +            s->expected_downtime = pending_size / bandwidth;
>>          }
>>
>>          qemu_file_reset_rate_limit(s->file);
>> --
>> 2.1.4
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
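[To make the proposed formula concrete, here is a rough back-of-the-envelope
sketch in standalone C, following the reasoning in the mail above. It is not
QEMU code: the units and all numbers are assumptions chosen only to show that
the two ways of writing the estimate agree.]

    #include <stdio.h>

    int main(void)
    {
        /* invented numbers; bandwidth and dirty_bytes_rate in bytes/ms,
         * max_downtime in ms, as in the discussion above                  */
        double max_downtime_ms  = 300.0;
        double bandwidth        = 1000.0;   /* bytes/ms over the wire       */
        double dirty_bytes_rate = 400.0;    /* bytes/ms the guest dirties   */

        /* the last pre-pause step transfers roughly max_size bytes ...     */
        double max_size = bandwidth * max_downtime_ms;

        /* ... during which the guest dirties this much; that remainder is
         * what must be sent while the guest is paused                      */
        double dirtied_during_last_step = dirty_bytes_rate * max_size / bandwidth;

        double expected_downtime = dirtied_during_last_step / bandwidth;
        /* equivalently: max_downtime_ms * dirty_bytes_rate / bandwidth     */

        printf("expected_downtime = %.1f ms\n", expected_downtime); /* 120.0 */
        return 0;
    }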
* Igor Redko (redkoi@virtuozzo.com) wrote:
> On 28.09.2015 22:22, Dr. David Alan Gilbert wrote:
> > * Denis V. Lunev (den@openvz.org) wrote:
> >> From: Igor Redko <redkoi@virtuozzo.com>
> >>
> >> To get this estimation we must divide pending_size by bandwidth
> >> according to description of expected-downtime ("qmp-commands.hx:3246"):
> >>     "expected-downtime": only present while migration is active
> >>     total amount in ms for downtime that was calculated on
> >>     the last bitmap round (json-int)
> >>
> >> Previous version was just wrong because dirty_bytes_rate and bandwidth
> >> are measured in Bytes/ms, so dividing first by second we get some
> >> dimensionless quantity.
> >> As it said in description above this value is showed during active
> >> migration phase and recalculated only after transferring all memory
> >> and if this process took more than 1 sec. So maybe just nobody noticed
> >> that bug.
> >
> > While I agree the existing code looks wrong, I don't see how this is
> > any more correct.
>
> This patch is aimed at fixing the units of expected_downtime. It is
> reasonable that expected_downtime should be measured in milliseconds,
> while the existing implementation lacks any units.

I agree your units are correct where the old one isn't; and I agree
it needs fixing.
However, I'm worried about whether the value in your fix is correct.

> > I think 'pending_size' is an estimate of the number of bytes left
> > to transfer, the intention being that most of those are transferred
> > prior to pausing the machine, if those are transferred before pausing
> > then they aren't part of the downtime.
> >
> Yes, 'pending_size' is an estimate of the number of bytes left to transfer,
> indeed. But the condition:
> > if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> slightly modifies the meaning of pending_size correspondingly.
> dirty_bytes_rate is set in migration_sync(), which is called when
> pending_size < max_downtime * bandwidth. This estimation is higher than
> max_downtime by design.

I don't think that check really modifies the meaning of pending_size;
it's just a sanity check that means we don't start trying to predict downtime
when we've not transmitted much yet.

> > It feels that:
> >  * If the guest wasn't dirtying pages, then you wouldn't have to
> >    pause the guest; if it was just dirtying them a little then you
> >    wouldn't have much to transfer after the pages you'd already
> >    sent; so if the guest dirty pages fast then the estimate should be
> >    larger; so 'dirty_bytes_rate' being on top of the fraction feels right.
> >
> >  * If the bandwidth is higher then the estimate should be smaller; so
> >    'bandwidth' being on the bottom of the fraction feels right.
> >
> > Dave
> >
> The 'expected_downtime' in the existing code takes two types of values:
>  * positive - dirty_bytes_rate is higher than bandwidth. In this
>    case migration doesn't complete.
>  * zero - bandwidth is higher than dirty_bytes_rate. In this case
>    migration is possible, but we don't have the downtime value.

OK, I don't think that should give zero; my argument being that, given that
each pass through RAM takes some time, you're always going to have some
proportion of RAM that's dirty.

> This patch has some imperfections. But if we look back into history,
> it seems that this patch just restores the logic that was broken.
> The existing code was introduced by commit
> https://github.com/qemu/qemu/commit/90f8ae724a575861f093fbdbfd49a925bcfec327
> which claims that it just restores the mistakenly deleted estimation
> (commit
> https://github.com/qemu/qemu/commit/e4ed1541ac9413eac494a03532e34beaf8a7d1c5).
> Meanwhile, the estimation changed during this restore operation. The
> estimation before the removal (before
> e4ed1541ac9413eac494a03532e34beaf8a7d1c5) was just like the one in my patch.

Yes; remember I believe that the current code is wrong - I'm just not
sure your suggested fix is right.

> So maybe we should think about improving this estimation.
> I suggest using something like:
>     expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth
>
> In my opinion this is more correct than the existing approach, since the
> last step of the migration process (before the pause) is transferring
> max_size bytes (max_size = bandwidth * migrate_max_downtime() / 1000000).
> So the bytes that were dirtied at this step will be transferred during
> downtime. The transferred bytes count is dirty_bytes_rate * max_size/bandwidth,
> or migrate_max_downtime * dirty_bytes_rate, and division by the bandwidth
> results in the formula:
>     expected_downtime = migrate_max_downtime * dirty_bytes_rate / bandwidth

Are you sure? I thought migrate_max_downtime governs the time *after* the
pause; and it doesn't make sense to me for the expectation of the system
to be related to the expectation of the user.

Dave

> Igor
>
> >> Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
> >> Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
> >> Signed-off-by: Denis V. Lunev <den@openvz.org>
> >> CC: Juan Quintela <quintela@redhat.com>
> >> CC: Amit Shah <amit.shah@redhat.com>
> >> ---
> >>  migration/migration.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index 662e77e..d55d545 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -994,7 +994,7 @@ static void *migration_thread(void *opaque)
> >>          /* if we haven't sent anything, we don't want to recalculate
> >>             10000 is a small enough number for our purposes */
> >>          if (s->dirty_bytes_rate && transferred_bytes > 10000) {
> >> -            s->expected_downtime = s->dirty_bytes_rate / bandwidth;
> >> +            s->expected_downtime = pending_size / bandwidth;
> >>          }
> >>
> >>          qemu_file_reset_rate_limit(s->file);
> >> --
> >> 2.1.4
> >>
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
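[The disagreement above is easier to follow with a toy model of the pre-copy
loop it refers to: keep iterating while the estimated remainder exceeds what
fits into the allowed downtime, then "pause" and send the rest. This is a
self-contained C simulation with invented numbers, not QEMU code, and it only
assumes the loop shape described in this thread.]

    #include <stdio.h>

    int main(void)
    {
        double ram            = 4096.0;  /* MB still to send                 */
        double bandwidth      = 100.0;   /* MB/s over the migration stream   */
        double dirty_rate     = 30.0;    /* MB/s the guest keeps dirtying    */
        double max_downtime_s = 0.3;     /* allowed pause                    */
        double max_size       = bandwidth * max_downtime_s; /* MB sendable in
                                                               the pause     */

        double pending = ram;
        int passes = 0;
        while (pending >= max_size && passes < 100) {
            double t = pending / bandwidth;  /* time for this pass            */
            pending = dirty_rate * t;        /* what got re-dirtied meanwhile */
            passes++;
        }

        /* most bytes go before the pause; the final remainder sets downtime */
        printf("passes=%d, final pending=%.1f MB, downtime=%.3f s\n",
               passes, pending, pending / bandwidth);
        return 0;
    }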
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: > * Igor Redko (redkoi@virtuozzo.com) wrote: >> On 28.09.2015 22:22, Dr. David Alan Gilbert wrote: >> >* Denis V. Lunev (den@openvz.org) wrote: >> >>From: Igor Redko <redkoi@virtuozzo.com> >> >> >> >>To get this estimation we must divide pending_size by bandwidth >> >>according to description of expected-downtime ("qmp-commands.hx:3246"): >> >> "expected-downtime": only present while migration is active >> >> total amount in ms for downtime that was calculated on >> >> the last bitmap round (json-int) >> >> >> >>Previous version was just wrong because dirty_bytes_rate and bandwidth >> >>are measured in Bytes/ms, so dividing first by second we get some >> >>dimensionless quantity. >> >>As it said in description above this value is showed during active >> >>migration phase and recalculated only after transferring all memory >> >>and if this process took more than 1 sec. So maybe just nobody noticed >> >>that bug. >> > >> >While I agree the existing code looks wrong, I don't see how this is >> >any more correct. >> >> This patch is aimed to fix units of expected_downtime. It is reasonable that >> expected_downtime should be measured in milliseconds. While the existing >> implementation lacks of any units. > > I agree your units are correct where the old one isn't; and I agree > it needs fixing. > However I'm worrying about whether the value in your fix is correct. The code (and calculation) is as clear as mud. I will try to explain what went behind the current code. First of all, notice that we call this code under: if (current_time >= initial_time + BUFFER_DELAY) { } i.e. we haven't recalculated things for a while, so we need to recalculate (this is not the place when we decided to move from iterating stage to completion stage. We have decided that above it. So, we have already decided that this will take so many time. check for s->dirty_bytes_rate: If this value is zero, we haven't do a whole round of migration, so we don't really know how much information it is dirty. check for transferred_bytes > 10000 arbitrary value, we were fixing the case where we were only able to transmit very few data for whatever reason (emphasis on whatever). If the number of bytes and/or time are very small, we could get really, really very big values here. So, we decided to recalculate expected_downtime value: - we have decided that it still no time to finish - we have transferred some chunk of information since last time. Now the current code: s->expected_downtime = s->dirty_bytes_rate / bandwidth; And the proposed code is: s->expected_downtime = pending_size / bandwidth; We are going to start why the proposed code is a bad idea: pending_size: means how much data we know that is dirty, but it can be more data that is dirty. We would know if we do a migration_bitmap_sync(). But we don't want to do this here because this is a very costly operation when our guest have hundred of gigabytes of pages. I.e. if we are at the end of a walk for all the ram, the pending size can be 4MB, but there are another 100MB dirtied that we would only know when we do a migration_bitmap_sync(). So, what is the best value that we have? The best one that we have is the value that was there the last time that we did a migration_bitmap_sync(), and what was that value? the number of pages that were dirtied on that precise moment, s->dirty_bytes_rate is that number. So, talking about the units, dirty_bytes_rate is the number of bytes that we sent in one second. 
In particular, it is also a number of bytes:

    number_bytes / (number_bytes/second) = second

So the units are right. It could be a good idea to put some more comments
around the code, but I think that this is the best value that we can get at
that point. pending_size is *too* optimistic in general. We recalculate it
here, rather than keeping a fixed value since the last
migration_bitmap_sync(), because the network bandwidth can change.

Does that make it a bit clearer?

>> > I think 'pending_size' is an estimate of the number of bytes left
>> > to transfer, the intention being that most of those are transferred
>> > prior to pausing the machine, if those are transferred before pausing
>> > then they aren't part of the downtime.
>> >
>> Yes, 'pending_size' is an estimate of the number of bytes left to transfer,
>> indeed.

It is the "minimum" number of bytes left to transfer. It can be greater.
I can be convinced to calculate the right value if we do an "info migrate",
and then we don't do any estimate.

>> But the condition:
>> > if (s->dirty_bytes_rate && transferred_bytes > 10000) {
>> slightly modifies the meaning of pending_size correspondingly.
>> dirty_bytes_rate is set in migration_sync(), which is called when
>> pending_size < max_downtime * bandwidth. This estimation is higher than
>> max_downtime by design.

No, it is bigger than max_downtime with the bandwidth that we had at that
time. With the current bandwidth, it can be true, or not.

    static uint64_t ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size)
    {
        uint64_t remaining_size;

        remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;

        if (remaining_size < max_size) {
            qemu_mutex_lock_iothread();
            rcu_read_lock();
            migration_bitmap_sync();
            rcu_read_unlock();
            qemu_mutex_unlock_iothread();
            remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
        }
        return remaining_size;
    }

We can be convinced to change this function to return whether the
pending_size value is an estimate or the real one.

> I don't think that check really modifies the meaning of pending_size;
> it's just a sanity check that means we don't start trying to predict
> downtime when we've not transmitted much yet.

The problem is that pending_size is only a real value when we have just done
a migration_sync(); otherwise, it is just "big enough". If we know that we
can transmit 1MB of data in max_downtime, and we know that the amount of
dirty data is bigger than that, why search for the exact value? (As said, it
is costly.)

>> > It feels that:
>> >  * If the guest wasn't dirtying pages, then you wouldn't have to
>> >    pause the guest; if it was just dirtying them a little then you
>> >    wouldn't have much to transfer after the pages you'd already
>> >    sent; so if the guest dirty pages fast then the estimate should be
>> >    larger; so 'dirty_bytes_rate' being on top of the fraction feels right.

If we are here, it is because pending_size > max_size, so we *know* that the
guest is dirtying too many pages. This value is only there to let management
apps know what we guess the estimated downtime is going to be; we don't use
it ourselves.

>> >
>> >  * If the bandwidth is higher then the estimate should be smaller; so
>> >    'bandwidth' being on the bottom of the fraction feels right.

I think I explained this before. I can try to improve the explanation if it
is not clear yet.

>> The 'expected_downtime' in the existing code takes two types of values:
>>  * positive - dirty_bytes_rate is higher than bandwidth. In this
>>    case migration doesn't complete.
>>  * zero - bandwidth is higher than dirty_bytes_rate. In this case
>>    migration is possible, but we don't have the downtime value.

No, the real values are:
- positive: this is the best guess we have with the dirty pages from the
  previous migration_bitmap_sync() and the current bandwidth.
- zero: we haven't yet completed a whole round over RAM; we can only guess,
  really.

I hope I have shed some light here.

Later, Juan.
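[Juan's 4MB-versus-100MB example can be put into a tiny standalone C sketch.
It is not QEMU code and the numbers are invented; it only contrasts how
optimistic the mid-walk pending_size figure is compared with the byte count
recorded at the last full bitmap sync, which is his argument for keeping
dirty_bytes_rate in the numerator.]

    #include <stdio.h>

    int main(void)
    {
        double bandwidth = 1000.0;          /* bytes/ms                       */
        double pending_size_now = 4e6;      /* dirty bytes *known* mid-walk   */
        double dirty_at_last_sync = 104e6;  /* dirty bytes seen at the last
                                               full migration_bitmap_sync()   */

        printf("optimistic (pending_size / bandwidth): %.1f ms\n",
               pending_size_now / bandwidth);       /* 4 ms -- likely too rosy */
        printf("last-sync based (dirty / bandwidth):  %.1f ms\n",
               dirty_at_last_sync / bandwidth);     /* 104 ms                  */
        return 0;
    }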
diff --git a/migration/migration.c b/migration/migration.c
index 662e77e..d55d545 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -994,7 +994,7 @@ static void *migration_thread(void *opaque)
         /* if we haven't sent anything, we don't want to recalculate
            10000 is a small enough number for our purposes */
         if (s->dirty_bytes_rate && transferred_bytes > 10000) {
-            s->expected_downtime = s->dirty_bytes_rate / bandwidth;
+            s->expected_downtime = pending_size / bandwidth;
         }
 
         qemu_file_reset_rate_limit(s->file);