Message ID: 20230724170755.1114519-1-peterx@redhat.com
State: New
Series: migration: Allow user to specify migration available bandwidth
On Mon, Jul 24, 2023 at 01:07:55PM -0400, Peter Xu wrote:
> Migration bandwidth is a very important value to live migration. It's
> because it's one of the major factors that we'll make decision on when to
> switchover to destination in a precopy process.

To elaborate on this for those reading along...

QEMU takes maxmimum downtime limit and multiplies by its estimate
of bandwidth. This gives a figure for the amount of data QEMU thinks
it can transfer within the downtime period.

QEMU compares this figure to the amount of data that is still pending
at the end of an iteration.

> This value is currently estimated by QEMU during the whole live migration
> process by monitoring how fast we were sending the data. This can be the
> most accurate bandwidth if in the ideal world, where we're always feeding
> unlimited data to the migration channel, and then it'll be limited to the
> bandwidth that is available.

The QEMU estimate for available bandwidth will definitely be wrong,
potentially by orders of magnitude, if QEMU has a max bandwidth limit
set, as in that case it is never trying to push the peak rates available
from the NICs/network fabric.

> The issue is QEMU itself may not be able to avoid those uncertainties on
> measuing the real "available migration bandwidth". At least not something
> I can think of so far.

IIUC, you can query the NIC properties to find the hardware transfer
rate of the NICs. That doesn't imply apps can actually reach that
rate in practice - it has a decent chance of being a over-estimate
of bandwidth, possibly very very much over.

Is such an over estimate better or worse than QEMU's current
under-estimate ? It depends on the POV.

From the POV of QEMU, over-estimating means means it'll be not
be throttling as much as it should. That's not a downside of
migration - it makes it more likely for migration to complete :-)

From the POV of non-QEMU apps though, if QEMU over-estimates,
it'll mean other apps get starved of network bandwidth.

Overall I agree, there's no obvious way QEMU can ever come up
with a reliable estimate for bandwidth available.

> One way to fix this is when the user is fully aware of the available
> bandwidth, then we can allow the user to help providing an accurate value.
>
> For example, if the user has a dedicated channel of 10Gbps for migration
> for this specific VM, the user can specify this bandwidth so QEMU can
> always do the calculation based on this fact, trusting the user as long as
> specified.

I can see that in theory, but when considering a non-trivial
deployments of QEMU, I wonder if the user can really have any
such certainty of what is truely avaialble. It would need
global awareness of the whole network of hosts & workloads.

> When the user wants to have migration only use 5Gbps out of that 10Gbps,
> one can set max-bandwidth to 5Gbps, along with available-bandwidth to 5Gbps
> so it'll never use over 5Gbps too (so the user can have the rest 5Gbps for
> other things). So it can be useful even if the network is not dedicated,
> but as long as the user can know a solid value.
>
> A new parameter "available-bandwidth" is introduced just for this. So when
> the user specified this parameter, instead of trusting the estimated value
> from QEMU itself (based on the QEMUFile send speed), let's trust the user
> more.

I feel like rather than "available-bandwidth", we should call
it "max-convergance-bandwidth".

To me that name would better reflect the fact that this isn't
really required to be a measure of how much NIC bandwidth is
available. It is merely an expression of a different bandwidth
limit to apply during switch over.

IOW

 * max-bandwidth: limit during pre-copy main transfer
 * max-convergance-bandwidth: limit during pre-copy switch-over
 * max-postcopy-banwidth: limit during post-copy phase

> This can resolve issues like "unconvergence migration" which is caused by
> hilarious low "migration bandwidth" detected for whatever reason.
>
> Reported-by: Zhiyi Guo <zhguo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  qapi/migration.json            | 20 +++++++++++++++++++-
>  migration/migration.h          |  2 +-
>  migration/options.h            |  1 +
>  migration/migration-hmp-cmds.c | 14 ++++++++++++++
>  migration/migration.c          | 19 +++++++++++++++----
>  migration/options.c            | 28 ++++++++++++++++++++++++++++
>  migration/trace-events         |  2 +-
>  7 files changed, 79 insertions(+), 7 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 47dfef0278..fdc269e0a1 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -730,6 +730,16 @@
>  # @max-bandwidth: to set maximum speed for migration. maximum speed
>  #     in bytes per second. (Since 2.8)
>  #
> +# @available-bandwidth: to set available bandwidth for migration. By
> +#     default, this value is zero, means the user is not aware of the
> +#     available bandwidth that can be used by QEMU migration, so QEMU will
> +#     estimate the bandwidth automatically. This can be set when the
> +#     estimated value is not accurate, while the user is able to guarantee
> +#     such bandwidth is available for migration purpose during the
> +#     migration procedure. When specified correctly, this can make the
> +#     switchover decision much more accurate, which will also be based on
> +#     the max downtime specified. (Since 8.2)
> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -803,7 +813,7 @@
>             'cpu-throttle-initial', 'cpu-throttle-increment',
>             'cpu-throttle-tailslow',
>             'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
> -           'downtime-limit',
> +           'available-bandwidth', 'downtime-limit',
>             { 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
>             'block-incremental',
>             'multifd-channels',
> @@ -886,6 +896,9 @@
>  # @max-bandwidth: to set maximum speed for migration. maximum speed
>  #     in bytes per second. (Since 2.8)
>  #
> +# @available-bandwidth: to set available migration bandwidth. Please refer
> +#     to comments in MigrationParameter for more information. (Since 8.2)
> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -971,6 +984,7 @@
>             '*tls-hostname': 'StrOrNull',
>             '*tls-authz': 'StrOrNull',
>             '*max-bandwidth': 'size',
> +           '*available-bandwidth': 'size',
>             '*downtime-limit': 'uint64',
>             '*x-checkpoint-delay': { 'type': 'uint32',
>                                      'features': [ 'unstable' ] },
> @@ -1078,6 +1092,9 @@
>  # @max-bandwidth: to set maximum speed for migration. maximum speed
>  #     in bytes per second. (Since 2.8)
>  #
> +# @available-bandwidth: to set available migration bandwidth. Please refer
> +#     to comments in MigrationParameter for more information. (Since 8.2)
> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -1160,6 +1177,7 @@
>             '*tls-hostname': 'str',
>             '*tls-authz': 'str',
>             '*max-bandwidth': 'size',
> +           '*available-bandwidth': 'size',
>             '*downtime-limit': 'uint64',
>             '*x-checkpoint-delay': { 'type': 'uint32',
>                                      'features': [ 'unstable' ] },
> diff --git a/migration/migration.h b/migration/migration.h
> index b7c8b67542..fadbf64d9d 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -283,7 +283,7 @@ struct MigrationState {
>      /*
>       * The final stage happens when the remaining data is smaller than
>       * this threshold; it's calculated from the requested downtime and
> -     * measured bandwidth
> +     * measured bandwidth, or available-bandwidth if user specified.
>       */
>      int64_t threshold_size;
>
> diff --git a/migration/options.h b/migration/options.h
> index 9aaf363322..ed2d9c92e7 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -79,6 +79,7 @@ int migrate_decompress_threads(void);
>  uint64_t migrate_downtime_limit(void);
>  uint8_t migrate_max_cpu_throttle(void);
>  uint64_t migrate_max_bandwidth(void);
> +uint64_t migrate_available_bandwidth(void);
>  uint64_t migrate_max_postcopy_bandwidth(void);
>  int migrate_multifd_channels(void);
>  MultiFDCompression migrate_multifd_compression(void);
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 9885d7c9f7..53fbe1b1af 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -311,6 +311,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>          monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_MAX_BANDWIDTH),
>              params->max_bandwidth);
> +        assert(params->has_available_bandwidth);
> +        monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n",
> +            MigrationParameter_str(MIGRATION_PARAMETER_AVAILABLE_BANDWIDTH),
> +            params->available_bandwidth);
>          assert(params->has_downtime_limit);
>          monitor_printf(mon, "%s: %" PRIu64 " ms\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_DOWNTIME_LIMIT),
> @@ -556,6 +560,16 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>          }
>          p->max_bandwidth = valuebw;
>          break;
> +    case MIGRATION_PARAMETER_AVAILABLE_BANDWIDTH:
> +        p->has_available_bandwidth = true;
> +        ret = qemu_strtosz_MiB(valuestr, NULL, &valuebw);
> +        if (ret < 0 || valuebw > INT64_MAX
> +            || (size_t)valuebw != valuebw) {
> +            error_setg(&err, "Invalid size %s", valuestr);
> +            break;
> +        }
> +        p->available_bandwidth = valuebw;
> +        break;
>      case MIGRATION_PARAMETER_DOWNTIME_LIMIT:
>          p->has_downtime_limit = true;
>          visit_type_size(v, param, &p->downtime_limit, &err);
> diff --git a/migration/migration.c b/migration/migration.c
> index 91bba630a8..391ddfd8ce 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2671,7 +2671,7 @@ static void migration_update_counters(MigrationState *s,
>  {
>      uint64_t transferred, transferred_pages, time_spent;
>      uint64_t current_bytes; /* bytes transferred since the beginning */
> -    double bandwidth;
> +    double bandwidth, avail_bw;
>
>      if (current_time < s->iteration_start_time + BUFFER_DELAY) {
>          return;
>      }
> @@ -2681,7 +2681,17 @@
>      transferred = current_bytes - s->iteration_initial_bytes;
>      time_spent = current_time - s->iteration_start_time;
>      bandwidth = (double)transferred / time_spent;
> -    s->threshold_size = bandwidth * migrate_downtime_limit();
> +    if (migrate_available_bandwidth()) {
> +        /*
> +         * If the user specified an available bandwidth, let's trust the
> +         * user so that can be more accurate than what we estimated.
> +         */
> +        avail_bw = migrate_available_bandwidth();
> +    } else {
> +        /* If the user doesn't specify bandwidth, we use the estimated */
> +        avail_bw = bandwidth;
> +    }
> +    s->threshold_size = avail_bw * migrate_downtime_limit();
>
>      s->mbps = (((double) transferred * 8.0) /
>                 ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;
> @@ -2698,7 +2708,7 @@ static void migration_update_counters(MigrationState *s,
>      if (stat64_get(&mig_stats.dirty_pages_rate) &&
>          transferred > 10000) {
>          s->expected_downtime =
> -            stat64_get(&mig_stats.dirty_bytes_last_sync) / bandwidth;
> +            stat64_get(&mig_stats.dirty_bytes_last_sync) / avail_bw;
>      }
>
>      migration_rate_reset(s->to_dst_file);
> @@ -2706,7 +2716,8 @@ static void migration_update_counters(MigrationState *s,
>      update_iteration_initial_status(s);
>
>      trace_migrate_transferred(transferred, time_spent,
> -                              bandwidth, s->threshold_size);
> +                              bandwidth, migrate_available_bandwidth(),
> +                              s->threshold_size);
>  }
>
>  static bool migration_can_switchover(MigrationState *s)
> diff --git a/migration/options.c b/migration/options.c
> index 5a9505adf7..160faca71a 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -121,6 +121,8 @@ Property migration_properties[] = {
>                        parameters.cpu_throttle_tailslow, false),
>      DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState,
>                        parameters.max_bandwidth, MAX_THROTTLE),
> +    DEFINE_PROP_SIZE("available-bandwidth", MigrationState,
> +                      parameters.available_bandwidth, 0),
>      DEFINE_PROP_UINT64("x-downtime-limit", MigrationState,
>                         parameters.downtime_limit,
>                         DEFAULT_MIGRATE_SET_DOWNTIME),
> @@ -735,6 +737,13 @@ uint64_t migrate_max_bandwidth(void)
>      return s->parameters.max_bandwidth;
>  }
>
> +uint64_t migrate_available_bandwidth(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    return s->parameters.available_bandwidth;
> +}
> +
>  uint64_t migrate_max_postcopy_bandwidth(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -872,6 +881,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>                                  s->parameters.tls_authz : "");
>      params->has_max_bandwidth = true;
>      params->max_bandwidth = s->parameters.max_bandwidth;
> +    params->has_available_bandwidth = true;
> +    params->available_bandwidth = s->parameters.available_bandwidth;
>      params->has_downtime_limit = true;
>      params->downtime_limit = s->parameters.downtime_limit;
>      params->has_x_checkpoint_delay = true;
> @@ -1004,6 +1015,15 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
>          return false;
>      }
>
> +    if (params->has_available_bandwidth &&
> +        (params->available_bandwidth > SIZE_MAX)) {
> +        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> +                   "available_bandwidth",
> +                   "an integer in the range of 0 to "stringify(SIZE_MAX)
> +                   " bytes/second");
> +        return false;
> +    }
> +
>      if (params->has_downtime_limit &&
>          (params->downtime_limit > MAX_MIGRATE_DOWNTIME)) {
>          error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
> @@ -1156,6 +1176,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>          dest->max_bandwidth = params->max_bandwidth;
>      }
>
> +    if (params->has_available_bandwidth) {
> +        dest->available_bandwidth = params->available_bandwidth;
> +    }
> +
>      if (params->has_downtime_limit) {
>          dest->downtime_limit = params->downtime_limit;
>      }
> @@ -1264,6 +1288,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>          }
>      }
>
> +    if (params->has_available_bandwidth) {
> +        s->parameters.available_bandwidth = params->available_bandwidth;
> +    }
> +
>      if (params->has_downtime_limit) {
>          s->parameters.downtime_limit = params->downtime_limit;
>      }
> diff --git a/migration/trace-events b/migration/trace-events
> index 5259c1044b..fffed1f995 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -184,7 +184,7 @@ source_return_path_thread_shut(uint32_t val) "0x%x"
>  source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
>  source_return_path_thread_switchover_acked(void) ""
>  migration_thread_low_pending(uint64_t pending) "%" PRIu64
> -migrate_transferred(uint64_t tranferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64
> +migrate_transferred(uint64_t tranferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " avail_bw %" PRIu64 " max_size %" PRId64
>  process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  postcopy_preempt_enabled(bool value) "%d"
> --
> 2.41.0
>

With regards,
Daniel
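To make the switchover arithmetic elaborated above concrete, here is a
minimal sketch of the check, with hypothetical names rather than the
actual QEMU symbols (the real logic lives in migration_update_counters(),
shown in the patch):

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Illustrative sketch, not QEMU code: the bandwidth estimate is in
     * bytes per millisecond (iteration time is tracked in ms), so
     * multiplying by the maximum tolerated downtime (ms) gives the amount
     * of data QEMU believes it can transfer within the downtime window.
     * Switchover is allowed once the data still pending at the end of an
     * iteration fits under that threshold.
     */
    static bool can_switchover(uint64_t pending_bytes,
                               double bandwidth_bytes_per_ms,
                               uint64_t downtime_limit_ms)
    {
        uint64_t threshold_size = bandwidth_bytes_per_ms * downtime_limit_ms;

        return pending_bytes <= threshold_size;
    }

The stakes are visible from the numbers: with a 300 ms downtime limit, an
estimate of 30 MB/s permits only ~9 MB of pending data, while a trusted
10Gbps (1.25 GB/s) figure permits ~375 MB, which can be the difference
between a busy guest never converging and converging quickly.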
On Mon, 24 Jul 2023 at 19:05, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> I feel like rather than "available-bandwidth", we should call
> it "max-convergance-bandwidth".

"convergence" (I mention only since it's a proposed user-visible
bit of API :-)

-- PMM
On Mon, Jul 24, 2023 at 07:04:29PM +0100, Daniel P. Berrangé wrote:
> On Mon, Jul 24, 2023 at 01:07:55PM -0400, Peter Xu wrote:
> > Migration bandwidth is a very important value to live migration. It's
> > because it's one of the major factors that we'll make decision on when to
> > switchover to destination in a precopy process.
>
> To elaborate on this for those reading along...
>
> QEMU takes maxmimum downtime limit and multiplies by its estimate
> of bandwidth. This gives a figure for the amount of data QEMU thinks
> it can transfer within the downtime period.
>
> QEMU compares this figure to the amount of data that is still pending
> at the end of an iteration.
>
> > This value is currently estimated by QEMU during the whole live migration
> > process by monitoring how fast we were sending the data. This can be the
> > most accurate bandwidth if in the ideal world, where we're always feeding
> > unlimited data to the migration channel, and then it'll be limited to the
> > bandwidth that is available.
>
> The QEMU estimate for available bandwidth will definitely be wrong,
> potentially by orders of magnitude, if QEMU has a max bandwidth limit
> set, as in that case it is never trying to push the peak rates available
> from the NICs/network fabric.
>
> > The issue is QEMU itself may not be able to avoid those uncertainties on
> > measuing the real "available migration bandwidth". At least not something
> > I can think of so far.
>
> IIUC, you can query the NIC properties to find the hardware transfer
> rate of the NICs. That doesn't imply apps can actually reach that
> rate in practice - it has a decent chance of being a over-estimate
> of bandwidth, possibly very very much over.
>
> Is such an over estimate better or worse than QEMU's current
> under-estimate ? It depends on the POV.
>
> From the POV of QEMU, over-estimating means means it'll be not
> be throttling as much as it should. That's not a downside of
> migration - it makes it more likely for migration to complete :-)

Heh. :)

>
> From the POV of non-QEMU apps though, if QEMU over-estimates,
> it'll mean other apps get starved of network bandwidth.
>
> Overall I agree, there's no obvious way QEMU can ever come up
> with a reliable estimate for bandwidth available.
>
> > One way to fix this is when the user is fully aware of the available
> > bandwidth, then we can allow the user to help providing an accurate value.
> >
> > For example, if the user has a dedicated channel of 10Gbps for migration
> > for this specific VM, the user can specify this bandwidth so QEMU can
> > always do the calculation based on this fact, trusting the user as long as
> > specified.
>
> I can see that in theory, but when considering a non-trivial
> deployments of QEMU, I wonder if the user can really have any
> such certainty of what is truely avaialble. It would need
> global awareness of the whole network of hosts & workloads.

Indeed it may or may not be easy always.

The good thing about this parameter is we always use the old estimation if
the user can't specify anything valid, so this is always optional not
required.

It solves the cases where the user can still specify accurately on the bw -
our QE team has already verified that it worked for us on GPU tests, where
it used to not be able to migrate at all with any sane downtime specified.
I should have attached a Tested-By from Zhiyi but since this is not exactly
the patch he was using I didn't.

>
> > When the user wants to have migration only use 5Gbps out of that 10Gbps,
> > one can set max-bandwidth to 5Gbps, along with available-bandwidth to 5Gbps
> > so it'll never use over 5Gbps too (so the user can have the rest 5Gbps for
> > other things). So it can be useful even if the network is not dedicated,
> > but as long as the user can know a solid value.
> >
> > A new parameter "available-bandwidth" is introduced just for this. So when
> > the user specified this parameter, instead of trusting the estimated value
> > from QEMU itself (based on the QEMUFile send speed), let's trust the user
> > more.
>
> I feel like rather than "available-bandwidth", we should call
> it "max-convergance-bandwidth".
>
> To me that name would better reflect the fact that this isn't
> really required to be a measure of how much NIC bandwidth is
> available. It is merely an expression of a different bandwidth
> limit to apply during switch over.
>
> IOW
>
>  * max-bandwidth: limit during pre-copy main transfer
>  * max-convergance-bandwidth: limit during pre-copy switch-over
>  * max-postcopy-banwidth: limit during post-copy phase

I worry the new name suggested is not straightforward enough at the 1st
glance, even to me as a developer.

"available-bandwidth" doesn't even bind that value to "convergence" at all,
even though it was for solving this specific problem here. I wanted to make
this parameter solely for the admin to answer the question "how much
bandwidth is available to QEMU migration in general?" That's pretty much
straightforward IMHO. With that, it's pretty sane to consider using all we
have during switchover (aka, unlimited bandwidth, as fast as possible).

Maybe at some point we can even leverage this information for other purpose
rather than making the migration converge.

Thanks,
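Concretely, given the HMP plumbing in the patch, an illustrative invocation
(hypothetical value; migrate_set_parameter itself is the existing HMP
command) would be:

    (qemu) migrate_set_parameter available-bandwidth 1G

which pins the switchover calculation to 1 GiB/s (the value is bytes per
second; per the qemu_strtosz_MiB() call above, a bare number would be read
as MiB), while leaving the precopy max-bandwidth throttle untouched.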
On Mon, Jul 24, 2023 at 03:47:50PM -0400, Peter Xu wrote:
> On Mon, Jul 24, 2023 at 07:04:29PM +0100, Daniel P. Berrangé wrote:
> > On Mon, Jul 24, 2023 at 01:07:55PM -0400, Peter Xu wrote:
> > > Migration bandwidth is a very important value to live migration. It's
> > > because it's one of the major factors that we'll make decision on when to
> > > switchover to destination in a precopy process.
> >
> > To elaborate on this for those reading along...
> >
> > QEMU takes maxmimum downtime limit and multiplies by its estimate
> > of bandwidth. This gives a figure for the amount of data QEMU thinks
> > it can transfer within the downtime period.
> >
> > QEMU compares this figure to the amount of data that is still pending
> > at the end of an iteration.
> >
> > > This value is currently estimated by QEMU during the whole live migration
> > > process by monitoring how fast we were sending the data. This can be the
> > > most accurate bandwidth if in the ideal world, where we're always feeding
> > > unlimited data to the migration channel, and then it'll be limited to the
> > > bandwidth that is available.
> >
> > The QEMU estimate for available bandwidth will definitely be wrong,
> > potentially by orders of magnitude, if QEMU has a max bandwidth limit
> > set, as in that case it is never trying to push the peak rates available
> > from the NICs/network fabric.
> >
> > > The issue is QEMU itself may not be able to avoid those uncertainties on
> > > measuing the real "available migration bandwidth". At least not something
> > > I can think of so far.
> >
> > IIUC, you can query the NIC properties to find the hardware transfer
> > rate of the NICs. That doesn't imply apps can actually reach that
> > rate in practice - it has a decent chance of being a over-estimate
> > of bandwidth, possibly very very much over.
> >
> > Is such an over estimate better or worse than QEMU's current
> > under-estimate ? It depends on the POV.
> >
> > From the POV of QEMU, over-estimating means means it'll be not
> > be throttling as much as it should. That's not a downside of
> > migration - it makes it more likely for migration to complete :-)
>
> Heh. :)
>
> >
> > From the POV of non-QEMU apps though, if QEMU over-estimates,
> > it'll mean other apps get starved of network bandwidth.
> >
> > Overall I agree, there's no obvious way QEMU can ever come up
> > with a reliable estimate for bandwidth available.
> >
> > > One way to fix this is when the user is fully aware of the available
> > > bandwidth, then we can allow the user to help providing an accurate value.
> > >
> > > For example, if the user has a dedicated channel of 10Gbps for migration
> > > for this specific VM, the user can specify this bandwidth so QEMU can
> > > always do the calculation based on this fact, trusting the user as long as
> > > specified.
> >
> > I can see that in theory, but when considering a non-trivial
> > deployments of QEMU, I wonder if the user can really have any
> > such certainty of what is truely avaialble. It would need
> > global awareness of the whole network of hosts & workloads.
>
> Indeed it may or may not be easy always.
>
> The good thing about this parameter is we always use the old estimation if
> the user can't specify anything valid, so this is always optional not
> required.
>
> It solves the cases where the user can still specify accurately on the bw -
> our QE team has already verified that it worked for us on GPU tests, where
> it used to not be able to migrate at all with any sane downtime specified.
> I should have attached a Tested-By from Zhiyi but since this is not exactly
> the patch he was using I didn't.
>
> >
> > > When the user wants to have migration only use 5Gbps out of that 10Gbps,
> > > one can set max-bandwidth to 5Gbps, along with available-bandwidth to 5Gbps
> > > so it'll never use over 5Gbps too (so the user can have the rest 5Gbps for
> > > other things). So it can be useful even if the network is not dedicated,
> > > but as long as the user can know a solid value.
> > >
> > > A new parameter "available-bandwidth" is introduced just for this. So when
> > > the user specified this parameter, instead of trusting the estimated value
> > > from QEMU itself (based on the QEMUFile send speed), let's trust the user
> > > more.
> >
> > I feel like rather than "available-bandwidth", we should call
> > it "max-convergance-bandwidth".
> >
> > To me that name would better reflect the fact that this isn't
> > really required to be a measure of how much NIC bandwidth is
> > available. It is merely an expression of a different bandwidth
> > limit to apply during switch over.
> >
> > IOW
> >
> >  * max-bandwidth: limit during pre-copy main transfer
> >  * max-convergance-bandwidth: limit during pre-copy switch-over
> >  * max-postcopy-banwidth: limit during post-copy phase
>
> I worry the new name suggested is not straightforward enough at the 1st
> glance, even to me as a developer.
>
> "available-bandwidth" doesn't even bind that value to "convergence" at all,
> even though it was for solving this specific problem here. I wanted to make
> this parameter solely for the admin to answer the question "how much
> bandwidth is available to QEMU migration in general?" That's pretty much
> straightforward IMHO. With that, it's pretty sane to consider using all we
> have during switchover (aka, unlimited bandwidth, as fast as possible).
>
> Maybe at some point we can even leverage this information for other purpose
> rather than making the migration converge.

The flipside is that the semantics & limits we want for convergance
are already known to be different from what we wanted for pre-copy
and post-copy. With that existing practice, it is probably more
likely that we would not want to re-use the same setting for different
cases, which makes me think a specifically targetted parameter is
better.

With regards,
Daniel
Peter Xu <peterx@redhat.com> writes:

> Migration bandwidth is a very important value to live migration. It's
> because it's one of the major factors that we'll make decision on when to
> switchover to destination in a precopy process.
>
> This value is currently estimated by QEMU during the whole live migration
> process by monitoring how fast we were sending the data. This can be the
> most accurate bandwidth if in the ideal world, where we're always feeding
> unlimited data to the migration channel, and then it'll be limited to the
> bandwidth that is available.
>
> However in reality it may be very different, e.g., over a 10Gbps network we
> can see query-migrate showing migration bandwidth of only a few tens of
> MB/s just because there are plenty of other things the migration thread
> might be doing. For example, the migration thread can be busy scanning
> zero pages, or it can be fetching dirty bitmap from other external dirty
> sources (like vhost or KVM). It means we may not be pushing data as much
> as possible to migration channel, so the bandwidth estimated from "how many
> data we sent in the channel" can be dramatically inaccurate sometimes,
> e.g., that a few tens of MB/s even if 10Gbps available, and then the
> decision to switchover will be further affected by this.
>
> The migration may not even converge at all with the downtime specified,
> with that wrong estimation of bandwidth.
>
> The issue is QEMU itself may not be able to avoid those uncertainties on
> measuing the real "available migration bandwidth". At least not something
> I can think of so far.
>
> One way to fix this is when the user is fully aware of the available
> bandwidth, then we can allow the user to help providing an accurate value.
>
> For example, if the user has a dedicated channel of 10Gbps for migration
> for this specific VM, the user can specify this bandwidth so QEMU can
> always do the calculation based on this fact, trusting the user as long as
> specified.
>
> When the user wants to have migration only use 5Gbps out of that 10Gbps,
> one can set max-bandwidth to 5Gbps, along with available-bandwidth to 5Gbps
> so it'll never use over 5Gbps too (so the user can have the rest 5Gbps for
> other things). So it can be useful even if the network is not dedicated,
> but as long as the user can know a solid value.
>
> A new parameter "available-bandwidth" is introduced just for this. So when
> the user specified this parameter, instead of trusting the estimated value
> from QEMU itself (based on the QEMUFile send speed), let's trust the user
> more.
>
> This can resolve issues like "unconvergence migration" which is caused by
> hilarious low "migration bandwidth" detected for whatever reason.
>
> Reported-by: Zhiyi Guo <zhguo@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  qapi/migration.json            | 20 +++++++++++++++++++-
>  migration/migration.h          |  2 +-
>  migration/options.h            |  1 +
>  migration/migration-hmp-cmds.c | 14 ++++++++++++++
>  migration/migration.c          | 19 +++++++++++++++----
>  migration/options.c            | 28 ++++++++++++++++++++++++++++
>  migration/trace-events         |  2 +-
>  7 files changed, 79 insertions(+), 7 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 47dfef0278..fdc269e0a1 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -730,6 +730,16 @@
>  # @max-bandwidth: to set maximum speed for migration. maximum speed
>  #     in bytes per second. (Since 2.8)
>  #
> +# @available-bandwidth: to set available bandwidth for migration. By
> +#     default, this value is zero, means the user is not aware of the
> +#     available bandwidth that can be used by QEMU migration, so QEMU will
> +#     estimate the bandwidth automatically. This can be set when the
> +#     estimated value is not accurate, while the user is able to guarantee
> +#     such bandwidth is available for migration purpose during the
> +#     migration procedure. When specified correctly, this can make the
> +#     switchover decision much more accurate, which will also be based on
> +#     the max downtime specified. (Since 8.2)

Humor me: break lines slightly earlier, like

   # @available-bandwidth: to set available bandwidth for migration. By
   #     default, this value is zero, means the user is not aware of the
   #     available bandwidth that can be used by QEMU migration, so QEMU
   #     will estimate the bandwidth automatically. This can be set when
   #     the estimated value is not accurate, while the user is able to
   #     guarantee such bandwidth is available for migration purpose
   #     during the migration procedure. When specified correctly, this
   #     can make the switchover decision much more accurate, which will
   #     also be based on the max downtime specified. (Since 8.2)

> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -803,7 +813,7 @@
>             'cpu-throttle-initial', 'cpu-throttle-increment',
>             'cpu-throttle-tailslow',
>             'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
> -           'downtime-limit',
> +           'available-bandwidth', 'downtime-limit',
>             { 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
>             'block-incremental',
>             'multifd-channels',
> @@ -886,6 +896,9 @@
>  # @max-bandwidth: to set maximum speed for migration. maximum speed
>  #     in bytes per second. (Since 2.8)
>  #
> +# @available-bandwidth: to set available migration bandwidth. Please refer
> +#     to comments in MigrationParameter for more information. (Since 8.2)

For better or worse, we duplicate full documentation between
MigrationParameter, MigrateSetParameters, and MigrationParameters. This
would be the first instance where we reference instead. I'm not opposed
to use references, but if we do, I want them used consistently.

> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -971,6 +984,7 @@
>             '*tls-hostname': 'StrOrNull',
>             '*tls-authz': 'StrOrNull',
>             '*max-bandwidth': 'size',
> +           '*available-bandwidth': 'size',
>             '*downtime-limit': 'uint64',
>             '*x-checkpoint-delay': { 'type': 'uint32',
>                                      'features': [ 'unstable' ] },
> @@ -1078,6 +1092,9 @@
>  # @max-bandwidth: to set maximum speed for migration. maximum speed
>  #     in bytes per second. (Since 2.8)
>  #
> +# @available-bandwidth: to set available migration bandwidth. Please refer
> +#     to comments in MigrationParameter for more information. (Since 8.2)
> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  #     maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -1160,6 +1177,7 @@
>             '*tls-hostname': 'str',
>             '*tls-authz': 'str',
>             '*max-bandwidth': 'size',
> +           '*available-bandwidth': 'size',
>             '*downtime-limit': 'uint64',
>             '*x-checkpoint-delay': { 'type': 'uint32',
>                                      'features': [ 'unstable' ] },
> diff --git a/migration/migration.h b/migration/migration.h
> index b7c8b67542..fadbf64d9d 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -283,7 +283,7 @@ struct MigrationState {
>      /*
>       * The final stage happens when the remaining data is smaller than
>       * this threshold; it's calculated from the requested downtime and
> -     * measured bandwidth
> +     * measured bandwidth, or available-bandwidth if user specified.

Suggest to scratch "user".

>      */
>      int64_t threshold_size;
>
[...]
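The underestimate described in the commit message above follows directly
from how the figure is computed (bandwidth = transferred / time_spent in
migration_update_counters()): the divisor is wall-clock iteration time, so
every millisecond the migration thread spends scanning zero pages or
syncing the dirty bitmap rather than sending drags the estimate down. With
illustrative numbers: if the thread spends 900 ms of a 1 s iteration
scanning and only 100 ms sending at a full 10Gbps (1.25 GB/s), it moves
125 MB, and the estimate comes out as 125 MB/s, a tenth of what the wire
could actually carry during the switchover burst.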
On Tue, Jul 25, 2023 at 10:16:52AM +0100, Daniel P. Berrangé wrote:
> On Mon, Jul 24, 2023 at 03:47:50PM -0400, Peter Xu wrote:
> > On Mon, Jul 24, 2023 at 07:04:29PM +0100, Daniel P. Berrangé wrote:
> > > On Mon, Jul 24, 2023 at 01:07:55PM -0400, Peter Xu wrote:
> > > > Migration bandwidth is a very important value to live migration. It's
> > > > because it's one of the major factors that we'll make decision on when to
> > > > switchover to destination in a precopy process.
> > >
> > > To elaborate on this for those reading along...
> > >
> > > QEMU takes maxmimum downtime limit and multiplies by its estimate
> > > of bandwidth. This gives a figure for the amount of data QEMU thinks
> > > it can transfer within the downtime period.
> > >
> > > QEMU compares this figure to the amount of data that is still pending
> > > at the end of an iteration.
> > >
> > > > This value is currently estimated by QEMU during the whole live migration
> > > > process by monitoring how fast we were sending the data. This can be the
> > > > most accurate bandwidth if in the ideal world, where we're always feeding
> > > > unlimited data to the migration channel, and then it'll be limited to the
> > > > bandwidth that is available.
> > >
> > > The QEMU estimate for available bandwidth will definitely be wrong,
> > > potentially by orders of magnitude, if QEMU has a max bandwidth limit
> > > set, as in that case it is never trying to push the peak rates available
> > > from the NICs/network fabric.
> > >
> > > > The issue is QEMU itself may not be able to avoid those uncertainties on
> > > > measuing the real "available migration bandwidth". At least not something
> > > > I can think of so far.
> > >
> > > IIUC, you can query the NIC properties to find the hardware transfer
> > > rate of the NICs. That doesn't imply apps can actually reach that
> > > rate in practice - it has a decent chance of being a over-estimate
> > > of bandwidth, possibly very very much over.
> > >
> > > Is such an over estimate better or worse than QEMU's current
> > > under-estimate ? It depends on the POV.
> > >
> > > From the POV of QEMU, over-estimating means means it'll be not
> > > be throttling as much as it should. That's not a downside of
> > > migration - it makes it more likely for migration to complete :-)
> >
> > Heh. :)
> >
> > >
> > > From the POV of non-QEMU apps though, if QEMU over-estimates,
> > > it'll mean other apps get starved of network bandwidth.
> > >
> > > Overall I agree, there's no obvious way QEMU can ever come up
> > > with a reliable estimate for bandwidth available.
> > >
> > > > One way to fix this is when the user is fully aware of the available
> > > > bandwidth, then we can allow the user to help providing an accurate value.
> > > >
> > > > For example, if the user has a dedicated channel of 10Gbps for migration
> > > > for this specific VM, the user can specify this bandwidth so QEMU can
> > > > always do the calculation based on this fact, trusting the user as long as
> > > > specified.
> > >
> > > I can see that in theory, but when considering a non-trivial
> > > deployments of QEMU, I wonder if the user can really have any
> > > such certainty of what is truely avaialble. It would need
> > > global awareness of the whole network of hosts & workloads.
> >
> > Indeed it may or may not be easy always.
> >
> > The good thing about this parameter is we always use the old estimation if
> > the user can't specify anything valid, so this is always optional not
> > required.
> >
> > It solves the cases where the user can still specify accurately on the bw -
> > our QE team has already verified that it worked for us on GPU tests, where
> > it used to not be able to migrate at all with any sane downtime specified.
> > I should have attached a Tested-By from Zhiyi but since this is not exactly
> > the patch he was using I didn't.
> >
> > >
> > > > When the user wants to have migration only use 5Gbps out of that 10Gbps,
> > > > one can set max-bandwidth to 5Gbps, along with available-bandwidth to 5Gbps
> > > > so it'll never use over 5Gbps too (so the user can have the rest 5Gbps for
> > > > other things). So it can be useful even if the network is not dedicated,
> > > > but as long as the user can know a solid value.
> > > >
> > > > A new parameter "available-bandwidth" is introduced just for this. So when
> > > > the user specified this parameter, instead of trusting the estimated value
> > > > from QEMU itself (based on the QEMUFile send speed), let's trust the user
> > > > more.
> > >
> > > I feel like rather than "available-bandwidth", we should call
> > > it "max-convergance-bandwidth".
> > >
> > > To me that name would better reflect the fact that this isn't
> > > really required to be a measure of how much NIC bandwidth is
> > > available. It is merely an expression of a different bandwidth
> > > limit to apply during switch over.
> > >
> > > IOW
> > >
> > >  * max-bandwidth: limit during pre-copy main transfer
> > >  * max-convergance-bandwidth: limit during pre-copy switch-over
> > >  * max-postcopy-banwidth: limit during post-copy phase
> >
> > I worry the new name suggested is not straightforward enough at the 1st
> > glance, even to me as a developer.
> >
> > "available-bandwidth" doesn't even bind that value to "convergence" at all,
> > even though it was for solving this specific problem here. I wanted to make
> > this parameter solely for the admin to answer the question "how much
> > bandwidth is available to QEMU migration in general?" That's pretty much
> > straightforward IMHO. With that, it's pretty sane to consider using all we
> > have during switchover (aka, unlimited bandwidth, as fast as possible).
> >
> > Maybe at some point we can even leverage this information for other purpose
> > rather than making the migration converge.
>
> The flipside is that the semantics & limits we want for convergance
> are already known to be different from what we wanted for pre-copy
> and post-copy. With that existing practice, it is probably more
> likely that we would not want to re-use the same setting for different
> cases, which makes me think a specifically targetted parameter is
> better.

We can make the semantics specific, no strong opinion here. I wished it
can be as generic / easy as possible but maybe I went too far.

Though, is there anything else we can choose from besides
"max-convergence-bandwidth"? Or am I the only one that thinks it's hard to
understand when put "max" and "convergence" together?

When I take one step back to look at the whole "bandwidth" parameters, I am
not sure why we'd even need both "convergence" and "postcopy" bandwidth
being separate. With my current understanding of migration, we may
actually need:

- One bandwidth that we may want to run the background migration, aka,
  precopy migration, where we don't rush on pushing data.

- One bandwidth that is whatever we can have maximum; for dedicated NIC
  that's the line speed. We should always use this full speed for
  important things. I'd say postcopy falls into this, and this
  "convergence" calculation should also rely on this.

So another way to do this is we leverage the existing "postcopy-bandwidth"
for calculation when set, it may help us to shrink the bandwidth values to
two, but I'm not sure whether the name can be confusing too.

Thanks,
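A sketch of the two-value model floated above, purely to illustrate the
proposal (hypothetical names; this is not code from the patch or from
QEMU):

    #include <stdint.h>

    /* Hypothetical phases, mirroring the taxonomy discussed in the thread. */
    typedef enum {
        MIG_PHASE_PRECOPY,    /* background iterations: don't rush the data */
        MIG_PHASE_SWITCHOVER, /* final downtime-bounded burst */
        MIG_PHASE_POSTCOPY,   /* page-fault service plus background pushes */
    } MigPhase;

    /*
     * Two values instead of three: a gentle background rate for precopy,
     * and the full trusted line rate for everything latency-critical.
     */
    static uint64_t phase_bandwidth(MigPhase phase,
                                    uint64_t background_bw, /* bytes/s */
                                    uint64_t line_speed_bw) /* bytes/s */
    {
        return phase == MIG_PHASE_PRECOPY ? background_bw : line_speed_bw;
    }

Daniel's reply below pushes back on exactly this collapsing of the
switchover and postcopy limits into a single full-speed value.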
On Tue, Jul 25, 2023 at 11:54:52AM -0400, Peter Xu wrote:
> We can make the semantics specific, no strong opinion here. I wished it
> can be as generic / easy as possible but maybe I went too far.
>
> Though, is there anything else we can choose from besides
> "max-convergence-bandwidth"? Or am I the only one that thinks it's hard to
> understand when put "max" and "convergence" together?
>
> When I take one step back to look at the whole "bandwidth" parameters, I am
> not sure why we'd even need both "convergence" and "postcopy" bandwidth
> being separate. With my current understanding of migration, we may
> actually need:
>
> - One bandwidth that we may want to run the background migration, aka,
>   precopy migration, where we don't rush on pushing data.
>
> - One bandwidth that is whatever we can have maximum; for dedicated NIC
>   that's the line speed. We should always use this full speed for
>   important things. I'd say postcopy falls into this, and this
>   "convergence" calculation should also rely on this.

I don't think postcopy should be assumed to run at line speed.

At the point where you flip to post-copy mode, there could
conceivably still be GB's worth of data still dirty and
pending transfer.

The migration convergance step is reasonable to put at line
speed, because the max downtime parameter caps how long this
burst will be, genrally to some fraction of a second.

Once in post-copy mode, while the remaining data to transfer
is finite, the wall clock time to complete that transfer may
still be huge. It is unreasonable to assume users want to
run at max linespeed for many minutes to finish post-copy
at least in terms of the background transfer. You could make
a case for the page fault handling to run at a higher bandwidth
cap than the background transfer, but I think it is still probably
not reasonable to run page fault fetches at line speed by default.

IOW, I don't think we can put the same bandwidth limit on the
short convergance operation, as on the longer post-copy operation.

With regards,
Daniel
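The asymmetry described here is easy to quantify with illustrative numbers:
the switchover burst is capped by the downtime limit (typically a fraction
of a second, so at most a few hundred MB even at line speed), whereas a
guest entering postcopy with, say, 500 GB of RAM still to transfer would
saturate a 10Gbps link for roughly 400 seconds, nearly seven minutes during
which every other workload sharing the fabric is starved.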
On Tue, Jul 25, 2023 at 05:09:57PM +0100, Daniel P. Berrangé wrote:
> On Tue, Jul 25, 2023 at 11:54:52AM -0400, Peter Xu wrote:
> > We can make the semantics specific, no strong opinion here. I wished it
> > can be as generic / easy as possible but maybe I went too far.
> >
> > Though, is there anything else we can choose from besides
> > "max-convergence-bandwidth"? Or am I the only one that thinks it's hard to
> > understand when put "max" and "convergence" together?
> >
> > When I take one step back to look at the whole "bandwidth" parameters, I am
> > not sure why we'd even need both "convergence" and "postcopy" bandwidth
> > being separate. With my current understanding of migration, we may
> > actually need:
> >
> > - One bandwidth that we may want to run the background migration, aka,
> >   precopy migration, where we don't rush on pushing data.
> >
> > - One bandwidth that is whatever we can have maximum; for dedicated NIC
> >   that's the line speed. We should always use this full speed for
> >   important things. I'd say postcopy falls into this, and this
> >   "convergence" calculation should also rely on this.
>
> I don't think postcopy should be assumed to run at line speed.
>
> At the point where you flip to post-copy mode, there could
> conceivably still be GB's worth of data still dirty and
> pending transfer.
>
> The migration convergance step is reasonable to put at line
> speed, because the max downtime parameter caps how long this
> burst will be, genrally to some fraction of a second.
>
> Once in post-copy mode, while the remaining data to transfer
> is finite, the wall clock time to complete that transfer may
> still be huge. It is unreasonable to assume users want to
> run at max linespeed for many minutes to finish post-copy
> at least in terms of the background transfer. You could make
> a case for the page fault handling to run at a higher bandwidth
> cap than the background transfer, but I think it is still probably
> not reasonable to run page fault fetches at line speed by default.
>
> IOW, I don't think we can put the same bandwidth limit on the
> short convergance operation, as on the longer post-copy operation.

Postcopy still heavily affects the performance of the VM for the whole
duration, and afaiu that's so far the major issue (after we fix postcopy
interruptions with recovery capability) that postcopy may not be wanted in
many cases. If I am the admin I'd want it to run at full speed even if the
pages were not directly requested just to shrink the duration of postcopy;
I'd just want to make sure requested pages are queued sooner.

But that's okay if any of us still thinks that three values would be
helpful here, because we can simply have the latter two having the same
value when we want. Three is the superset of two anyway.

I see you used "convergance" explicitly even after PeterM's reply, is that
what you prefer over "convergence"? I do see more occurances of
"convergence" as a word in migration context, though. Any better name you
can come up with, before I just go with "max-convergence-bandwidth" (I
really cannot come up with anything better than this or available-bandwidth
for now)?

Thanks,
Hi, Markus,

On Tue, Jul 25, 2023 at 01:10:01PM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > Migration bandwidth is a very important value to live migration. It's
> > because it's one of the major factors that we'll make decision on when to
> > switchover to destination in a precopy process.
> >
> > This value is currently estimated by QEMU during the whole live migration
> > process by monitoring how fast we were sending the data. This can be the
> > most accurate bandwidth if in the ideal world, where we're always feeding
> > unlimited data to the migration channel, and then it'll be limited to the
> > bandwidth that is available.
> >
> > However in reality it may be very different, e.g., over a 10Gbps network we
> > can see query-migrate showing migration bandwidth of only a few tens of
> > MB/s just because there are plenty of other things the migration thread
> > might be doing. For example, the migration thread can be busy scanning
> > zero pages, or it can be fetching dirty bitmap from other external dirty
> > sources (like vhost or KVM). It means we may not be pushing data as much
> > as possible to migration channel, so the bandwidth estimated from "how many
> > data we sent in the channel" can be dramatically inaccurate sometimes,
> > e.g., that a few tens of MB/s even if 10Gbps available, and then the
> > decision to switchover will be further affected by this.
> >
> > The migration may not even converge at all with the downtime specified,
> > with that wrong estimation of bandwidth.
> >
> > The issue is QEMU itself may not be able to avoid those uncertainties on
> > measuing the real "available migration bandwidth". At least not something
> > I can think of so far.
> >
> > One way to fix this is when the user is fully aware of the available
> > bandwidth, then we can allow the user to help providing an accurate value.
> >
> > For example, if the user has a dedicated channel of 10Gbps for migration
> > for this specific VM, the user can specify this bandwidth so QEMU can
> > always do the calculation based on this fact, trusting the user as long as
> > specified.
> >
> > When the user wants to have migration only use 5Gbps out of that 10Gbps,
> > one can set max-bandwidth to 5Gbps, along with available-bandwidth to 5Gbps
> > so it'll never use over 5Gbps too (so the user can have the rest 5Gbps for
> > other things). So it can be useful even if the network is not dedicated,
> > but as long as the user can know a solid value.
> >
> > A new parameter "available-bandwidth" is introduced just for this. So when
> > the user specified this parameter, instead of trusting the estimated value
> > from QEMU itself (based on the QEMUFile send speed), let's trust the user
> > more.
> >
> > This can resolve issues like "unconvergence migration" which is caused by
> > hilarious low "migration bandwidth" detected for whatever reason.
> >
> > Reported-by: Zhiyi Guo <zhguo@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  qapi/migration.json            | 20 +++++++++++++++++++-
> >  migration/migration.h          |  2 +-
> >  migration/options.h            |  1 +
> >  migration/migration-hmp-cmds.c | 14 ++++++++++++++
> >  migration/migration.c          | 19 +++++++++++++++----
> >  migration/options.c            | 28 ++++++++++++++++++++++++++++
> >  migration/trace-events         |  2 +-
> >  7 files changed, 79 insertions(+), 7 deletions(-)
> >
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 47dfef0278..fdc269e0a1 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -730,6 +730,16 @@
> >  # @max-bandwidth: to set maximum speed for migration. maximum speed
> >  #     in bytes per second. (Since 2.8)
> >  #
> > +# @available-bandwidth: to set available bandwidth for migration. By
> > +#     default, this value is zero, means the user is not aware of the
> > +#     available bandwidth that can be used by QEMU migration, so QEMU will
> > +#     estimate the bandwidth automatically. This can be set when the
> > +#     estimated value is not accurate, while the user is able to guarantee
> > +#     such bandwidth is available for migration purpose during the
> > +#     migration procedure. When specified correctly, this can make the
> > +#     switchover decision much more accurate, which will also be based on
> > +#     the max downtime specified. (Since 8.2)
>
> Humor me: break lines slightly earlier, like
>
>    # @available-bandwidth: to set available bandwidth for migration. By
>    #     default, this value is zero, means the user is not aware of the
>    #     available bandwidth that can be used by QEMU migration, so QEMU
>    #     will estimate the bandwidth automatically. This can be set when
>    #     the estimated value is not accurate, while the user is able to
>    #     guarantee such bandwidth is available for migration purpose
>    #     during the migration procedure. When specified correctly, this
>    #     can make the switchover decision much more accurate, which will
>    #     also be based on the max downtime specified. (Since 8.2)

Sure.

>
> > +#
> >  # @downtime-limit: set maximum tolerated downtime for migration.
> >  #     maximum downtime in milliseconds (Since 2.8)
> >  #
> > @@ -803,7 +813,7 @@
> >             'cpu-throttle-initial', 'cpu-throttle-increment',
> >             'cpu-throttle-tailslow',
> >             'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
> > -           'downtime-limit',
> > +           'available-bandwidth', 'downtime-limit',
> >             { 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
> >             'block-incremental',
> >             'multifd-channels',
> > @@ -886,6 +896,9 @@
> >  # @max-bandwidth: to set maximum speed for migration. maximum speed
> >  #     in bytes per second. (Since 2.8)
> >  #
> > +# @available-bandwidth: to set available migration bandwidth. Please refer
> > +#     to comments in MigrationParameter for more information. (Since 8.2)
>
> For better or worse, we duplicate full documentation between
> MigrationParameter, MigrateSetParameters, and MigrationParameters. This
> would be the first instance where we reference instead. I'm not opposed
> to use references, but if we do, I want them used consistently.

We discussed this over the other "switchover" parameter, but that patchset
just stranded..

Perhaps I just provide a pre-requisite patch to remove all the comments in
MigrateSetParameters and MigrationParameters, letting them all point to
MigrationParameter?

One thing I should have mentioned much earlier is that this patch is for
8.2 material.

>
> > +#
> >  # @downtime-limit: set maximum tolerated downtime for migration.
> >  #     maximum downtime in milliseconds (Since 2.8)
> >  #
> > @@ -971,6 +984,7 @@
> >             '*tls-hostname': 'StrOrNull',
> >             '*tls-authz': 'StrOrNull',
> >             '*max-bandwidth': 'size',
> > +           '*available-bandwidth': 'size',
> >             '*downtime-limit': 'uint64',
> >             '*x-checkpoint-delay': { 'type': 'uint32',
> >                                      'features': [ 'unstable' ] },
> > @@ -1078,6 +1092,9 @@
> >  # @max-bandwidth: to set maximum speed for migration. maximum speed
> >  #     in bytes per second. (Since 2.8)
> >  #
> > +# @available-bandwidth: to set available migration bandwidth. Please refer
> > +#     to comments in MigrationParameter for more information. (Since 8.2)
> > +#
> >  # @downtime-limit: set maximum tolerated downtime for migration.
> >  #     maximum downtime in milliseconds (Since 2.8)
> >  #
> > @@ -1160,6 +1177,7 @@
> >             '*tls-hostname': 'str',
> >             '*tls-authz': 'str',
> >             '*max-bandwidth': 'size',
> > +           '*available-bandwidth': 'size',
> >             '*downtime-limit': 'uint64',
> >             '*x-checkpoint-delay': { 'type': 'uint32',
> >                                      'features': [ 'unstable' ] },
> > diff --git a/migration/migration.h b/migration/migration.h
> > index b7c8b67542..fadbf64d9d 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -283,7 +283,7 @@ struct MigrationState {
> >      /*
> >       * The final stage happens when the remaining data is smaller than
> >       * this threshold; it's calculated from the requested downtime and
> > -     * measured bandwidth
> > +     * measured bandwidth, or available-bandwidth if user specified.
>
> Suggest to scratch "user".

Will do, thanks.
On Tue, Jul 25, 2023 at 12:38:23PM -0400, Peter Xu wrote:
> I see you used "convergance" explicitly even after PeterM's reply, is that
> what you prefer over "convergence"? I do see more occurances of
> "convergence" as a word in migration context, though.

Ignore my speling erors :-)

>
> Any better name you
> can come up with, before I just go with "max-convergence-bandwidth" (I
> really cannot come up with anything better than this or available-bandwidth
> for now)?

Anothre idea could be 'max-switchover-bandwidth' ?

With regards,
Daniel
Peter Xu <peterx@redhat.com> writes:

> Hi, Markus,
>
> On Tue, Jul 25, 2023 at 01:10:01PM +0200, Markus Armbruster wrote:

[...]

>> For better or worse, we duplicate full documentation between
>> MigrationParameter, MigrateSetParameters, and MigrationParameters. This
>> would be the first instance where we reference instead. I'm not opposed
>> to use references, but if we do, I want them used consistently.
>
> We discussed this over the other "switchover" parameter, but that patchset
> just stranded..
>
> Perhaps I just provide a pre-requisite patch to remove all the comments in
> MigrateSetParameters and MigrationParameters, letting them all point to
> MigrationParameter?

Simplifies maintaining the doc comments. But how does it affect the
documentation generated from it? Better, neutral, or worse?

> One thing I should have mentioned much earlier is that this patch is for
> 8.2 material.

Understood.

[...]
On Wed, Jul 26, 2023 at 08:21:35AM +0200, Markus Armbruster wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > Hi, Markus,
> >
> > On Tue, Jul 25, 2023 at 01:10:01PM +0200, Markus Armbruster wrote:
>
> [...]
>
> >> For better or worse, we duplicate full documentation between
> >> MigrationParameter, MigrateSetParameters, and MigrationParameters. This
> >> would be the first instance where we reference instead. I'm not opposed
> >> to use references, but if we do, I want them used consistently.
> >
> > We discussed this over the other "switchover" parameter, but that patchset
> > just stranded..
> >
> > Perhaps I just provide a pre-requisite patch to remove all the comments in
> > MigrateSetParameters and MigrationParameters, letting them all point to
> > MigrationParameter?
>
> Simplifies maintaining the doc comments. But how does it affect the
> documentation generated from it? Better, neutral, or worse?

Probably somewhere neutral. There are definitely benefits, shorter
man/html page in total, and avoid accidentally different docs over the same
fields. E.g., we sometimes have different wordings for different objects:

    max-cpu-throttle
           maximum cpu throttle percentage. Defaults to 99. (Since 3.1)

    max-cpu-throttle: int (optional)
           maximum cpu throttle percentage. The default value is 99. (Since 3.1)

This one is fine, but it's just very easy to leak in something that shows
differently. It's good to provide coherent documentation for the same
fields over all three objects.

When looking at qemu-qmp-ref.7, it can be like this when we can invite the
reader to read the other section (assuming we only keep MigrationParameter
to keep the documentations):

    MigrationParameters (Object)

           The object structure to represent a list of migration parameters.
           The optional members aren't actually optional. For detailed
           explanation for each of the field, please refer to the
           documentation of MigrationParameter.

But the problem is we always will generate the Members entry, where now
it'll all be filled up with "Not documented"..

    Members
           announce-initial: int (optional)
                  Not documented

           announce-max: int (optional)
                  Not documented

           announce-rounds: int (optional)
                  Not documented

    [...]

I think maybe it's better we just list the members without showing "Not
documented" every time for the other two objects. Not sure whether it's an
easy way to fix it, or is it a major issue.

For developers, dedup the comment should always be a win, afaict.

Thanks,
On Tue, Jul 25, 2023 at 06:10:18PM +0100, Daniel P. Berrangé wrote: > On Tue, Jul 25, 2023 at 12:38:23PM -0400, Peter Xu wrote: > > I see you used "convergance" explicitly even after PeterM's reply, is that > > what you prefer over "convergence"? I do see more occurances of > > "convergence" as a word in migration context, though. > > Ignore my speling erors :-) Ohh, so that's not intended. :) Actually "convergance" might indeed be a word that can be applied here, but since I'm not a native speaker I really can't tell, even with a dictionary. > > > Any better name you > > can come up with, before I just go with "max-convergence-bandwidth" (I > > really cannot come up with anything better than this or available-bandwidth > > for now)? > > Anothre idea could be 'max-switchover-bandwidth' ? Yeah, switchover can also be called a phase of migration, so in parallel with the existing precopy/postcopy phases it sounds like a good one. I'll respin with that if there are no further suggestions. Thanks.
Peter Xu <peterx@redhat.com> writes: > On Wed, Jul 26, 2023 at 08:21:35AM +0200, Markus Armbruster wrote: >> Peter Xu <peterx@redhat.com> writes: >> >> > Hi, Markus, >> > >> > On Tue, Jul 25, 2023 at 01:10:01PM +0200, Markus Armbruster wrote: >> >> [...] >> >> >> For better or worse, we duplicate full documentation between >> >> MigrationParameter, MigrateSetParameters, and MigrationParameters. This >> >> would be the first instance where we reference instead. I'm not opposed >> >> to use references, but if we do, I want them used consistently. >> > >> > We discussed this over the other "switchover" parameter, but that patchset >> > just stranded.. >> > >> > Perhaps I just provide a pre-requisite patch to remove all the comments in >> > MigrateSetParameters and MigrationParameters, letting them all point to >> > MigrationParameter? >> >> Simplifies maintaining the doc commments. But how does it affect the >> documentation generated from it? Better, neutral, or worse? > > Probably somewhere neutral. There are definitely benefits, shorter > man/html page in total, and avoid accidentally different docs over the same > fields. E.g., we sometimes have different wordings for different objects: > > max-cpu-throttle > maximum cpu throttle percentage. Defaults to 99. (Since 3.1) > > max-cpu-throttle: int (optional) > maximum cpu throttle percentage. The default value is 99. (Since 3.1) > > This one is fine, but it's just very easy to leak in something that shows > differently. It's good to provide coherent documentation for the same > fields over all three objects. Yes, but we've been doing okay regardless. The drawback of replacing duplicates by references is that readers need to follow the references. Less onerous when the references can be clicked. If we de-duplicate, which copy do we keep, MigrationParameter, MigrateSetParameters, or MigrationParameters? Do we have an idea which of these users are likely to read first? > When looking at qemu-qmp-ref.7, it can be like this when we can invite the > reader to read the other section (assuming we only keep MigrationParameter > to keep the documentations): > > MigrationParameters (Object) > > The object structure to represent a list of migration parameters. > The optional members aren't actually optional. For detailed > explanation for each of the field, please refer to the documentation > of MigrationParameter. > > But the problem is we always will generate the Members entry, where now > it'll all filled up with "Not documented".. > > Members > announce-initial: int (optional) > Not documented > > announce-max: int (optional) > Not documented > > announce-rounds: int (optional) > Not documented > > [...] > > I think maybe it's better we just list the members without showing "Not > documented" every time for the other two objects. Not sure whether it's an > easy way to fix it, or is it a major issue. The automatic generation of "Not documented" documentation is a stop-gap. Leaving a member undocumented should be a hard error. It isn't, only because we have 511 instances to fix. Perhaps a directive to ignore undocumented members could be useful. I.e. to suppress the automatic "Not documented" documentation now, and the error later. We could write things out in longhand instead, like # @announce-initial: Same as MigrationParameter member # @announce-initial. > For developers, dedup the comment should always be a win, afaict. No argument.
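Applied to the parameter this series adds, that longhand cross-reference would presumably look something like the following in MigrateSetParameters and MigrationParameters (just a sketch of the suggestion above, not part of the posted patch):

    # @available-bandwidth: Same as MigrationParameter member
    #     @available-bandwidth. (Since 8.2)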
On Fri, Aug 04, 2023 at 02:06:02PM +0200, Markus Armbruster wrote: > Peter Xu <peterx@redhat.com> writes: > > > On Wed, Jul 26, 2023 at 08:21:35AM +0200, Markus Armbruster wrote: > >> Peter Xu <peterx@redhat.com> writes: > >> > >> > Hi, Markus, > >> > > >> > On Tue, Jul 25, 2023 at 01:10:01PM +0200, Markus Armbruster wrote: > >> > >> [...] > >> > >> >> For better or worse, we duplicate full documentation between > >> >> MigrationParameter, MigrateSetParameters, and MigrationParameters. This > >> >> would be the first instance where we reference instead. I'm not opposed > >> >> to use references, but if we do, I want them used consistently. > >> > > >> > We discussed this over the other "switchover" parameter, but that patchset > >> > just stranded.. > >> > > >> > Perhaps I just provide a pre-requisite patch to remove all the comments in > >> > MigrateSetParameters and MigrationParameters, letting them all point to > >> > MigrationParameter? > >> > >> Simplifies maintaining the doc commments. But how does it affect the > >> documentation generated from it? Better, neutral, or worse? > > > > Probably somewhere neutral. There are definitely benefits, shorter > > man/html page in total, and avoid accidentally different docs over the same > > fields. E.g., we sometimes have different wordings for different objects: > > > > max-cpu-throttle > > maximum cpu throttle percentage. Defaults to 99. (Since 3.1) > > > > max-cpu-throttle: int (optional) > > maximum cpu throttle percentage. The default value is 99. (Since 3.1) > > > > This one is fine, but it's just very easy to leak in something that shows > > differently. It's good to provide coherent documentation for the same > > fields over all three objects. > > Yes, but we've been doing okay regardless. > > The drawback of replacing duplicates by references is that readers need > to follow the references. > > Less onerous when the references can be clicked. > > If we de-duplicate, which copy do we keep, MigrationParameter, > MigrateSetParameters, or MigrationParameter? Do we have an idea which > of these users are likely to read first? I chose MigrationParameter for no explicit reason, because I can't find a good argument to differentiate them. Please let me know if you have any suggestions. > > > When looking at qemu-qmp-ref.7, it can be like this when we can invite the > > reader to read the other section (assuming we only keep MigrationParameter > > to keep the documentations): > > > > MigrationParameters (Object) > > > > The object structure to represent a list of migration parameters. > > The optional members aren't actually optional. For detailed > > explanation for each of the field, please refer to the documentation > > of MigrationParameter. > > > > But the problem is we always will generate the Members entry, where now > > it'll all filled up with "Not documented".. > > > > Members > > announce-initial: int (optional) > > Not documented > > > > announce-max: int (optional) > > Not documented > > > > announce-rounds: int (optional) > > Not documented > > > > [...] > > > > I think maybe it's better we just list the members without showing "Not > > documented" every time for the other two objects. Not sure whether it's an > > easy way to fix it, or is it a major issue. > > The automatic generation of "Not documented" documentation is a > stop-gap. Leaving a member undocumented should be a hard error. It > isn't only because we have 511 instances to fix. > > Perhaps a directive to ignore undocumented members could be useful. > I.e.
to suppress the automatic "Not documented" documented now, the > error later. > > We could write things out in longhand instead, like > > # @announce-initial: Same as MigrationParameter member # @announce-initial. Yes, I can definitely do this. Since I don't really know whether the "put a link" approach will work at all (at least man pages don't really have those, do they?), is this the way you suggest we move forward? Note that I am also always happy to simply duplicate the three paragraphs just like before; that's not something I must solve together with the migration problem, so we can essentially decouple the two problems. But since we're at it, if you think it worthwhile, we may have a chance to get rid of the duplicated documents here (before the code), and I can try. > > > For developers, dedup the comment should always be a win, afaict. > > No argument. Let me explain a bit: I meant the patch author, who will write less duplicated documentation while making sure everything matches, and the reviewers, who must read the duplicated content, again making sure everything matches. Both efforts can be avoided. That's all I meant when referring to "developers" in this context.. Not everyone has a common sense of "developer". Thanks,
On Wed, Jul 26, 2023 at 11:12:31AM -0400, Peter Xu wrote: > On Wed, Jul 26, 2023 at 08:21:35AM +0200, Markus Armbruster wrote: > > Peter Xu <peterx@redhat.com> writes: > > > > > Hi, Markus, > > > > > > On Tue, Jul 25, 2023 at 01:10:01PM +0200, Markus Armbruster wrote: > > > > [...] > > > > >> For better or worse, we duplicate full documentation between > > >> MigrationParameter, MigrateSetParameters, and MigrationParameters. This > > >> would be the first instance where we reference instead. I'm not opposed > > >> to use references, but if we do, I want them used consistently. > > > > > > We discussed this over the other "switchover" parameter, but that patchset > > > just stranded.. > > > > > > Perhaps I just provide a pre-requisite patch to remove all the comments in > > > MigrateSetParameters and MigrationParameters, letting them all point to > > > MigrationParameter? > > > > Simplifies maintaining the doc commments. But how does it affect the > > documentation generated from it? Better, neutral, or worse? > > Probably somewhere neutral. There are definitely benefits, shorter > > man/html page in total, and avoid accidentally different docs over the same > fields. E.g., we sometimes have different wordings for different objects: > > max-cpu-throttle > maximum cpu throttle percentage. Defaults to 99. (Since 3.1) > > max-cpu-throttle: int (optional) > maximum cpu throttle percentage. The default value is 99. (Since 3.1) > > This one is fine, but it's just very easy to leak in something that shows > differently. It's good to provide coherent documentation for the same > fields over all three objects. Do we have any actual problems though where the difference in docs is actively harmful ? I agree there's a possibility of the duplication being problematic, but unless it's actually the case in reality, it is merely a theoretical downside. IMHO for someone consuming the docs, this patch is worse, not neutral. > > When looking at qemu-qmp-ref.7, it can be like this when we can invite the > reader to read the other section (assuming we only keep MigrationParameter > to keep the documentations): > > MigrationParameters (Object) > > The object structure to represent a list of migration parameters. > The optional members aren't actually optional. For detailed > explanation for each of the field, please refer to the documentation > of MigrationParameter. > > But the problem is we always will generate the Members entry, where now > it'll all filled up with "Not documented".. > > Members > announce-initial: int (optional) > Not documented > > announce-max: int (optional) > Not documented > > announce-rounds: int (optional) > Not documented > > [...] > > I think maybe it's better we just list the members without showing "Not > documented" every time for the other two objects. Not sure whether it's an > easy way to fix it, or is it a major issue. > > For developers, dedup the comment should always be a win, afaict. IMHO that's optimizing for the wrong people and isn't a measurable win anyway. Someone adding a new parameter can just cut+paste the same docs string in the three places. It is a cost, but it is a small one-time cost, where having "not documented" is an ongoing cost for consumers of our API. I don't think the burden on QEMU maintainers adding new migration parameters is significant enough to justify ripping out our existing docs. With regards, Daniel
On Fri, Aug 04, 2023 at 02:39:15PM +0100, Daniel P. Berrangé wrote: > On Wed, Jul 26, 2023 at 11:12:31AM -0400, Peter Xu wrote: > > On Wed, Jul 26, 2023 at 08:21:35AM +0200, Markus Armbruster wrote: > > > Peter Xu <peterx@redhat.com> writes: > > > > > > > Hi, Markus, > > > > > > > > On Tue, Jul 25, 2023 at 01:10:01PM +0200, Markus Armbruster wrote: > > > > > > [...] > > > > > > >> For better or worse, we duplicate full documentation between > > > >> MigrationParameter, MigrateSetParameters, and MigrationParameters. This > > > >> would be the first instance where we reference instead. I'm not opposed > > > >> to use references, but if we do, I want them used consistently. > > > > > > > > We discussed this over the other "switchover" parameter, but that patchset > > > > just stranded.. > > > > > > > > Perhaps I just provide a pre-requisite patch to remove all the comments in > > > > MigrateSetParameters and MigrationParameters, letting them all point to > > > > MigrationParameter? > > > > > > Simplifies maintaining the doc commments. But how does it affect the > > > documentation generated from it? Better, neutral, or worse? > > > > Probably somewhere neutral. There are definitely benefits, shorter > > > > man/html page in total, and avoid accidentally different docs over the same > > fields. E.g., we sometimes have different wordings for different objects: > > > > max-cpu-throttle > > maximum cpu throttle percentage. Defaults to 99. (Since 3.1) > > > > max-cpu-throttle: int (optional) > > maximum cpu throttle percentage. The default value is 99. (Since 3.1) > > > > This one is fine, but it's just very easy to leak in something that shows > > differently. It's good to provide coherent documentation for the same > > fields over all three objects. > > Do we have any actual problems though where the difference in > docs is actively harmful ? I agree there's a possbility of the > duplication being problematic, but unless its actually the > case in reality, it is merely a theoretical downside. > > IMHO for someone consuming the docs, this patch is worse, not neutral. I agree. > > > > > When looking at qemu-qmp-ref.7, it can be like this when we can invite the > > reader to read the other section (assuming we only keep MigrationParameter > > to keep the documentations): > > > > MigrationParameters (Object) > > > > The object structure to represent a list of migration parameters. > > The optional members aren't actually optional. For detailed > > explanation for each of the field, please refer to the documentation > > of MigrationParameter. > > > > But the problem is we always will generate the Members entry, where now > > it'll all filled up with "Not documented".. > > > > Members > > announce-initial: int (optional) > > Not documented > > > > announce-max: int (optional) > > Not documented > > > > announce-rounds: int (optional) > > Not documented > > > > [...] > > > > I think maybe it's better we just list the members without showing "Not > > documented" every time for the other two objects. Not sure whether it's an > > easy way to fix it, or is it a major issue. > > > > For developers, dedup the comment should always be a win, afaict. > > IMHO that's optimizing for the wrong people and isn't a measurable > win anyway. Someone adding a new parameter can just cut+paste the > same docs string in the three places. It is a cost, but it is a > small one time cost, where having "not documented" is a ongoing > cost for consumers of our API. 
> > I don't think the burden on QEMU maintainers adding new migration > parameters is significant enough to justify ripping out our existing > docs. I had that strong feeling mostly because I'm both a migration developer and a migration reviewer, so I suffer on both sides. :) I was trying to stand up for future authors/reviewers from that PoV, but indeed I think the ultimate consumer is the reader. To solve the problem fundamentally, maybe we must dedup the objects rather than only the documents? I'll try to dig a bit more into this area next week if I have time; any link to previous context would be welcome (or an obvious reason why we just won't be able to do that; I only know that it at least shouldn't be trivial to resolve). For this one - Markus, let me know what you think, or I'll simply repost v3 with the duplications (when needed, probably later rather than sooner). Thanks,
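To illustrate the object-level dedup idea floated above: QAPI structs can share members through a 'base' type, so one conceivable shape is a common base struct holding the shared parameters, inherited by the concrete objects. This is only a sketch under that assumption, not anything in the tree; one obstacle visible in the patch itself is that the objects are not field-for-field identical today (MigrateSetParameters uses StrOrNull for the TLS fields where MigrationParameters uses plain str), so such members would have to stay in the children:

    # Hypothetical common base; member names taken from the patch below.
    { 'struct': 'MigrationParametersBase',
      'data': { '*max-bandwidth': 'size',
                '*available-bandwidth': 'size',
                '*downtime-limit': 'uint64' } }

    # The shared members (and their doc comments) would then exist only once.
    { 'struct': 'MigrationParameters',
      'base': 'MigrationParametersBase',
      'data': { '*tls-hostname': 'str' } }

    { 'struct': 'MigrateSetParameters',
      'base': 'MigrationParametersBase',
      'data': { '*tls-hostname': 'StrOrNull' } }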
Peter Xu <peterx@redhat.com> writes: > On Fri, Aug 04, 2023 at 02:39:15PM +0100, Daniel P. Berrangé wrote: >> On Wed, Jul 26, 2023 at 11:12:31AM -0400, Peter Xu wrote: >> > On Wed, Jul 26, 2023 at 08:21:35AM +0200, Markus Armbruster wrote: >> > > Peter Xu <peterx@redhat.com> writes: >> > > >> > > > Hi, Markus, >> > > > >> > > > On Tue, Jul 25, 2023 at 01:10:01PM +0200, Markus Armbruster wrote: >> > > >> > > [...] >> > > >> > > >> For better or worse, we duplicate full documentation between >> > > >> MigrationParameter, MigrateSetParameters, and MigrationParameters. This >> > > >> would be the first instance where we reference instead. I'm not opposed >> > > >> to use references, but if we do, I want them used consistently. >> > > > >> > > > We discussed this over the other "switchover" parameter, but that patchset >> > > > just stranded.. >> > > > >> > > > Perhaps I just provide a pre-requisite patch to remove all the comments in >> > > > MigrateSetParameters and MigrationParameters, letting them all point to >> > > > MigrationParameter? >> > > >> > > Simplifies maintaining the doc commments. But how does it affect the >> > > documentation generated from it? Better, neutral, or worse? >> > >> > Probably somewhere neutral. There are definitely benefits, shorter >> > >> > man/html page in total, and avoid accidentally different docs over the same >> > fields. E.g., we sometimes have different wordings for different objects: >> > >> > max-cpu-throttle >> > maximum cpu throttle percentage. Defaults to 99. (Since 3.1) >> > >> > max-cpu-throttle: int (optional) >> > maximum cpu throttle percentage. The default value is 99. (Since 3.1) >> > >> > This one is fine, but it's just very easy to leak in something that shows >> > differently. It's good to provide coherent documentation for the same >> > fields over all three objects. >> >> Do we have any actual problems though where the difference in >> docs is actively harmful ? I agree there's a possbility of the >> duplication being problematic, but unless its actually the >> case in reality, it is merely a theoretical downside. >> >> IMHO for someone consuming the docs, this patch is worse, not neutral. > > I agree. > >> >> > >> > When looking at qemu-qmp-ref.7, it can be like this when we can invite the >> > reader to read the other section (assuming we only keep MigrationParameter >> > to keep the documentations): >> > >> > MigrationParameters (Object) >> > >> > The object structure to represent a list of migration parameters. >> > The optional members aren't actually optional. For detailed >> > explanation for each of the field, please refer to the documentation >> > of MigrationParameter. >> > >> > But the problem is we always will generate the Members entry, where now >> > it'll all filled up with "Not documented".. >> > >> > Members >> > announce-initial: int (optional) >> > Not documented >> > >> > announce-max: int (optional) >> > Not documented >> > >> > announce-rounds: int (optional) >> > Not documented >> > >> > [...] >> > >> > I think maybe it's better we just list the members without showing "Not >> > documented" every time for the other two objects. Not sure whether it's an >> > easy way to fix it, or is it a major issue. >> > >> > For developers, dedup the comment should always be a win, afaict. >> >> IMHO that's optimizing for the wrong people and isn't a measurable >> win anyway. Someone adding a new parameter can just cut+paste the >> same docs string in the three places. 
It is a cost, but it is a >> small one time cost, where having "not documented" is a ongoing >> cost for consumers of our API. >> >> I don't think the burden on QEMU maintainers adding new migration >> parameters is significant enough to justify ripping out our existing >> docs. > > I had that strong feeling mostly because I'm a migration developer and > migration reviewer, so I suffer on both sides. :) I was trying to stand out > for either any future author/reviewer from that pov, but I think indeed the > ultimate consumer is the reader. > > Fundamentally to solve the problem, maybe we must dedup the objects rather > than the documents only? I'll try to dig a bit more in this area next week > if I have time, any link for previous context would be welcomed (or obvious > reason that we just won't be able to do that; I only know that at least > shouldn't be trivial to resolve). > > For this one - Markus, let me know what do you think, or I'll simply repost > v3 with the duplications (when needed, probably later not sooner). Since you want to investigate de-duplicating the objects, do that *first*. If you succeed, we don't have to argue about de-duplicating docs.
Peter Xu <peterx@redhat.com> writes: > On Fri, Aug 04, 2023 at 02:06:02PM +0200, Markus Armbruster wrote: >> Peter Xu <peterx@redhat.com> writes: >> >> > On Wed, Jul 26, 2023 at 08:21:35AM +0200, Markus Armbruster wrote: >> >> Peter Xu <peterx@redhat.com> writes: >> >> >> >> > Hi, Markus, >> >> > >> >> > On Tue, Jul 25, 2023 at 01:10:01PM +0200, Markus Armbruster wrote: >> >> >> >> [...] >> >> >> >> >> For better or worse, we duplicate full documentation between >> >> >> MigrationParameter, MigrateSetParameters, and MigrationParameters. This >> >> >> would be the first instance where we reference instead. I'm not opposed >> >> >> to use references, but if we do, I want them used consistently. >> >> > >> >> > We discussed this over the other "switchover" parameter, but that patchset >> >> > just stranded.. >> >> > >> >> > Perhaps I just provide a pre-requisite patch to remove all the comments in >> >> > MigrateSetParameters and MigrationParameters, letting them all point to >> >> > MigrationParameter? >> >> >> >> Simplifies maintaining the doc commments. But how does it affect the >> >> documentation generated from it? Better, neutral, or worse? >> > >> > Probably somewhere neutral. There are definitely benefits, shorter >> > man/html page in total, and avoid accidentally different docs over the same >> > fields. E.g., we sometimes have different wordings for different objects: >> > >> > max-cpu-throttle >> > maximum cpu throttle percentage. Defaults to 99. (Since 3.1) >> > >> > max-cpu-throttle: int (optional) >> > maximum cpu throttle percentage. The default value is 99. (Since 3.1) >> > >> > This one is fine, but it's just very easy to leak in something that shows >> > differently. It's good to provide coherent documentation for the same >> > fields over all three objects. >> >> Yes, but we've been doing okay regardless. >> >> The drawback of replacing duplicates by references is that readers need >> to follow the references. >> >> Less onerous when the references can be clicked. >> >> If we de-duplicate, which copy do we keep, MigrationParameter, >> MigrateSetParameters, or MigrationParameter? Do we have an idea which >> of these users are likely to read first? > > I chose MigrationParameter for no explicit reason, because I can't find > good argument to differenciate them. Please let me know if you have any > suggestion. > >> >> > When looking at qemu-qmp-ref.7, it can be like this when we can invite the >> > reader to read the other section (assuming we only keep MigrationParameter >> > to keep the documentations): >> > >> > MigrationParameters (Object) >> > >> > The object structure to represent a list of migration parameters. >> > The optional members aren't actually optional. For detailed >> > explanation for each of the field, please refer to the documentation >> > of MigrationParameter. >> > >> > But the problem is we always will generate the Members entry, where now >> > it'll all filled up with "Not documented".. >> > >> > Members >> > announce-initial: int (optional) >> > Not documented >> > >> > announce-max: int (optional) >> > Not documented >> > >> > announce-rounds: int (optional) >> > Not documented >> > >> > [...] >> > >> > I think maybe it's better we just list the members without showing "Not >> > documented" every time for the other two objects. Not sure whether it's an >> > easy way to fix it, or is it a major issue. >> >> The automatic generation of "Not documented" documentation is a >> stop-gap. Leaving a member undocumented should be a hard error. 
It >> isn't only because we have 511 instances to fix. >> >> Perhaps a directive to ignore undocumented members could be useful. >> I.e. to suppress the automatic "Not documented" documented now, the >> error later. >> >> We could write things out in longhand instead, like >> >> # @announce-initial: Same as MigrationParameter member >> # @announce-initial. > > Yes I can definitely do this. > > Since I don't really know whether the "put a link" will work at all (at > least man page doesn't really have those, does it?), would this be the way > you suggest us forward? I don't remember offhand how the QAPI doc generation machinery adds links. I can look it up for you after my vacation (two weeks, starting basically now). > Note that I am also always happy to simply duplicate the three paragraphs > just like before; that's not something I must do with solving the migration > problem so far, we can decouple the two problems essentially. But since > we're at it, if you think worthwhile we may have a chance get rid of > duplicated documents here (before code) I can try. > >> >> > For developers, dedup the comment should always be a win, afaict. >> >> No argument. > > Let me explain a bit: I meant the patch author who will reduce writting > duplicated documents, making sure everything match together. And reviewers > who will read the duplicated content, making sure that everything match > together again. The two efforts can be avoided. That's all I meant here > for when I was referring to as "developers" in this context.. Not everyone > as a common sense of developer. By "no argument", I mean we don't need to argue about this. I actually agree with you that de-duplication would be net positive for developers.
diff --git a/qapi/migration.json b/qapi/migration.json index 47dfef0278..fdc269e0a1 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -730,6 +730,16 @@ # @max-bandwidth: to set maximum speed for migration. maximum speed # in bytes per second. (Since 2.8) # +# @available-bandwidth: to set the available bandwidth for migration. By +# default, this value is zero, which means the user is not aware of the +# available bandwidth that can be used by QEMU migration, so QEMU will +# estimate the bandwidth automatically. This can be set when the +# estimated value is not accurate, while the user is able to guarantee +# that such bandwidth is available for migration purposes during the +# migration procedure. When specified correctly, this can make the +# switchover decision much more accurate, which will also be based on +# the max downtime specified. (Since 8.2) +# # @downtime-limit: set maximum tolerated downtime for migration. # maximum downtime in milliseconds (Since 2.8) # @@ -803,7 +813,7 @@ 'cpu-throttle-initial', 'cpu-throttle-increment', 'cpu-throttle-tailslow', 'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth', - 'downtime-limit', + 'available-bandwidth', 'downtime-limit', { 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] }, 'block-incremental', 'multifd-channels', @@ -886,6 +896,9 @@ # @max-bandwidth: to set maximum speed for migration. maximum speed # in bytes per second. (Since 2.8) # +# @available-bandwidth: to set the available migration bandwidth. Please +# refer to the comments in MigrationParameter for more information. (Since 8.2) +# # @downtime-limit: set maximum tolerated downtime for migration. # maximum downtime in milliseconds (Since 2.8) # @@ -971,6 +984,7 @@ '*tls-hostname': 'StrOrNull', '*tls-authz': 'StrOrNull', '*max-bandwidth': 'size', + '*available-bandwidth': 'size', '*downtime-limit': 'uint64', '*x-checkpoint-delay': { 'type': 'uint32', 'features': [ 'unstable' ] }, @@ -1078,6 +1092,9 @@ # @max-bandwidth: to set maximum speed for migration. maximum speed # in bytes per second. (Since 2.8) # +# @available-bandwidth: to set the available migration bandwidth. Please +# refer to the comments in MigrationParameter for more information. (Since 8.2) +# # @downtime-limit: set maximum tolerated downtime for migration. # maximum downtime in milliseconds (Since 2.8) # @@ -1160,6 +1177,7 @@ '*tls-hostname': 'str', '*tls-authz': 'str', '*max-bandwidth': 'size', + '*available-bandwidth': 'size', '*downtime-limit': 'uint64', '*x-checkpoint-delay': { 'type': 'uint32', 'features': [ 'unstable' ] }, diff --git a/migration/migration.h b/migration/migration.h index b7c8b67542..fadbf64d9d 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -283,7 +283,7 @@ struct MigrationState { /* * The final stage happens when the remaining data is smaller than * this threshold; it's calculated from the requested downtime and - * measured bandwidth + * measured bandwidth, or available-bandwidth if the user specified one.
*/ int64_t threshold_size; diff --git a/migration/options.h b/migration/options.h index 9aaf363322..ed2d9c92e7 100644 --- a/migration/options.h +++ b/migration/options.h @@ -79,6 +79,7 @@ int migrate_decompress_threads(void); uint64_t migrate_downtime_limit(void); uint8_t migrate_max_cpu_throttle(void); uint64_t migrate_max_bandwidth(void); +uint64_t migrate_available_bandwidth(void); uint64_t migrate_max_postcopy_bandwidth(void); int migrate_multifd_channels(void); MultiFDCompression migrate_multifd_compression(void); diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c index 9885d7c9f7..53fbe1b1af 100644 --- a/migration/migration-hmp-cmds.c +++ b/migration/migration-hmp-cmds.c @@ -311,6 +311,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict) monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n", MigrationParameter_str(MIGRATION_PARAMETER_MAX_BANDWIDTH), params->max_bandwidth); + assert(params->has_available_bandwidth); + monitor_printf(mon, "%s: %" PRIu64 " bytes/second\n", + MigrationParameter_str(MIGRATION_PARAMETER_AVAILABLE_BANDWIDTH), + params->available_bandwidth); assert(params->has_downtime_limit); monitor_printf(mon, "%s: %" PRIu64 " ms\n", MigrationParameter_str(MIGRATION_PARAMETER_DOWNTIME_LIMIT), @@ -556,6 +560,16 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict) } p->max_bandwidth = valuebw; break; + case MIGRATION_PARAMETER_AVAILABLE_BANDWIDTH: + p->has_available_bandwidth = true; + ret = qemu_strtosz_MiB(valuestr, NULL, &valuebw); + if (ret < 0 || valuebw > INT64_MAX + || (size_t)valuebw != valuebw) { + error_setg(&err, "Invalid size %s", valuestr); + break; + } + p->available_bandwidth = valuebw; + break; case MIGRATION_PARAMETER_DOWNTIME_LIMIT: p->has_downtime_limit = true; visit_type_size(v, param, &p->downtime_limit, &err); diff --git a/migration/migration.c b/migration/migration.c index 91bba630a8..391ddfd8ce 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2671,7 +2671,7 @@ static void migration_update_counters(MigrationState *s, { uint64_t transferred, transferred_pages, time_spent; uint64_t current_bytes; /* bytes transferred since the beginning */ - double bandwidth; + double bandwidth, avail_bw; if (current_time < s->iteration_start_time + BUFFER_DELAY) { return; @@ -2681,7 +2681,17 @@ transferred = current_bytes - s->iteration_initial_bytes; time_spent = current_time - s->iteration_start_time; bandwidth = (double)transferred / time_spent; - s->threshold_size = bandwidth * migrate_downtime_limit(); + if (migrate_available_bandwidth()) { + /* + * If the user specified an available bandwidth, trust it, as that can + * be more accurate than our estimate; convert bytes/sec to bytes/ms
+ */ + avail_bw = migrate_available_bandwidth() / 1000.0; + } else { + /* If the user didn't specify bandwidth, use the estimate (bytes/ms) */ + avail_bw = bandwidth; + } + s->threshold_size = avail_bw * migrate_downtime_limit(); s->mbps = (((double) transferred * 8.0) / ((double) time_spent / 1000.0)) / 1000.0 / 1000.0; @@ -2698,7 +2708,7 @@ if (stat64_get(&mig_stats.dirty_pages_rate) && transferred > 10000) { s->expected_downtime = - stat64_get(&mig_stats.dirty_bytes_last_sync) / bandwidth; + stat64_get(&mig_stats.dirty_bytes_last_sync) / avail_bw; } migration_rate_reset(s->to_dst_file); @@ -2706,7 +2716,8 @@ update_iteration_initial_status(s); trace_migrate_transferred(transferred, time_spent, - bandwidth, s->threshold_size); + bandwidth, migrate_available_bandwidth(), + s->threshold_size); } static bool migration_can_switchover(MigrationState *s) diff --git a/migration/options.c b/migration/options.c index 5a9505adf7..160faca71a 100644 --- a/migration/options.c +++ b/migration/options.c @@ -121,6 +121,8 @@ Property migration_properties[] = { parameters.cpu_throttle_tailslow, false), DEFINE_PROP_SIZE("x-max-bandwidth", MigrationState, parameters.max_bandwidth, MAX_THROTTLE), + DEFINE_PROP_SIZE("available-bandwidth", MigrationState, + parameters.available_bandwidth, 0), DEFINE_PROP_UINT64("x-downtime-limit", MigrationState, parameters.downtime_limit, DEFAULT_MIGRATE_SET_DOWNTIME), @@ -735,6 +737,13 @@ uint64_t migrate_max_bandwidth(void) return s->parameters.max_bandwidth; } +uint64_t migrate_available_bandwidth(void) +{ + MigrationState *s = migrate_get_current(); + + return s->parameters.available_bandwidth; +} + uint64_t migrate_max_postcopy_bandwidth(void) { MigrationState *s = migrate_get_current(); @@ -872,6 +881,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp) s->parameters.tls_authz : ""); params->has_max_bandwidth = true; params->max_bandwidth = s->parameters.max_bandwidth; + params->has_available_bandwidth = true; + params->available_bandwidth = s->parameters.available_bandwidth; params->has_downtime_limit = true; params->downtime_limit = s->parameters.downtime_limit; params->has_x_checkpoint_delay = true; @@ -1004,6 +1015,15 @@ bool migrate_params_check(MigrationParameters *params, Error **errp) return false; } + if (params->has_available_bandwidth && + (params->available_bandwidth > SIZE_MAX)) { + error_setg(errp, QERR_INVALID_PARAMETER_VALUE, + "available_bandwidth", + "an integer in the range of 0 to "stringify(SIZE_MAX) + " bytes/second"); + return false; + } + if (params->has_downtime_limit && (params->downtime_limit > MAX_MIGRATE_DOWNTIME)) { error_setg(errp, QERR_INVALID_PARAMETER_VALUE, @@ -1156,6 +1176,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params, dest->max_bandwidth = params->max_bandwidth; } + if (params->has_available_bandwidth) { + dest->available_bandwidth = params->available_bandwidth; + } + if (params->has_downtime_limit) { dest->downtime_limit = params->downtime_limit; } @@ -1264,6 +1288,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) } } + if (params->has_available_bandwidth) { + s->parameters.available_bandwidth = params->available_bandwidth; + } + if (params->has_downtime_limit) { s->parameters.downtime_limit = params->downtime_limit; } diff --git a/migration/trace-events index 5259c1044b..fffed1f995 100644 --- a/migration/trace-events +++
b/migration/trace-events @@ -184,7 +184,7 @@ source_return_path_thread_shut(uint32_t val) "0x%x" source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32 source_return_path_thread_switchover_acked(void) "" migration_thread_low_pending(uint64_t pending) "%" PRIu64 -migrate_transferred(uint64_t tranferred, uint64_t time_spent, uint64_t bandwidth, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " max_size %" PRId64 +migrate_transferred(uint64_t tranferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " avail_bw %" PRIu64 " max_size %" PRId64 process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d" process_incoming_migration_co_postcopy_end_main(void) "" postcopy_preempt_enabled(bool value) "%d"
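To put rough numbers on the threshold calculation in the migration.c hunk above (illustrative values only, assuming the default 300ms downtime-limit): with available-bandwidth set to 10Gbps, i.e. 1.25GB/s, or 1250000 bytes/ms after the bytes/sec-to-bytes/ms conversion, the switchover threshold becomes

    threshold_size = 1250000 bytes/ms * 300 ms = 375000000 bytes (~375 MB)

Per the migration.h comment above, the final stage can then start once the remaining data drops below roughly 375MB, rather than below a much smaller figure derived from an underestimated bandwidth.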
Migration bandwidth is a very important value for live migration, because it is one of the major factors we use to decide when to switch over to the destination in a precopy process. This value is currently estimated by QEMU during the whole live migration process by monitoring how fast we are sending the data. This would be the most accurate bandwidth in an ideal world, where we are always feeding unlimited data into the migration channel and are therefore limited only by the bandwidth that is actually available. However, in reality it may be very different, e.g., over a 10Gbps network we can see query-migrate showing a migration bandwidth of only a few tens of MB/s, just because there are plenty of other things the migration thread might be doing. For example, the migration thread can be busy scanning zero pages, or it can be fetching the dirty bitmap from other external dirty sources (like vhost or KVM). It means we may not be pushing data into the migration channel as fast as possible, so the bandwidth estimated from "how much data we sent on the channel" can sometimes be dramatically inaccurate, e.g., a few tens of MB/s even with 10Gbps available, and the decision to switch over is then further affected by this. With that wrong estimation of bandwidth, the migration may not even converge at all within the downtime specified. The issue is that QEMU itself may not be able to avoid those uncertainties when measuring the real "available migration bandwidth"; at least I cannot think of a way so far. One way to fix this, when the user is fully aware of the available bandwidth, is to allow the user to help by providing an accurate value. For example, if the user has a dedicated 10Gbps channel for migrating this specific VM, the user can specify this bandwidth so QEMU can always do the calculation based on that fact, trusting the user's value as long as it is specified. When the user wants migration to use only 5Gbps out of that 10Gbps, one can set max-bandwidth to 5Gbps along with available-bandwidth at 5Gbps, so it will never use more than 5Gbps either (and the user can keep the remaining 5Gbps for other things). So this can be useful even if the network is not dedicated, as long as the user knows a solid value. A new parameter "available-bandwidth" is introduced just for this. When the user specifies this parameter, instead of trusting the value estimated by QEMU itself (based on the QEMUFile send speed), let's trust the user more. This can resolve issues like a migration that fails to converge because of a hilariously low "migration bandwidth" detected for whatever reason. Reported-by: Zhiyi Guo <zhguo@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> --- qapi/migration.json | 20 +++++++++++++++++++- migration/migration.h | 2 +- migration/options.h | 1 + migration/migration-hmp-cmds.c | 14 ++++++++++++++ migration/migration.c | 19 +++++++++++++++---- migration/options.c | 28 ++++++++++++++++++++++++++++ migration/trace-events | 2 +- 7 files changed, 79 insertions(+), 7 deletions(-)
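For completeness, a usage sketch of the new parameter as posted in this version of the series (the name may still change to max-switchover-bandwidth per the discussion above; the values are made up, with 1250000000 bytes/second standing in for a 10Gbps link):

    -> { "execute": "migrate-set-parameters",
         "arguments": { "available-bandwidth": 1250000000 } }
    <- { "return": {} }

    -> { "execute": "query-migrate-parameters" }
    <- { "return": { "available-bandwidth": 1250000000, ... } }

On the HMP side, the hunk above routes the value through qemu_strtosz_MiB(), so a suffixed value like "migrate_set_parameter available-bandwidth 1200M" should be accepted there as well.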