From patchwork Tue Aug 29 21:42:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 1827472
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Fabiano Rosas, peterx@redhat.com, Juan Quintela
Subject: [PATCH 1/9] migration: Display error in query-migrate irrelevant of status
Date: Tue, 29 Aug 2023 17:42:27 -0400
Message-ID: <20230829214235.69309-2-peterx@redhat.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230829214235.69309-1-peterx@redhat.com>
References: <20230829214235.69309-1-peterx@redhat.com>

Display the error whenever it is set, regardless of whether the status is
FAILED. For example, it also applies to the PAUSED stage of postcopy,
where it gives a hint about what has gone wrong.

The error_mutex seems to have been overlooked when referencing the error;
take it to be safe.

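As an illustration of the behaviour described above, here is a minimal standalone sketch, not QEMU code. The `demo_*` names, the plain string used as the error, and the pthread mutex are all assumptions made for the example; the real patch uses `MigrationState`, `Error` and `QEMU_LOCK_GUARD()`. It only shows the idea of copying the description whenever an error is set, whatever the status, while holding the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for MigrationState / MigrationInfo. */
enum demo_status { DEMO_ACTIVE, DEMO_POSTCOPY_PAUSED, DEMO_FAILED };

struct demo_state {
    enum demo_status status;
    pthread_mutex_t error_mutex;   /* protects 'error' */
    const char *error;             /* NULL when no error was recorded */
};

struct demo_info {
    enum demo_status status;
    char *error_desc;              /* owned by the info; NULL if unset */
};

/* Report the error whenever one is set, whatever the status is. */
static void demo_fill_info(struct demo_state *s, struct demo_info *info)
{
    info->status = s->status;
    info->error_desc = NULL;

    pthread_mutex_lock(&s->error_mutex);
    if (s->error) {
        /* Copy while holding the lock so the string cannot go away. */
        size_t len = strlen(s->error) + 1;
        info->error_desc = malloc(len);
        if (info->error_desc) {
            memcpy(info->error_desc, s->error, len);
        }
    }
    pthread_mutex_unlock(&s->error_mutex);
}

/* Returns 1 if a paused (not failed) migration still reports its error. */
static int demo_paused_reports_error(void)
{
    struct demo_state s = {
        DEMO_POSTCOPY_PAUSED, PTHREAD_MUTEX_INITIALIZER, "channel closed"
    };
    struct demo_info info;
    int ok;

    demo_fill_info(&s, &info);
    ok = info.error_desc != NULL &&
         strcmp(info.error_desc, "channel closed") == 0;
    free(info.error_desc);
    return ok;
}
```

Calling `demo_paused_reports_error()` from a `main()` exercises the paused case; the same function body would report the error for any other status too, since the status is no longer consulted.
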
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
---
 qapi/migration.json   | 5 ++---
 migration/migration.c | 8 +++++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..c241b6d318 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -230,9 +230,8 @@
 #     throttled during auto-converge. This is only present when
 #     auto-converge has started throttling guest cpus. (Since 2.7)
 #
-# @error-desc: the human readable error description string, when
-#     @status is 'failed'. Clients should not attempt to parse the
-#     error strings. (Since 2.7)
+# @error-desc: the human readable error description string. Clients
+#     should not attempt to parse the error strings. (Since 2.7)
 #
 # @postcopy-blocktime: total time when all vCPU were blocked during
 #     postcopy live migration. This is only present when the
diff --git a/migration/migration.c b/migration/migration.c
index 5528acb65e..c60064d48e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1052,9 +1052,6 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     case MIGRATION_STATUS_FAILED:
         info->has_status = true;
-        if (s->error) {
-            info->error_desc = g_strdup(error_get_pretty(s->error));
-        }
         break;
     case MIGRATION_STATUS_CANCELLED:
         info->has_status = true;
@@ -1064,6 +1061,11 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     }
     info->status = state;
+
+    QEMU_LOCK_GUARD(&s->error_mutex);
+    if (s->error) {
+        info->error_desc = g_strdup(error_get_pretty(s->error));
+    }
 }
 
 static void fill_destination_migration_info(MigrationInfo *info)

From patchwork Tue Aug 29 21:42:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 1827501
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Fabiano Rosas, peterx@redhat.com, Juan Quintela
Subject: [PATCH 2/9] migration: Let migrate_set_error() take ownership
Date: Tue, 29 Aug 2023 17:42:28 -0400
Message-ID: <20230829214235.69309-3-peterx@redhat.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230829214235.69309-1-peterx@redhat.com>
References: <20230829214235.69309-1-peterx@redhat.com>

migrate_set_error() used error_copy(), so it always copied the error.
However, that is not the major use case: most callers want to hand the
error to migrate_set_error() and not touch it again. This shows in the
fact that most callers free the error explicitly right afterwards.

There are a few outliers, where the caller still needs the error
afterwards (for example to return it to its own caller); those can use
error_copy() explicitly.

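The ownership rule described above can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions: `struct demo_error`, `demo_set_error()` and the other `demo_*` helpers are hypothetical stand-ins, not QEMU's `Error` API, and the sketch leaves out the `error_mutex` locking that the real function performs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal error object; QEMU's real Error is richer. */
struct demo_error { char *msg; };

static int demo_live_errors;   /* counts objects not yet freed */

static struct demo_error *demo_error_new(const char *msg)
{
    struct demo_error *e = malloc(sizeof(*e));
    size_t len = strlen(msg) + 1;

    if (!e) abort();
    e->msg = malloc(len);
    if (!e->msg) abort();
    memcpy(e->msg, msg, len);
    demo_live_errors++;
    return e;
}

static void demo_error_free(struct demo_error *e)
{
    if (e) {
        free(e->msg);
        free(e);
        demo_live_errors--;
    }
}

static struct demo_error *demo_error_copy(const struct demo_error *e)
{
    return demo_error_new(e->msg);
}

struct demo_state { struct demo_error *error; };

/* Takes ownership of 'err': keep the first error, free later ones. */
static void demo_set_error(struct demo_state *s, struct demo_error *err)
{
    if (!s->error) {
        s->error = err;
    } else {
        demo_error_free(err);
    }
}

/* Returns the number of leaked error objects, or -1 on a wrong result. */
static int demo_ownership(void)
{
    struct demo_state s = { NULL };
    struct demo_error *first = demo_error_new("first failure");
    struct demo_error *second = demo_error_new("second failure");

    /* Outlier: the caller keeps using 'first', so it hands over a copy. */
    demo_set_error(&s, demo_error_copy(first));
    /* Common case: hand over and forget; the callee frees it. */
    demo_set_error(&s, second);

    int kept_first = strcmp(s.error->msg, first->msg) == 0;
    demo_error_free(first);     /* the caller still owns the original */
    demo_error_free(s.error);   /* the state owns the recorded copy */
    return kept_first ? demo_live_errors : -1;
}
```

With this convention the common caller needs no cleanup of its own, and only a caller that keeps using the error pays for a copy.
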
Reviewed-by: Fabiano Rosas
Signed-off-by: Peter Xu
---
 migration/migration.h    |  4 ++--
 migration/channel.c      |  1 -
 migration/migration.c    | 22 ++++++++++++++++------
 migration/multifd.c      | 10 ++++------
 migration/postcopy-ram.c |  1 -
 migration/ram.c          |  1 -
 6 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 6eea18db36..76e35a5ecf 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -465,7 +465,7 @@ bool migration_has_all_channels(void);
 
 uint64_t migrate_max_downtime(void);
 
-void migrate_set_error(MigrationState *s, const Error *error);
+void migrate_set_error(MigrationState *s, Error *error);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
@@ -510,7 +510,7 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, void *opaque);
 void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
-void migration_cancel(const Error *error);
+void migration_cancel(Error *error);
 
 void populate_vfio_info(MigrationInfo *info);
 void reset_vfio_bytes_transferred(void);
diff --git a/migration/channel.c b/migration/channel.c
index ca3319a309..48b3f6abd6 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -90,7 +90,6 @@ void migration_channel_connect(MigrationState *s,
         }
     }
     migrate_fd_connect(s, error);
-    error_free(error);
 }
 
 
diff --git a/migration/migration.c b/migration/migration.c
index c60064d48e..0f3ca168ed 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -162,7 +162,7 @@ void migration_object_init(void)
     dirty_bitmap_mig_init();
 }
 
-void migration_cancel(const Error *error)
+void migration_cancel(Error *error)
 {
     if (error) {
         migrate_set_error(current_migration, error);
@@ -1218,11 +1218,22 @@ static void migrate_fd_cleanup_bh(void *opaque)
     object_unref(OBJECT(s));
 }
 
-void migrate_set_error(MigrationState *s, const Error *error)
+/*
+ * Set error for current migration state. The `error' ownership will be
+ * moved from the caller to MigrationState, so the caller doesn't need to
+ * free the error.
+ *
+ * If the caller still needs to reference the `error' passed in, one should
+ * use error_copy() explicitly.
+ */
+void migrate_set_error(MigrationState *s, Error *error)
 {
     QEMU_LOCK_GUARD(&s->error_mutex);
     if (!s->error) {
-        s->error = error_copy(error);
+        /* Record the first error triggered */
+        s->error = error;
+    } else {
+        error_free(error);
     }
 }
 
@@ -1235,7 +1246,7 @@ static void migrate_error_free(MigrationState *s)
     }
 }
 
-static void migrate_fd_error(MigrationState *s, const Error *error)
+static void migrate_fd_error(MigrationState *s, Error *error)
 {
     trace_migrate_fd_error(error_get_pretty(error));
     assert(s->to_dst_file == NULL);
@@ -1703,7 +1714,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
         if (!resume_requested) {
             yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         }
-        migrate_fd_error(s, local_err);
+        migrate_fd_error(s, error_copy(local_err));
         error_propagate(errp, local_err);
         return;
     }
@@ -2626,7 +2637,6 @@ static MigThrError migration_detect_error(MigrationState *s)
 
     if (local_error) {
         migrate_set_error(s, local_error);
-        error_free(local_error);
     }
 
     if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret) {
diff --git a/migration/multifd.c b/migration/multifd.c
index 0f6b203877..69d56104fb 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -551,7 +551,6 @@ void multifd_save_cleanup(void)
         multifd_send_state->ops->send_cleanup(p, &local_err);
         if (local_err) {
             migrate_set_error(migrate_get_current(), local_err);
-            error_free(local_err);
         }
     }
     qemu_sem_destroy(&multifd_send_state->channels_ready);
@@ -750,7 +749,6 @@ out:
     if (local_err) {
         trace_multifd_send_error(p->id);
         multifd_send_terminate_threads(local_err);
-        error_free(local_err);
     }
 
     /*
@@ -883,7 +881,6 @@ static void multifd_new_send_channel_cleanup(MultiFDSendParams *p,
      */
     p->quit = true;
     object_unref(OBJECT(ioc));
-    error_free(err);
 }
 
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
@@ -1148,7 +1145,6 @@ static void *multifd_recv_thread(void *opaque)
 
     if (local_err) {
         multifd_recv_terminate_threads(local_err);
-        error_free(local_err);
     }
     qemu_mutex_lock(&p->mutex);
     p->running = false;
@@ -1240,7 +1236,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 
     id = multifd_recv_initial_packet(ioc, &local_err);
     if (id < 0) {
-        multifd_recv_terminate_threads(local_err);
+        /* Copy local error because we'll also return it to caller */
+        multifd_recv_terminate_threads(error_copy(local_err));
         error_propagate_prepend(errp, local_err,
                                 "failed to receive packet"
                                 " via multifd channel %d: ",
@@ -1253,7 +1250,8 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
     if (p->c != NULL) {
         error_setg(&local_err, "multifd: received id '%d' already setup'",
                    id);
-        multifd_recv_terminate_threads(local_err);
+        /* Copy local error because we'll also return it to caller */
+        multifd_recv_terminate_threads(error_copy(local_err));
         error_propagate(errp, local_err);
         return;
     }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 29aea9456d..8a93b5504d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1594,7 +1594,6 @@ postcopy_preempt_send_channel_done(MigrationState *s,
 {
     if (local_err) {
         migrate_set_error(s, local_err);
-        error_free(local_err);
     } else {
         migration_ioc_register_yank(ioc);
         s->postcopy_qemufile_src = qemu_file_new_output(ioc);
diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..fc7fe0e6e8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4308,7 +4308,6 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
          */
         error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
         migration_cancel(err);
-        error_free(err);
     }
 
     switch (ps) {

From patchwork Tue Aug 29 21:42:29 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 1827503
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Fabiano Rosas, peterx@redhat.com, Juan Quintela
Subject: [PATCH 3/9] migration: Introduce migrate_has_error()
Date: Tue, 29 Aug 2023 17:42:29 -0400
Message-ID: <20230829214235.69309-4-peterx@redhat.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230829214235.69309-1-peterx@redhat.com>
References: <20230829214235.69309-1-peterx@redhat.com>

Introduce a helper to detect whether MigrationState.error is set, for
whatever reason.

The error_mutex is not strictly needed here, because we neither
dereference nor modify the pointer. The helper still takes it to follow
the locking rule, and a comment in the code says so.

This is preparation work for letting any thread (e.g. the source return
path thread) set errors on MigrationState in a unified way, rather than
relying on its own mechanism (mark_source_rp_bad()).

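The reasoning above, that a bare "is it set?" check neither dereferences nor modifies the pointer, can be illustrated with a small lock-free sketch. It is an assumption-laden example rather than the patch's code: it uses C11 `<stdatomic.h>` instead of QEMU's `qatomic_read()`, a plain string instead of `Error`, and hypothetical `demo_*` names throughout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state: the error is just an atomically published pointer. */
struct demo_state {
    _Atomic(const char *) error;
};

/* Only tests the pointer against NULL; it never dereferences it. */
static bool demo_has_error(struct demo_state *s)
{
    return atomic_load(&s->error) != NULL;
}

/* Records only the first error; later ones are ignored. */
static void demo_set_error_once(struct demo_state *s, const char *msg)
{
    const char *expected = NULL;

    atomic_compare_exchange_strong(&s->error, &expected, msg);
}

/* Returns 1 if the flag is clear before and set after the first error. */
static int demo_check(void)
{
    struct demo_state s;

    atomic_init(&s.error, NULL);
    if (demo_has_error(&s)) {
        return 0;
    }
    demo_set_error_once(&s, "return path failed");
    demo_set_error_once(&s, "ignored");
    return demo_has_error(&s) ? 1 : 0;
}
```

Anything that reads the error's contents, such as formatting it for a status query, would still need to hold the lock, because the object could otherwise be freed underneath the reader.
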
Reviewed-by: Fabiano Rosas Signed-off-by: Peter Xu --- migration/migration.h | 1 + migration/migration.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/migration/migration.h b/migration/migration.h index 76e35a5ecf..1ba38eecfa 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -466,6 +466,7 @@ bool migration_has_all_channels(void); uint64_t migrate_max_downtime(void); void migrate_set_error(MigrationState *s, Error *error); +bool migrate_has_error(MigrationState *s); void migrate_fd_connect(MigrationState *s, Error *error_in); diff --git a/migration/migration.c b/migration/migration.c index 0f3ca168ed..84c17dfbbb 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1237,6 +1237,13 @@ void migrate_set_error(MigrationState *s, Error *error) } } +bool migrate_has_error(MigrationState *s) +{ + /* The lock is not helpful here, but still follow the rule */ + QEMU_LOCK_GUARD(&s->error_mutex); + return qatomic_read(&s->error); +} + static void migrate_error_free(MigrationState *s) { QEMU_LOCK_GUARD(&s->error_mutex); From patchwork Tue Aug 29 21:42:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 1827477 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=LL01/Bkd; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
legolas.ozlabs.org (Postfix) with ESMTPS id 4Rb3Br21yZz1yfX for ; Wed, 30 Aug 2023 09:13:40 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qb7nR-0000yI-DW; Tue, 29 Aug 2023 19:06:53 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6U6-0006Lj-VC for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:52 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6U3-0000gE-5l for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:49 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1693345366; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PdKS8z7xAB9LImRiAMESm5ufmMTyU1i9XZnD0iZK6a8=; b=LL01/Bkdn3FS4qmtSqqe2NtUv5sClo5KZpHz3+EZU4m6yGFDcJ4X6jofDzmUX0+SV3X1EU 0OkgUMtN3elfWaOQZWHwOixTna/QV+DkdbQ6hxJuOAGi/fIaJpuSk+LOuuHSivCVqpnNKr zPM3Zx5VeHRdVZk48JAlTr0nlFN49io= Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-342-0K0SX9DePbK1krwnPUOiFA-1; Tue, 29 Aug 2023 17:42:44 -0400 X-MC-Unique: 0K0SX9DePbK1krwnPUOiFA-1 Received: by mail-qk1-f199.google.com with SMTP id af79cd13be357-76f1cc68e65so39577885a.1 for ; Tue, 29 Aug 2023 14:42:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693345364; x=1693950164; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=PdKS8z7xAB9LImRiAMESm5ufmMTyU1i9XZnD0iZK6a8=; b=KvV0FzyERCK0DRL8Ut9whQESA8ZKGzr/wW8uzxpOkMMGOrB8Kd1LVSazFkM8UGpUS+ ezCxDM5n/tqlaYf5gOpbydyk166ROxNbHIjhxvNnqihmdDrO6bpt/O9oiBtNxJKURg4Z V060KMbuUro4JQRwXD680bwCjJ+GTfUe9uUIWTUdCJOiABaJm2vLTEzONU20NLW7JcNO Yb8DR+xJq0+haXCC0GsBpmxU2/TZMzWxJPJQEKek/bQYDK7V8iFdQGs7yRU+Q2k4It2q a0E0gTErZv8g470LjCS3mgArXWab7G+3AKbo48MVwF3fDzPdr1quu9BxHwmJEB5WM2pD t2iQ== X-Gm-Message-State: AOJu0YxvFi1qBtY4CBeR6waGK0PBHL4vv6kaqkdVAwuZdmV7Fd4PkvmM SjDpez34PeQEHUAeZnR5T0i3HRAdQifpwfdnNkbJr+RCoU5uHIReGxJrgZSs/q5bzPrWO2VaqQc NsqaMDudMwGJ6RTEkmQntyQd6YKD8Z9CXKA77XxwIW2G9/EK2oHcTB3u3znDiqI2FIV9P9PGu X-Received: by 2002:a05:622a:1aa2:b0:412:944:542f with SMTP id s34-20020a05622a1aa200b004120944542fmr265414qtc.2.1693345363867; Tue, 29 Aug 2023 14:42:43 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEqvlFuY8x96Z5tra1v8rXsb0ngVM4pBxaz/Vu9DPOPGABT8kds3IBDhBQ0KAQKjVSgKpDMAg== X-Received: by 2002:a05:622a:1aa2:b0:412:944:542f with SMTP id s34-20020a05622a1aa200b004120944542fmr265398qtc.2.1693345363328; Tue, 29 Aug 2023 14:42:43 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. 
[99.254.144.39]) by smtp.gmail.com with ESMTPSA id b18-20020ac86bd2000000b0040f8ac751a5sm3260343qtt.96.2023.08.29.14.42.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Aug 2023 14:42:42 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: Fabiano Rosas , peterx@redhat.com, Juan Quintela Subject: [PATCH 4/9] migration: Refactor error handling in source return path Date: Tue, 29 Aug 2023 17:42:30 -0400 Message-ID: <20230829214235.69309-5-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230829214235.69309-1-peterx@redhat.com> References: <20230829214235.69309-1-peterx@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -5 X-Spam_score: -0.6 X-Spam_bar: / X-Spam_report: (-0.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_SORBS_WEB=1.5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org rp_state.error was a boolean used to show error happened in return path thread. That's not only duplicating error reporting (migrate_set_error), but also not good enough in that we only do error_report() and set it to true, we never can keep a history of the exact error and show it in query-migrate. To make this better, a few things done: - Use error_setg() rather than error_report() across the whole lifecycle of return path thread, keeping the error in an Error*. 
- Use migrate_set_error() to apply that captured error to the global migration object when an error occurs in this thread. - With the above, mark_source_rp_bad() is no longer needed; remove it, along with rp_state.error itself. Reviewed-by: Fabiano Rosas Signed-off-by: Peter Xu --- migration/migration.h | 1 - migration/ram.h | 5 +- migration/migration.c | 122 +++++++++++++++++++++-------------------- migration/ram.c | 41 +++++++------- migration/trace-events | 2 +- 5 files changed, 89 insertions(+), 82 deletions(-) diff --git a/migration/migration.h b/migration/migration.h index 1ba38eecfa..a5c95e4d43 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -297,7 +297,6 @@ struct MigrationState { /* Protected by qemu_file_lock */ QEMUFile *from_dst_file; QemuThread rp_thread; - bool error; /* * We can also check non-zero of rp_thread, but there's no "official" * way to do this, so this bool makes it slightly more elegant. diff --git a/migration/ram.h b/migration/ram.h index 145c915ca7..14ed666d58 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -51,7 +51,8 @@ uint64_t ram_bytes_total(void); void mig_throttle_counter_reset(void); uint64_t ram_pagesize_summary(void); -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len); +int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len, + Error **errp); void ram_postcopy_migrated_memory_release(MigrationState *ms); /* For outgoing discard bitmap */ void ram_postcopy_send_discard_bitmap(MigrationState *ms); @@ -71,7 +72,7 @@ void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr); void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr); int64_t ramblock_recv_bitmap_send(QEMUFile *file, const char *block_name); -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb); +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp); bool ramblock_page_is_discarded(RAMBlock *rb, ram_addr_t start); void
postcopy_preempt_shutdown_file(MigrationState *s); void *postcopy_preempt_thread(void *opaque); diff --git a/migration/migration.c b/migration/migration.c index 84c17dfbbb..def9d119b1 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1424,7 +1424,6 @@ void migrate_init(MigrationState *s) s->to_dst_file = NULL; s->state = MIGRATION_STATUS_NONE; s->rp_state.from_dst_file = NULL; - s->rp_state.error = false; s->mbps = 0.0; s->pages_per_second = 0.0; s->downtime = 0; @@ -1743,14 +1742,14 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp) qemu_sem_post(&s->pause_sem); } -/* migration thread support */ -/* - * Something bad happened to the RP stream, mark an error - * The caller shall print or trace something to indicate why - */ -static void mark_source_rp_bad(MigrationState *s) +void migration_rp_wait(MigrationState *s) { - s->rp_state.error = true; + qemu_sem_wait(&s->rp_state.rp_sem); +} + +void migration_rp_kick(MigrationState *s) +{ + qemu_sem_post(&s->rp_state.rp_sem); } static struct rp_cmd_args { @@ -1774,7 +1773,7 @@ static struct rp_cmd_args { * and we don't need to send pages that have already been sent. 
*/ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname, - ram_addr_t start, size_t len) + ram_addr_t start, size_t len, Error **errp) { long our_host_ps = qemu_real_host_page_size(); @@ -1786,15 +1785,12 @@ static void migrate_handle_rp_req_pages(MigrationState *ms, const char* rbname, */ if (!QEMU_IS_ALIGNED(start, our_host_ps) || !QEMU_IS_ALIGNED(len, our_host_ps)) { - error_report("%s: Misaligned page request, start: " RAM_ADDR_FMT - " len: %zd", __func__, start, len); - mark_source_rp_bad(ms); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES: Misaligned page request, start: " + RAM_ADDR_FMT " len: %zd", start, len); return; } - if (ram_save_queue_pages(rbname, start, len)) { - mark_source_rp_bad(ms); - } + ram_save_queue_pages(rbname, start, len, errp); } /* Return true to retry, false to quit */ @@ -1809,26 +1805,28 @@ static bool postcopy_pause_return_path_thread(MigrationState *s) return true; } -static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name) +static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name, + Error **errp) { RAMBlock *block = qemu_ram_block_by_name(block_name); if (!block) { - error_report("%s: invalid block name '%s'", __func__, block_name); + error_setg(errp, "MIG_RP_MSG_RECV_BITMAP has invalid block name '%s'", + block_name); return -EINVAL; } /* Fetch the received bitmap and refresh the dirty bitmap */ - return ram_dirty_bitmap_reload(s, block); + return ram_dirty_bitmap_reload(s, block, errp); } -static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value) +static int migrate_handle_rp_resume_ack(MigrationState *s, + uint32_t value, Error **errp) { trace_source_return_path_thread_resume_ack(value); if (value != MIGRATION_RESUME_ACK_VALUE) { - error_report("%s: illegal resume_ack value %"PRIu32, - __func__, value); + error_setg(errp, "illegal resume_ack value %"PRIu32, value); return -1; } @@ -1887,49 +1885,47 @@ static void *source_return_path_thread(void
*opaque) uint32_t tmp32, sibling_error; ram_addr_t start = 0; /* =0 to silence warning */ size_t len = 0, expected_len; + Error *err = NULL; int res; trace_source_return_path_thread_entry(); rcu_register_thread(); retry: - while (!ms->rp_state.error && !qemu_file_get_error(rp) && + while (!migrate_has_error(ms) && !qemu_file_get_error(rp) && migration_is_setup_or_active(ms->state)) { trace_source_return_path_thread_loop_top(); + header_type = qemu_get_be16(rp); header_len = qemu_get_be16(rp); if (qemu_file_get_error(rp)) { - mark_source_rp_bad(ms); goto out; } if (header_type >= MIG_RP_MSG_MAX || header_type == MIG_RP_MSG_INVALID) { - error_report("RP: Received invalid message 0x%04x length 0x%04x", - header_type, header_len); - mark_source_rp_bad(ms); + error_setg(&err, "Received invalid message 0x%04x length 0x%04x", + header_type, header_len); goto out; } if ((rp_cmd_args[header_type].len != -1 && header_len != rp_cmd_args[header_type].len) || header_len > sizeof(buf)) { - error_report("RP: Received '%s' message (0x%04x) with" - "incorrect length %d expecting %zu", - rp_cmd_args[header_type].name, header_type, header_len, - (size_t)rp_cmd_args[header_type].len); - mark_source_rp_bad(ms); + error_setg(&err, "Received '%s' message (0x%04x) with" + " incorrect length %d expecting %zu", + rp_cmd_args[header_type].name, header_type, header_len, + (size_t)rp_cmd_args[header_type].len); goto out; } /* We know we've got a valid header by this point */ res = qemu_get_buffer(rp, buf, header_len); if (res != header_len) { - error_report("RP: Failed reading data for message 0x%04x" - " read %d expected %d", - header_type, res, header_len); - mark_source_rp_bad(ms); + error_setg(&err, "Failed reading data for message 0x%04x" + " read %d expected %d", + header_type, res, header_len); goto out; } @@ -1939,8 +1935,7 @@ retry: sibling_error = ldl_be_p(buf); trace_source_return_path_thread_shut(sibling_error); if (sibling_error) { - error_report("RP: Sibling indicated error %d",
sibling_error); - mark_source_rp_bad(ms); + error_setg(&err, "Sibling indicated error %d", sibling_error); } /* * We'll let the main thread deal with closing the RP @@ -1958,7 +1953,10 @@ retry: case MIG_RP_MSG_REQ_PAGES: start = ldq_be_p(buf); len = ldl_be_p(buf + 8); - migrate_handle_rp_req_pages(ms, NULL, start, len); + migrate_handle_rp_req_pages(ms, NULL, start, len, &err); + if (err) { + goto out; + } break; case MIG_RP_MSG_REQ_PAGES_ID: @@ -1973,32 +1971,32 @@ retry: expected_len += tmp32; } if (header_len != expected_len) { - error_report("RP: Req_Page_id with length %d expecting %zd", - header_len, expected_len); - mark_source_rp_bad(ms); + error_setg(&err, "Req_Page_id with length %d expecting %zd", + header_len, expected_len); + goto out; + } + migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len, + &err); + if (err) { goto out; } - migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len); break; case MIG_RP_MSG_RECV_BITMAP: if (header_len < 1) { - error_report("%s: missing block name", __func__); - mark_source_rp_bad(ms); + error_setg(&err, "MIG_RP_MSG_RECV_BITMAP missing block name"); goto out; } /* Format: len (1B) + idstr (<255B). This ends the idstr. */ buf[buf[0] + 1] = '\0'; - if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1))) { - mark_source_rp_bad(ms); + if (migrate_handle_rp_recv_bitmap(ms, (char *)(buf + 1), &err)) { goto out; } break; case MIG_RP_MSG_RESUME_ACK: tmp32 = ldl_be_p(buf); - if (migrate_handle_rp_resume_ack(ms, tmp32)) { - mark_source_rp_bad(ms); + if (migrate_handle_rp_resume_ack(ms, tmp32, &err)) { goto out; } break; @@ -2014,6 +2012,19 @@ retry: } out: + if (err) { + /* + * Collect any error in return-path thread and report it to the + * migration state object. + */ + migrate_set_error(ms, err); + /* + * We lost ownership to Error*, clear it, prepared to capture the + * next error. 
+ */ + err = NULL; + } + res = qemu_file_get_error(rp); if (res) { if (res && migration_in_postcopy()) { @@ -2029,13 +2040,11 @@ out: * it's reset only by us above, or when migration completes */ rp = ms->rp_state.from_dst_file; - ms->rp_state.error = false; goto retry; } } trace_source_return_path_thread_bad_end(); - mark_source_rp_bad(ms); } trace_source_return_path_thread_end(); @@ -2068,8 +2077,7 @@ static int open_return_path_on_source(MigrationState *ms, return 0; } -/* Returns 0 if the RP was ok, otherwise there was an error on the RP */ -static int await_return_path_close_on_source(MigrationState *ms) +static void await_return_path_close_on_source(MigrationState *ms) { /* * If this is a normal exit then the destination will send a SHUT and the @@ -2082,13 +2090,11 @@ static int await_return_path_close_on_source(MigrationState *ms) * waiting for the destination. */ qemu_file_shutdown(ms->rp_state.from_dst_file); - mark_source_rp_bad(ms); } trace_await_return_path_close_on_source_joining(); qemu_thread_join(&ms->rp_state.rp_thread); ms->rp_state.rp_thread_created = false; trace_await_return_path_close_on_source_close(); - return ms->rp_state.error; } static inline void @@ -2391,11 +2397,11 @@ static void migration_completion(MigrationState *s) * a SHUT command). 
*/ if (s->rp_state.rp_thread_created) { - int rp_error; trace_migration_return_path_end_before(); - rp_error = await_return_path_close_on_source(s); - trace_migration_return_path_end_after(rp_error); - if (rp_error) { + await_return_path_close_on_source(s); + trace_migration_return_path_end_after(); + /* If return path has error, should have been set here */ + if (migrate_has_error(s)) { goto fail; } } diff --git a/migration/ram.c b/migration/ram.c index fc7fe0e6e8..814c59c17b 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1963,7 +1963,8 @@ static void migration_page_queue_free(RAMState *rs) * @start: starting address from the start of the RAMBlock * @len: length (in bytes) to send */ -int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) +int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len, + Error **errp) { RAMBlock *ramblock; RAMState *rs = ram_state; @@ -1980,7 +1981,7 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) * Shouldn't happen, we can't reuse the last RAMBlock if * it's the 1st request. 
*/ - error_report("ram_save_queue_pages no previous block"); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no previous block"); return -1; } } else { @@ -1988,16 +1989,17 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) if (!ramblock) { /* We shouldn't be asked for a non-existent RAMBlock */ - error_report("ram_save_queue_pages no block '%s'", rbname); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES has no block '%s'", rbname); return -1; } rs->last_req_rb = ramblock; } trace_ram_save_queue_pages(ramblock->idstr, start, len); if (!offset_in_ramblock(ramblock, start + len - 1)) { - error_report("%s request overrun start=" RAM_ADDR_FMT " len=" - RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT, - __func__, start, len, ramblock->used_length); + error_setg(errp, "MIG_RP_MSG_REQ_PAGES request overrun, " + "start=" RAM_ADDR_FMT " len=" + RAM_ADDR_FMT " blocklen=" RAM_ADDR_FMT, + start, len, ramblock->used_length); return -1; } @@ -2029,9 +2031,9 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len) assert(len % page_size == 0); while (len) { if (ram_save_host_page_urgent(pss)) { - error_report("%s: ram_save_host_page_urgent() failed: " - "ramblock=%s, start_addr=0x"RAM_ADDR_FMT, - __func__, ramblock->idstr, start); + error_setg(errp, "ram_save_host_page_urgent() failed: " + "ramblock=%s, start_addr=0x"RAM_ADDR_FMT, + ramblock->idstr, start); ret = -1; break; } @@ -4165,7 +4167,7 @@ static void ram_dirty_bitmap_reload_notify(MigrationState *s) * This is only used when the postcopy migration is paused but wants * to resume from a middle point. 
*/ -int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) +int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) { int ret = -EINVAL; /* from_dst_file is always valid because we're within rp_thread */ @@ -4177,8 +4179,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) trace_ram_dirty_bitmap_reload_begin(block->idstr); if (s->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { - error_report("%s: incorrect state %s", __func__, - MigrationStatus_str(s->state)); + error_setg(errp, "Reload bitmap in incorrect state %s", + MigrationStatus_str(s->state)); return -EINVAL; } @@ -4195,9 +4197,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) /* The size of the bitmap should match with our ramblock */ if (size != local_size) { - error_report("%s: ramblock '%s' bitmap size mismatch " - "(0x%"PRIx64" != 0x%"PRIx64")", __func__, - block->idstr, size, local_size); + error_setg(errp, "ramblock '%s' bitmap size mismatch (0x%"PRIx64 + " != 0x%"PRIx64")", block->idstr, size, local_size); ret = -EINVAL; goto out; } @@ -4207,16 +4208,16 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block) ret = qemu_file_get_error(file); if (ret || size != local_size) { - error_report("%s: read bitmap failed for ramblock '%s': %d" - " (size 0x%"PRIx64", got: 0x%"PRIx64")", - __func__, block->idstr, ret, local_size, size); + error_setg(errp, "read bitmap failed for ramblock '%s': %d" + " (size 0x%"PRIx64", got: 0x%"PRIx64")", + block->idstr, ret, local_size, size); ret = -EIO; goto out; } if (end_mark != RAMBLOCK_RECV_BITMAP_ENDING) { - error_report("%s: ramblock '%s' end mark incorrect: 0x%"PRIx64, - __func__, block->idstr, end_mark); + error_setg(errp, "ramblock '%s' end mark incorrect: 0x%"PRIx64, + block->idstr, end_mark); ret = -EINVAL; goto out; } diff --git a/migration/trace-events b/migration/trace-events index 4666f19325..20cd17ffe8 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -164,7 +164,7 @@ 
migration_completion_postcopy_end_after_complete(void) "" migration_rate_limit_pre(int ms) "%d ms" migration_rate_limit_post(int urgent) "urgent: %d" migration_return_path_end_before(void) "" -migration_return_path_end_after(int rp_error) "%d" +migration_return_path_end_after(void) "" migration_thread_after_loop(void) "" migration_thread_file_err(void) "" migration_thread_setup_complete(void) "" From patchwork Tue Aug 29 21:42:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 1827498 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JTVfWYfr; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Rb3Px3BCkz1ygP for ; Wed, 30 Aug 2023 09:23:17 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qb7n6-0000tm-U0; Tue, 29 Aug 2023 19:06:32 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6U5-0006LN-Vv for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:50 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) 
id 1qb6U3-0000hV-Iw for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:49 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1693345367; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2E6l8x7Sp00nkmEU9RfahHRwq3/Xa7iqj96y/AxDuAA=; b=JTVfWYfrRe/AtrX4EkPdGMAQyfMyXcVSJ/6HBKvMIFBR8yhymgDwyZmDTGmwnR68qEvs0G 8tTBiRg2woq0Q4g4d0bYcF7BZTWVRevsqyoogZu/bhJ6i7XmqKuVxSnv3aB0vrMeg0Dax+ 9KBzUAs1TTrnuGOPt5yBTeCcUMD5pe4= Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com [209.85.160.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-52-w_mIKy1vPby66Hywk-YbrA-1; Tue, 29 Aug 2023 17:42:45 -0400 X-MC-Unique: w_mIKy1vPby66Hywk-YbrA-1 Received: by mail-qt1-f197.google.com with SMTP id d75a77b69052e-411f4a7ddbdso14275451cf.0 for ; Tue, 29 Aug 2023 14:42:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693345364; x=1693950164; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2E6l8x7Sp00nkmEU9RfahHRwq3/Xa7iqj96y/AxDuAA=; b=gbPpwZ+l6Exm0/5qIXPNgeatGKnleici69uZYRzh13oUUemcD8zyqlrBECr4/GCkWH 3Ik+H8F1072FzDE19jYzRM8IBI4WsjLZK4HOIUmQU58IuGwEvwLuTKT6X7OfVPamrlz8 yJWCmmqKb7Tqs/oqoKicrXynyP8BAxjh8J+mCwQzzxN3w037BBST5WeaBm22ZCEusTSF cS4B/bE0DUEgjkwgzduK1GfMv6sn2wDCS4xnE1nXkTw97BAAN+6lgCjIUZerPeTpMl6C JAsqk5fotXxJeMx7Y0go/LGsDOJQGaxFP5h5QyTmcio8ityEhMt/L+DOQPVD1L3RLoPr Y9aQ== X-Gm-Message-State: AOJu0YxLHeX74HcrOGdbIa/rPDq0En7T3EiACbVltDtVS1XWuK/z51tR WyJOzwPZyF7PrABbHE8O8tpZq/4rn08J75bp1gcAUN34uFpx1EDii7dFwdZvh1yDWEEXxM2sGxu p2TFCUGsHGZKnYqALerE0WcSYs+BtacXGF7AD7qmXOhqGh/mQe+iAHy5K27pni876OAJ+2LEf X-Received: by 2002:a05:622a:1a25:b0:412:2dd3:e0ed 
with SMTP id f37-20020a05622a1a2500b004122dd3e0edmr321549qtb.0.1693345364415; Tue, 29 Aug 2023 14:42:44 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGR/5fjheWPYmokwC6Ugo2jm44MuSXbl2mJRATJHnWR2Y29JCfk3KbrbZYjY3Bf0WNv4jCDUA== X-Received: by 2002:a05:622a:1a25:b0:412:2dd3:e0ed with SMTP id f37-20020a05622a1a2500b004122dd3e0edmr321536qtb.0.1693345364161; Tue, 29 Aug 2023 14:42:44 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id b18-20020ac86bd2000000b0040f8ac751a5sm3260343qtt.96.2023.08.29.14.42.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Aug 2023 14:42:43 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: Fabiano Rosas , peterx@redhat.com, Juan Quintela Subject: [PATCH 5/9] migration: Deliver return path file error to migrate state too Date: Tue, 29 Aug 2023 17:42:31 -0400 Message-ID: <20230829214235.69309-6-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230829214235.69309-1-peterx@redhat.com> References: <20230829214235.69309-1-peterx@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org We've already done this for most of the return path thread errors, but not yet
for the I/O errors that happen on the return path qemufile. Do that too. Always remember to reset "err", because we no longer own it; otherwise we're prone to a use-after-free later, after recovery. Re-export qemu_file_get_error_obj(). Reviewed-by: Fabiano Rosas Signed-off-by: Peter Xu --- migration/qemu-file.h | 1 + migration/migration.c | 7 +++++++ migration/qemu-file.c | 2 +- 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 47015f5201..bc6edc5c39 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -129,6 +129,7 @@ void qemu_file_skip(QEMUFile *f, int size); void qemu_file_credit_transfer(QEMUFile *f, size_t size); int qemu_file_get_error_obj_any(QEMUFile *f1, QEMUFile *f2, Error **errp); void qemu_file_set_error_obj(QEMUFile *f, int ret, Error *err); +int qemu_file_get_error_obj(QEMUFile *f, Error **errp); void qemu_file_set_error(QEMUFile *f, int ret); int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); diff --git a/migration/migration.c b/migration/migration.c index def9d119b1..576e102319 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2027,6 +2027,13 @@ out: res = qemu_file_get_error(rp); if (res) { + /* We have forwarded any error in "err" already, reuse "err" */ + assert(err == NULL); + /* Try to deliver this file error to migration state */ + qemu_file_get_error_obj(rp, &err); + migrate_set_error(ms, err); + err = NULL; + if (res && migration_in_postcopy()) { /* * Maybe there is something we can do: it looks like a diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 19c33c9985..eea7171192 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -146,7 +146,7 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks) * is not 0.
* */ -static int qemu_file_get_error_obj(QEMUFile *f, Error **errp) +int qemu_file_get_error_obj(QEMUFile *f, Error **errp) { if (errp) { *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL; From patchwork Tue Aug 29 21:42:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 1827491 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Ama8xN7u; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Rb3Jp5qCHz26jL for ; Wed, 30 Aug 2023 09:18:49 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qb7mw-0000jX-65; Tue, 29 Aug 2023 19:06:22 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6U6-0006Lk-Vq for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:52 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6U4-0000ht-EQ for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1693345367; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=p6uAjHfWVV5l02g/VMaRfP8rGJHV6qapnjAjrMo0sFg=; b=Ama8xN7uLvHm7M/NjH6xrsSsvzEjBdwFy7vVVKWyZa4xJR3iIfzXlT/ErUyLDCuizU2ofh n4C/0wjPHT/u0b00mG0k4abipoDRFpH1Cly63LnHDcENdClwWN2dnNJQQ90YapB1fWV1ix qNNdxwDCN3L5u6LFWFOyqoDnA4Wy0cU= Received: from mail-oa1-f69.google.com (mail-oa1-f69.google.com [209.85.160.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-248-RSUVXUJ4OYiAlT1LCxzNYA-1; Tue, 29 Aug 2023 17:42:46 -0400 X-MC-Unique: RSUVXUJ4OYiAlT1LCxzNYA-1 Received: by mail-oa1-f69.google.com with SMTP id 586e51a60fabf-1c55535c088so1582310fac.0 for ; Tue, 29 Aug 2023 14:42:46 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693345365; x=1693950165; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=p6uAjHfWVV5l02g/VMaRfP8rGJHV6qapnjAjrMo0sFg=; b=XcIoKz7jBCUh09Idsv/XjPUdpTvtCVuyXXSOwqAKePsZpBjbXYywijeWS5PIuCXcOv fB2LUzqQ4eb3KXLoSM3G0oV42b1lfq/ADqg3zWb6apZcPSjEF9vjRg0blDDTYRrWsMzf R+/4nmaCHIbyIhVDT0tE/jEjIpg374lLTxHA5hF1TaZPMYRVGABebXbQtemOhsbBS1OE VIFzSeuCALL+zxKhCxbT+sd+nd82nZagm6Lneki3QVjIaeQ+MzzEFy5ahiSE7MbZXnWK M4X6QsxwOgRClzd12EhB8Pr3miwnUWw9awGz4gpFP6ZfZ2FCTJvGxflAVLTtOgRCU/E0 T2pQ== X-Gm-Message-State: AOJu0YySK8TyYE4ZmlZFbEA2l18PKD66MwHXY6jDKy3bKSSMnOA97XbP zbhD8wfszj8m0u1suCamsElJyi73IBT6o1GbXctApu5eKL94DEHgof1UDrnFHYjfAcPwqxnlOUd npcbQ6bm8Z8JECI0nHDxbDq4zMuxmwGUYXJ+dDLxDPS94EGVQZDC1qG0vs3C/XLt1f6dSPbbW X-Received: by 2002:a05:6830:459d:b0:6bc:a824:2750 with SMTP id az29-20020a056830459d00b006bca8242750mr297214otb.2.1693345365735; Tue, 29 Aug 2023 14:42:45 -0700 (PDT) X-Google-Smtp-Source: 
AGHT+IH7A+paXKHllIsacA1/7iMHXlIvhD23pCPCRJtHI6qnDDKBOH+bI4Q3+C8blQTJuhxBO65HuQ== X-Received: by 2002:a05:6830:459d:b0:6bc:a824:2750 with SMTP id az29-20020a056830459d00b006bca8242750mr297201otb.2.1693345365337; Tue, 29 Aug 2023 14:42:45 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id b18-20020ac86bd2000000b0040f8ac751a5sm3260343qtt.96.2023.08.29.14.42.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Aug 2023 14:42:44 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: Fabiano Rosas , peterx@redhat.com, Juan Quintela Subject: [PATCH 6/9] qemufile: Always return a verbose error Date: Tue, 29 Aug 2023 17:42:32 -0400 Message-ID: <20230829214235.69309-7-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230829214235.69309-1-peterx@redhat.com> References: <20230829214235.69309-1-peterx@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org There are a lot of cases where we only have an errno set in last_error, without a detailed error description. When this happens, try to generate an Error that carries the errno as a descriptive message.
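The fallback described above can be sketched in plain C, outside of QEMU. This is only an illustration under stated assumptions: Channel, its fields and channel_get_error() are hypothetical stand-ins for QEMUFile and qemu_file_get_error_obj(), and a heap-allocated string stands in for QEMU's Error object. The "Channel error: <strerror text>" shape is meant to resemble what an errno-based message looks like, not to reproduce QEMU's exact output.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a channel that records its last failure. */
typedef struct {
    int last_error;        /* 0, or a negative errno value */
    const char *last_msg;  /* detailed message, may be NULL */
} Channel;

/*
 * Return 0 if no error is recorded.  Otherwise return the negative errno
 * and, if msg_out is non-NULL, store a heap-allocated message in it that
 * the caller must free().  When no detailed message was recorded, one is
 * built from the errno so the caller never gets an error without text.
 */
static int channel_get_error(const Channel *c, char **msg_out)
{
    if (!c->last_error) {
        return 0;
    }

    if (msg_out) {
        if (c->last_msg) {
            /* A detailed message exists: hand out a copy of it. */
            size_t n = strlen(c->last_msg) + 1;
            *msg_out = malloc(n);
            if (*msg_out) {
                memcpy(*msg_out, c->last_msg, n);
            }
        } else {
            /* Only an errno is known: derive the text from it. */
            const char *why = strerror(-c->last_error);
            size_t n = strlen("Channel error: ") + strlen(why) + 1;
            *msg_out = malloc(n);
            if (*msg_out) {
                snprintf(*msg_out, n, "Channel error: %s", why);
            }
        }
    }

    return c->last_error;
}
```

A caller that only holds an errno-style failure, e.g. a Channel initialised as { -EIO, NULL }, would then still receive a non-empty message to cache and display later.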
This will be helpful in cases where one relies on the Error*. E.g., migration state only caches Error* in MigrationState.error. With this, we'll display correct error messages in e.g. query-migrate when the error was only set by qemu_file_set_error(). Reviewed-by: Fabiano Rosas Signed-off-by: Peter Xu --- migration/qemu-file.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index eea7171192..3e64e900c9 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -142,15 +142,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks) * * Return negative error value if there has been an error on previous * operations, return 0 if no error happened. - * Optional, it returns Error* in errp, but it may be NULL even if return value - * is not 0. * + * If errp is specified, a verbose error message will be copied over. */ int qemu_file_get_error_obj(QEMUFile *f, Error **errp) { + if (!f->last_error) { + return 0; + } + + /* There is an error */ if (errp) { - *errp = f->last_error_obj ? 
error_copy(f->last_error_obj) : NULL; + if (f->last_error_obj) { + *errp = error_copy(f->last_error_obj); + } else { + error_setg_errno(errp, -f->last_error, "Channel error"); + } } + return f->last_error; } From patchwork Tue Aug 29 21:42:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 1827496 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Hf9yoNj2; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Rb3Nv37QCz1yhf for ; Wed, 30 Aug 2023 09:22:22 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qb7mt-0000hZ-AY; Tue, 29 Aug 2023 19:06:19 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6U9-0006MF-GV for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:54 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6U5-0000i2-OT for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:52 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1693345369; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tVwbEdq8b6KesgpVfASU2UdXgKGIo5bfRsC+mh3xplY=; b=Hf9yoNj20/weaWPSkrL+GVI3xS4LuZxMj6Uv4ybqb9Hl8WrZgoDhzNyBh7TqYgNmxe9aMI LlaFdL9yf9apitvBznk6Tp9M2CDTEDUnwSvDYWaiNm2ClxZuth8wlzxWQvGBd1FVMb9/RZ nQ1t5CpWRphhwKxkytQzQSJRiT0RSqg= Received: from mail-qk1-f197.google.com (mail-qk1-f197.google.com [209.85.222.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-324-scNWx1PiPWKz0LOQFaeUOQ-1; Tue, 29 Aug 2023 17:42:47 -0400 X-MC-Unique: scNWx1PiPWKz0LOQFaeUOQ-1 Received: by mail-qk1-f197.google.com with SMTP id af79cd13be357-76f191e26f5so50537685a.0 for ; Tue, 29 Aug 2023 14:42:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693345367; x=1693950167; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tVwbEdq8b6KesgpVfASU2UdXgKGIo5bfRsC+mh3xplY=; b=elBX9kp2VTJ40Ol9KUKb9cHMfQ7YODv9MUniCiWShQuCHzdhsEPUoJoRKNyArHJB2c /xNNL83DgU/kamXdRI7HDlxFOPoTK4QZ2JK2PgL6EYjrohmhJsDekx8yFGC8CuFtDuwU Tie8wIeQ2cX3NrQVyyktLZ5rs6RrK1iHQAe7iWMRN4UYCutr0zfiXyCoMaSQYT1J8xFL GlCO3MsAma48OhS3rg+/xJ2hxIplS/1D6b20579QGdY5uHHg83HBOHXBdbflyu4DN8Ix U/jYT761LyyH1VO2TSkS/lO34bzYr9mHtSgP4pjR56lpMgyYSdVii9T6nuE0Pf8utWho pvHw== X-Gm-Message-State: AOJu0YwnLQSdJPgM8/3I572ucXMssXeAmHQm5Y9p/3Hv0JbzKCmX1t+f TV2/w8ivISZ8jhU4EL9QFDiDZDyEui1uBqXxJauKDCYnLdUFF3iK2X3TH7BqJrdkpnmzwPT2E45 j9XtbNNGLfFhMzc4uEJQi3a+iX52FVJqzoFP3QQwBri3f4OL/nXbm40/z3QBLFgq48BTHcAaS X-Received: by 2002:a05:622a:1a20:b0:412:1c5f:4781 with SMTP id f32-20020a05622a1a2000b004121c5f4781mr245381qtb.4.1693345366877; Tue, 29 Aug 2023 14:42:46 -0700 (PDT) X-Google-Smtp-Source: 
AGHT+IHGhApLw/OiKZI2tSWJGC4HyuIW0836e2eSghK5mQCbJnorE8D+wMv0wvRyevjLGCSgJhT/mg== X-Received: by 2002:a05:622a:1a20:b0:412:1c5f:4781 with SMTP id f32-20020a05622a1a2000b004121c5f4781mr245360qtb.4.1693345366504; Tue, 29 Aug 2023 14:42:46 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id b18-20020ac86bd2000000b0040f8ac751a5sm3260343qtt.96.2023.08.29.14.42.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Aug 2023 14:42:45 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: Fabiano Rosas , peterx@redhat.com, Juan Quintela Subject: [PATCH 7/9] migration: Remember num of ramblocks to sync during recovery Date: Tue, 29 Aug 2023 17:42:33 -0400 Message-ID: <20230829214235.69309-8-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230829214235.69309-1-peterx@redhat.com> References: <20230829214235.69309-1-peterx@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.133.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Instead of only relying on the count of rp_sem, make the counter be part of RAMState so it can be used in both threads to synchronize on the process. 
rp_sem will be further reused as a way to kick the main thread, e.g., on recovery failures. Signed-off-by: Peter Xu Reviewed-by: Fabiano Rosas --- migration/ram.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 814c59c17b..a9541c60b4 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -394,6 +394,14 @@ struct RAMState { /* Queue of outstanding page requests from the destination */ QemuMutex src_page_req_mutex; QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests; + + /* + * This is only used when postcopy is in recovery phase, to communicate + * between the migration thread and the return path thread on dirty + * bitmap synchronizations. This field is unused in other stages of + * RAM migration. + */ + unsigned int postcopy_bmap_sync_requested; }; typedef struct RAMState RAMState; @@ -4135,20 +4143,20 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) { RAMBlock *block; QEMUFile *file = s->to_dst_file; - int ramblock_count = 0; trace_ram_dirty_bitmap_sync_start(); + qatomic_set(&rs->postcopy_bmap_sync_requested, 0); RAMBLOCK_FOREACH_NOT_IGNORED(block) { qemu_savevm_send_recv_bitmap(file, block->idstr); trace_ram_dirty_bitmap_request(block->idstr); - ramblock_count++; + qatomic_inc(&rs->postcopy_bmap_sync_requested); } trace_ram_dirty_bitmap_sync_wait(); /* Wait until all the ramblocks' dirty bitmap synced */ - while (ramblock_count--) { + while (qatomic_read(&rs->postcopy_bmap_sync_requested)) { qemu_sem_wait(&s->rp_state.rp_sem); } @@ -4175,6 +4183,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS; uint64_t local_size = DIV_ROUND_UP(nbits, 8); uint64_t size, end_mark; + RAMState *rs = ram_state; trace_ram_dirty_bitmap_reload_begin(block->idstr); @@ -4240,6 +4249,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) /* We'll 
recalculate migration_dirty_pages in ram_state_resume_prepare(). */ trace_ram_dirty_bitmap_reload_complete(block->idstr); + qatomic_dec(&rs->postcopy_bmap_sync_requested); + /* * We succeeded to sync bitmap for current ramblock. If this is * the last one to sync, we need to notify the main send thread. From patchwork Tue Aug 29 21:42:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 1827469 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Bh1XEhhS; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Rb36C2wmrz1yfX for ; Wed, 30 Aug 2023 09:09:39 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qb7nu-0001qw-Vj; Tue, 29 Aug 2023 19:07:23 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6U9-0006ME-Fo for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:54 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6U7-0000iE-1m for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:53 -0400 DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1693345370; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hrlS5gw7406y1NAmCuKbevjOhQ68l6+fK/m+TPY+7RE=; b=Bh1XEhhScdHSqrWsmhOZ8MLyjRvYCLbR3cWWLR58OGVPTpE26aYU4aRRfcPekldu70GF2Q AnJB2MQl9qLG912BFxbkQM0dHAzFoAiRA2zbuFQz0Rv3fKHGgtmUUlJf2cmy1ewuqc+JGF zugk5eBHzNmnknjYtJN44Hpa9zQTx/Q= Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-317-kZaT113-MACX8C7-H3bqSA-1; Tue, 29 Aug 2023 17:42:48 -0400 X-MC-Unique: kZaT113-MACX8C7-H3bqSA-1 Received: by mail-qt1-f200.google.com with SMTP id d75a77b69052e-411f4a7ddbdso14275511cf.0 for ; Tue, 29 Aug 2023 14:42:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693345368; x=1693950168; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=hrlS5gw7406y1NAmCuKbevjOhQ68l6+fK/m+TPY+7RE=; b=fFGb9KPGxLkQkmhT2L1Le/1N12shx1Hr1n5Qd3mcD2W0qtaHSm+YanaYAhGc+nGEoO MYesVD/jSqTW3aQIPZXoxkOZhCWn5Df6FaO6Hsy3LiOAFFdGtFvmCbRYsulhfQ6rrPM5 dKXJW8L6kX031qTTqgLKsBAnUjSOf4ASNDGCE6TPBtTR+pOVAqHHBrnr5Vl6hQYtyK4d 5UsUupPzEJrT+eXrXKTbGwl09cWFYKAnZHZKHes5Ngxdu2VoG2rwpSnyxKMfmCCcyrWU d0VqSOIcvhP1lwew2teZlVn3w3TfVxRytrSS/l+eaOWuSYA/VTBj9OrOEgbIWZO0YkKk iqLQ== X-Gm-Message-State: AOJu0YzJX+7vdDKAGZMclz9Dxa0QCksBE8Zy86dOjRdq0ju6OyZ1HJVR NnHK8bD/QHymuYVy7FE/LpdHBcSqj7O8e/ewbgz/II0CdvXxbjVA/elvDvOZSoih3j7YhXS3vyG wavija4hmiBeSZXY8fUlo4RpAzDstv9MlpBs0mgYKFHghKVSuarTtiheOVSnPUIeTr7rH0qr6 X-Received: by 2002:a05:622a:1801:b0:40c:8ba5:33e0 with SMTP id t1-20020a05622a180100b0040c8ba533e0mr276299qtc.3.1693345367954; Tue, 29 Aug 2023 
14:42:47 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFkuJ1ibV1nRksySKJp03jFYiDIosU7y7kVYn2+Lu503JIrQehtd+2uC/asbIizfknt9vTpPw== X-Received: by 2002:a05:622a:1801:b0:40c:8ba5:33e0 with SMTP id t1-20020a05622a180100b0040c8ba533e0mr276285qtc.3.1693345367592; Tue, 29 Aug 2023 14:42:47 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id b18-20020ac86bd2000000b0040f8ac751a5sm3260343qtt.96.2023.08.29.14.42.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Aug 2023 14:42:47 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: Fabiano Rosas , peterx@redhat.com, Juan Quintela Subject: [PATCH 8/9] migration: Add migration_rp_wait|kick() Date: Tue, 29 Aug 2023 17:42:34 -0400 Message-ID: <20230829214235.69309-9-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230829214235.69309-1-peterx@redhat.com> References: <20230829214235.69309-1-peterx@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org It's just a simple wrapper for rp_sem on either wait() or kick(), make it even clearer on how it is used. Prepared to be used even for other things. 
Signed-off-by: Peter Xu Reviewed-by: Fabiano Rosas --- migration/migration.h | 15 +++++++++++++++ migration/migration.c | 4 ++-- migration/ram.c | 16 +++++++--------- 3 files changed, 24 insertions(+), 11 deletions(-) diff --git a/migration/migration.h b/migration/migration.h index a5c95e4d43..b6de78dbdd 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -304,6 +304,12 @@ struct MigrationState { * be cleared in the rp_thread! */ bool rp_thread_created; + /* + * Used to synchronize between migration main thread and return path + * thread. The migration thread can wait() on this sem, while + * other threads (e.g., return path thread) can kick it using a + * post(). + */ QemuSemaphore rp_sem; /* * We post to this when we got one PONG from dest. So far it's an @@ -516,4 +522,13 @@ void populate_vfio_info(MigrationInfo *info); void reset_vfio_bytes_transferred(void); void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page); +/* Migration thread waiting for return path thread. */ +void migration_rp_wait(MigrationState *s); +/* + * Kick the migration thread waiting for return path messages. NOTE: the + * name can be slightly confusing (when read as "kick the rp thread"), just + * to remember the target is always the migration thread.
+ */ +void migration_rp_kick(MigrationState *s); + #endif diff --git a/migration/migration.c b/migration/migration.c index 576e102319..3a5f324781 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1835,7 +1835,7 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, MIGRATION_STATUS_POSTCOPY_ACTIVE); /* Notify send thread that time to continue send pages */ - qemu_sem_post(&s->rp_state.rp_sem); + migration_rp_kick(s); return 0; } @@ -2503,7 +2503,7 @@ static int postcopy_resume_handshake(MigrationState *s) qemu_savevm_send_postcopy_resume(s->to_dst_file); while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { - qemu_sem_wait(&s->rp_state.rp_sem); + migration_rp_wait(s); } if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { diff --git a/migration/ram.c b/migration/ram.c index a9541c60b4..b5f6d65d84 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4157,7 +4157,7 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) /* Wait until all the ramblocks' dirty bitmap synced */ while (qatomic_read(&rs->postcopy_bmap_sync_requested)) { - qemu_sem_wait(&s->rp_state.rp_sem); + migration_rp_wait(s); } trace_ram_dirty_bitmap_sync_complete(); @@ -4165,11 +4165,6 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) return 0; } -static void ram_dirty_bitmap_reload_notify(MigrationState *s) -{ - qemu_sem_post(&s->rp_state.rp_sem); -} - /* * Read the received bitmap, revert it as the initial dirty bitmap. * This is only used when the postcopy migration is paused but wants @@ -4252,10 +4247,13 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block, Error **errp) qatomic_dec(&rs->postcopy_bmap_sync_requested); /* - * We succeeded to sync bitmap for current ramblock. If this is - * the last one to sync, we need to notify the main send thread. + * We succeeded to sync bitmap for current ramblock. Always kick the + * migration thread to check whether all requested bitmaps are + * reloaded. 
NOTE: it's racy to only kick when requested==0, because + * we don't know whether the migration thread may still be increasing + * it. */ - ram_dirty_bitmap_reload_notify(s); + migration_rp_kick(s); ret = 0; out: From patchwork Tue Aug 29 21:42:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 1827471 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=eexQLTgt; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=nongnu.org (client-ip=209.51.188.17; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=patchwork.ozlabs.org) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Rb36C6GFFz1ygP for ; Wed, 30 Aug 2023 09:09:39 +1000 (AEST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qb7nS-00012S-SY; Tue, 29 Aug 2023 19:06:54 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6UC-0006Nb-CL for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:56 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qb6U9-0000iW-BX for qemu-devel@nongnu.org; Tue, 29 Aug 2023 17:42:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1693345371; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zteSUqlE0a6VOFukitMAWaR/lKfpKDlrLDgCs8ahJek=; b=eexQLTgtt/Ebb59dr+vOic8xOOdW8SfXIBQB1QFcjmC0dc3IGQLvlFw/uqXzSPVc5iEB7X PkP5QDoDl0DW7Jq6MXxXgdh6kC8OkUK9MzxDeE/3/tncC2sLwMv1F8NoX7bGZWeiHRms7k 3omgQPhJsGv/HVbS2c4LLZAG78vq1KU= Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-541-INFzGFbWPVii9TVRj4ag3w-1; Tue, 29 Aug 2023 17:42:49 -0400 X-MC-Unique: INFzGFbWPVii9TVRj4ag3w-1 Received: by mail-qt1-f200.google.com with SMTP id d75a77b69052e-4122babcb87so7518421cf.1 for ; Tue, 29 Aug 2023 14:42:49 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693345369; x=1693950169; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zteSUqlE0a6VOFukitMAWaR/lKfpKDlrLDgCs8ahJek=; b=Hdc+l7AJVcpaXaX42hwwwxvc2cUT3EcmvNgrTsYLC6geLATPOng1wUhwmFhLnpO6aZ nebVCfR/3MFI8/43BaCSMX8BjT1hUtcsVmSEXg5h7S0EoucDrnCtdOFCOoMw0tqQwwkF FDKRnlmZocCTQIh7OQdqRsZelTSVtvfyDLIac3hwBMJhEq1bNftjZvoYe4eHlmW6gp8g DFXvuKdh0S5z9z/xuhH/lYVD3l1DU5q4m9oYv/Ndidv/UpARpTWqZXA8UhHdxCso6AmG trZmGEOKxYoBFq5ELOj3VL2oxRIBjWu7kjEq7q0+iYavhdoJtiSepItHusOTDYrzIwnP nukA== X-Gm-Message-State: AOJu0Yx79mBg79WmZjrRtu9AW/BZ38v1VGMiz7SK9zkSN5TpxjlIYi8K 7jMWsV4jjYmrAHEof540frwYBGMcLfTTUm4HKUme/MnA+xxUjDYo9SrESPq8i0PbeABzwFJyhS0 K9VjCCLuExGX5bc2stdBWJiJuzVpQDIhQSbni37sb+yr6W6XajX4FXK0R3Q4IlopHRQMcQCw/ X-Received: by 2002:a05:622a:1aa7:b0:410:88c6:cf22 with SMTP id s39-20020a05622a1aa700b0041088c6cf22mr277769qtc.3.1693345369177; Tue, 29 Aug 2023 14:42:49 -0700 (PDT) X-Google-Smtp-Source: 
AGHT+IEA0ftwYLI4iRLFR4mGW4VB7ycMR7YKIw6u4g6lJirVZ2y4Q4J+lxYIrougk2AwZ8XHdu3iEQ== X-Received: by 2002:a05:622a:1aa7:b0:410:88c6:cf22 with SMTP id s39-20020a05622a1aa700b0041088c6cf22mr277741qtc.3.1693345368684; Tue, 29 Aug 2023 14:42:48 -0700 (PDT) Received: from x1n.redhat.com (cpe5c7695f3aee0-cm5c7695f3aede.cpe.net.cable.rogers.com. [99.254.144.39]) by smtp.gmail.com with ESMTPSA id b18-20020ac86bd2000000b0040f8ac751a5sm3260343qtt.96.2023.08.29.14.42.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 29 Aug 2023 14:42:48 -0700 (PDT) From: Peter Xu To: qemu-devel@nongnu.org Cc: Fabiano Rosas , peterx@redhat.com, Juan Quintela , Xiaohui Li Subject: [PATCH 9/9] migration/postcopy: Allow network to fail even during recovery Date: Tue, 29 Aug 2023 17:42:35 -0400 Message-ID: <20230829214235.69309-10-peterx@redhat.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230829214235.69309-1-peterx@redhat.com> References: <20230829214235.69309-1-peterx@redhat.com> MIME-Version: 1.0 Received-SPF: pass client-ip=170.10.129.124; envelope-from=peterx@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Normally the postcopy recover phase should only exist for a super short period, that's the duration when QEMU is trying to recover from an interrupted postcopy migration, during which handshake will be 
carried out for continuing the procedure with state changes from PAUSED -> RECOVER -> POSTCOPY_ACTIVE again.

The RECOVER phase should be very short: it happens right after the admin has specified a new, working network link for QEMU to reconnect to the dest QEMU.

However, there can still be cases where the channel breaks within this small RECOVER window.

If that happens, with the current code there is no way for the src QEMU to get kicked out of the RECOVER stage, and no way to retry the recovery on another channel once one is established.

This patch allows the RECOVER phase itself to fail too. We're mostly ready, with just some small things missing, e.g. properly kicking the main migration thread out when it is sleeping on rp_sem and we find that we're at the RECOVER stage. When this happens, the RECOVER itself fails and the state rolls back to PAUSED. The user can then retry another round of recovery.

To make it even more robust, teach the QMP command migrate-pause to explicitly kick the src/dst QEMU out when needed, so that even if for some reason the migration thread wasn't already kicked out by a failing return-path thread, the admin can still kick it out.

This is an extreme corner case, but still try to cover it.

One can try to test this with two proxy channels for migration:

  (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
  (b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock

So the migration channel will be:

                      (a)          (b)
  src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst

Then, to make QEMU hang at the RECOVER stage, one can do the following:

  (1) stop the postcopy using the QMP command migrate-pause
  (2) kill the 2nd proxy (b)
  (3) try to recover the postcopy using /tmp/src.sock on src
  (4) src QEMU will go into the RECOVER stage but won't be able to continue
      from there, because the channel is actually broken at (b)

Before this patch, step (4) leaves the src QEMU stuck in the RECOVER stage, without a way to kick QEMU out or continue the postcopy again.
After this patch, (4) quickly fails and bounces back to the PAUSED stage. The admin can also kick QEMU from (4) into PAUSED using migrate-pause when needed. After bouncing back to the PAUSED stage, one can recover again.

Reported-by: Xiaohui Li Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332 Signed-off-by: Peter Xu --- migration/migration.h | 8 ++++-- migration/migration.c | 64 +++++++++++++++++++++++++++++++++++++++---- migration/ram.c | 4 ++- 3 files changed, 68 insertions(+), 8 deletions(-) diff --git a/migration/migration.h b/migration/migration.h index b6de78dbdd..e86d9d098a 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -482,6 +482,7 @@ void migrate_init(MigrationState *s); bool migration_is_blocked(Error **errp); /* True if outgoing migration has entered postcopy phase */ bool migration_in_postcopy(void); +bool migration_postcopy_is_alive(void); MigrationState *migrate_get_current(void); uint64_t ram_get_total_transferred_pages(void); @@ -522,8 +523,11 @@ void populate_vfio_info(MigrationInfo *info); void reset_vfio_bytes_transferred(void); void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page); -/* Migration thread waiting for return path thread. */ -void migration_rp_wait(MigrationState *s); +/* + * Migration thread waiting for return path thread. Return non-zero if an + * error is detected. + */ +int migration_rp_wait(MigrationState *s); /* * Kick the migration thread waiting for return path messages.
NOTE: the * name can be slightly confusing (when read as "kick the rp thread"), just diff --git a/migration/migration.c b/migration/migration.c index 3a5f324781..85462ff1d7 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1349,6 +1349,19 @@ bool migration_in_postcopy(void) } } +bool migration_postcopy_is_alive(void) +{ + MigrationState *s = migrate_get_current(); + + switch (s->state) { + case MIGRATION_STATUS_POSTCOPY_ACTIVE: + case MIGRATION_STATUS_POSTCOPY_RECOVER: + return true; + default: + return false; + } +} + bool migration_in_postcopy_after_devices(MigrationState *s) { return migration_in_postcopy() && s->postcopy_after_devices; @@ -1540,18 +1553,31 @@ void qmp_migrate_pause(Error **errp) MigrationIncomingState *mis = migration_incoming_get_current(); int ret; - if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { + if (migration_postcopy_is_alive()) { /* Source side, during postcopy */ + Error *error = NULL; + + /* Tell the core migration that we're pausing */ + error_setg(&error, "Postcopy migration is paused by the user"); + migrate_set_error(ms, error); + qemu_mutex_lock(&ms->qemu_file_lock); ret = qemu_file_shutdown(ms->to_dst_file); qemu_mutex_unlock(&ms->qemu_file_lock); if (ret) { error_setg(errp, "Failed to pause source migration"); } + + /* + * Kick the migration thread out of any waiting windows (on behalf + * of the rp thread). 
+ */ + migration_rp_kick(ms); + return; } - if (mis->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { + if (migration_postcopy_is_alive()) { ret = qemu_file_shutdown(mis->from_src_file); if (ret) { error_setg(errp, "Failed to pause destination migration"); @@ -1560,7 +1586,7 @@ void qmp_migrate_pause(Error **errp) } error_setg(errp, "migrate-pause is currently only supported " - "during postcopy-active state"); + "during postcopy-active or postcopy-recover state"); } bool migration_is_blocked(Error **errp) @@ -1742,9 +1768,21 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp) qemu_sem_post(&s->pause_sem); } -void migration_rp_wait(MigrationState *s) +int migration_rp_wait(MigrationState *s) { + /* If migration has failure already, ignore the wait */ + if (migrate_has_error(s)) { + return -1; + } + qemu_sem_wait(&s->rp_state.rp_sem); + + /* After wait, double check that there's no failure */ + if (migrate_has_error(s)) { + return -1; + } + + return 0; } void migration_rp_kick(MigrationState *s) @@ -1798,6 +1836,20 @@ static bool postcopy_pause_return_path_thread(MigrationState *s) { trace_postcopy_pause_return_path(); + if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { + /* + * this will be extremely unlikely: that we got yet another network + * issue during recovering of the 1st network failure.. during this + * period the main migration thread can be waiting on rp_sem for + * this thread to sync with the other side. + * + * When this happens, explicitly kick the migration thread out of + * RECOVER stage and back to PAUSED, so the admin can try + * everything again. 
+ */ + migration_rp_kick(s); + } + qemu_sem_wait(&s->postcopy_pause_rp_sem); trace_postcopy_pause_return_path_continued(); @@ -2503,7 +2555,9 @@ static int postcopy_resume_handshake(MigrationState *s) qemu_savevm_send_postcopy_resume(s->to_dst_file); while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) { - migration_rp_wait(s); + if (migration_rp_wait(s)) { + return -1; + } } if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) { diff --git a/migration/ram.c b/migration/ram.c index b5f6d65d84..199fd3e117 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4157,7 +4157,9 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs) /* Wait until all the ramblocks' dirty bitmap synced */ while (qatomic_read(&rs->postcopy_bmap_sync_requested)) { - migration_rp_wait(s); + if (migration_rp_wait(s)) { + return -1; + } } trace_ram_dirty_bitmap_sync_complete();
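For readers who want to experiment with the synchronization pattern outside of QEMU, the following is a minimal standalone sketch of the idea in patches 7-9: a shared atomic counter of outstanding requests, a semaphore used only as a "kick", and an error flag that is checked both before and after the wait. It is not QEMU code. All names (DemoState, demo_rp_wait(), demo_rp_kick(), demo_recovery_wait()) are hypothetical, and it assumes POSIX threads with unnamed semaphores (e.g. Linux) and C11 atomics in place of QemuSemaphore, qatomic_*() and migrate_has_error().

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of MigrationState/RAMState involved. */
typedef struct {
    sem_t rp_sem;               /* role of s->rp_state.rp_sem */
    atomic_uint sync_requested; /* role of postcopy_bmap_sync_requested */
    atomic_bool has_error;      /* role of migrate_has_error() */
    bool inject_failure;        /* demo only: simulate a broken channel */
} DemoState;

/* Wait for a kick; return non-zero if an error was seen before or after. */
static int demo_rp_wait(DemoState *s)
{
    if (atomic_load(&s->has_error)) {
        return -1;
    }
    sem_wait(&s->rp_sem);
    if (atomic_load(&s->has_error)) {
        return -1;
    }
    return 0;
}

/* Kick the waiting (migration) thread; the waiter re-checks its condition. */
static void demo_rp_kick(DemoState *s)
{
    sem_post(&s->rp_sem);
}

/* Plays the return path thread: completes each request, or fails early. */
static void *demo_rp_thread(void *opaque)
{
    DemoState *s = opaque;
    unsigned int n = atomic_load(&s->sync_requested);

    if (s->inject_failure) {
        /* Record the error first, then kick so the waiter can notice it. */
        atomic_store(&s->has_error, true);
        demo_rp_kick(s);
        return NULL;
    }
    for (unsigned int i = 0; i < n; i++) {
        atomic_fetch_sub(&s->sync_requested, 1);
        demo_rp_kick(s); /* always kick; the waiter checks the counter */
    }
    return NULL;
}

/* Plays the migration thread: returns 0 on success, -1 on failure. */
int demo_recovery_wait(bool inject_failure)
{
    DemoState s;
    pthread_t rp;
    int ret = 0;

    sem_init(&s.rp_sem, 0, 0);
    atomic_init(&s.sync_requested, 3); /* pretend 3 ramblocks were requested */
    atomic_init(&s.has_error, false);
    s.inject_failure = inject_failure;

    pthread_create(&rp, NULL, demo_rp_thread, &s);
    while (atomic_load(&s.sync_requested)) {
        if (demo_rp_wait(&s)) {
            ret = -1; /* bail out instead of sleeping forever */
            break;
        }
    }
    pthread_join(rp, NULL);
    sem_destroy(&s.rp_sem);
    return ret;
}
```

In this sketch, demo_recovery_wait(false) drains all three requests and returns 0, while demo_recovery_wait(true) returns -1 rather than blocking, which mirrors how the series lets the migration thread leave the RECOVER stage when the return path fails. Setting the error before posting the semaphore is what makes the check after the wait reliable.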