From: Ilya Maximets
To: ovs-dev@openvswitch.org
Cc: Ilya Maximets, Dumitru Ceara
Date: Tue, 26 Mar 2024 18:27:09 +0100
Subject: [ovs-dev] [PATCH v2 1/5] ovsdb: raft: Avoid transferring leadership to unavailable servers.

The current implementation of the leadership transfer just shoots the leadership in the general direction of the first stable server in the configuration. It doesn't check whether the server was active recently, or even whether the connection is established. This may result in sending the leadership to a disconnected or otherwise unavailable server.

Such behavior should not cause log truncation or any other correctness issues, because the destination server would either have all the append requests queued up, or the connection will be dropped by the leader. In the worst case, we will have a leaderless cluster until the next election timer fires. Other servers will notice the absence of the leader and trigger a new leader election normally. However, the potential wait for the election timer is not good, as real-world setups may have high values configured.

Fix that by trying to transfer to servers that we know have replicated the most changes, i.e., have the highest 'match_index'. Such servers replied to the most recent append requests, so they have the highest chance of being healthy. Choose a random starting point in the list of such servers, so we don't transfer to the same server every single time. This slightly improves load distribution but, most importantly, increases the robustness of our test suite, making it cover more cases. Also check that the message was actually sent without an immediate failure.

If we fail to transfer to any server with the highest index, try any other server that is not behind the majority (i.e., whose 'match_index' is at least 'commit_index'), and then any other server that is connected. We did actually send them all the updates (if the connection is open); they just haven't replied yet for one reason or another. This should be better than leaving the cluster without a leader.
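For illustration, here is a minimal, self-contained C sketch of the selection order described above. It is not the code from the patch: `struct candidate`, its `send_ok` flag (standing in for both finding a connection and `raft_send_to_conn()` succeeding) and `pick_transfer_target()` are hypothetical, the array is assumed to be already filtered to stable servers other than the leader, and the random starting point is passed in as `start` instead of coming from `random_range()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for 'struct raft_server': only what the selection
 * logic needs.  The array is assumed to hold stable, non-self servers. */
struct candidate {
    uint64_t match_index;   /* Highest log index known to be replicated. */
    bool send_ok;           /* Models finding a connection AND the transfer
                             * RPC being sent without immediate failure. */
};

/* Returns the index of the server that would receive the leadership, or -1
 * if every attempt fails.  'start' plays the role of random_range(n). */
static long
pick_transfer_target(const struct candidate *c, size_t n, size_t start,
                     uint64_t commit_index)
{
    uint64_t threshold = 0;
    size_t i;

    assert(n == 0 || start < n);

    /* Tier 1 threshold: the highest 'match_index' among candidates. */
    for (i = 0; i < n; i++) {
        if (c[i].match_index > threshold) {
            threshold = c[i].match_index;
        }
    }

    for (;;) {
        for (i = 0; i < n; i++) {
            size_t idx = (start + i) % n;   /* Rotate from random start. */

            if (c[idx].match_index >= threshold && c[idx].send_ok) {
                return (long) idx;
            }
        }
        if (!n || !threshold) {
            return -1;                      /* Nothing left to relax. */
        }
        /* Tier 2: servers not behind the majority.  Tier 3: anyone. */
        threshold = threshold > commit_index ? commit_index : 0;
    }
}
```

For example, with two servers at 'match_index' 10 whose sends both fail and a third at 7, with 'commit_index' 7, the sketch picks the third server on the second pass.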
Note that there is always a chance that the transfer will fail, since we're not waiting for it to be acknowledged (and must not wait). In this case, a normal election will be triggered after the election timer fires.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Signed-off-by: Ilya Maximets
Acked-by: Han Zhou
Acked-by: Felix Huettner
---
CC: Felix Huettner

 ovsdb/raft.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index f463afcb3..b171da345 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -1261,10 +1261,30 @@ raft_transfer_leadership(struct raft *raft, const char *reason)
         return;
     }
 
-    struct raft_server *s;
+    struct raft_server **servers, *s;
+    uint64_t threshold = 0;
+    size_t n = 0, start, i;
+
+    servers = xmalloc(hmap_count(&raft->servers) * sizeof *servers);
+
     HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
-        if (!uuid_equals(&raft->sid, &s->sid)
-            && s->phase == RAFT_PHASE_STABLE) {
+        if (uuid_equals(&raft->sid, &s->sid)
+            || s->phase != RAFT_PHASE_STABLE) {
+            continue;
+        }
+        if (s->match_index > threshold) {
+            threshold = s->match_index;
+        }
+        servers[n++] = s;
+    }
+
+    start = n ? random_range(n) : 0;
+
+retry:
+    for (i = 0; i < n; i++) {
+        s = servers[(start + i) % n];
+
+        if (s->match_index >= threshold) {
             struct raft_conn *conn = raft_find_conn_by_sid(raft, &s->sid);
             if (!conn) {
                 continue;
@@ -1280,7 +1300,10 @@ raft_transfer_leadership(struct raft *raft, const char *reason)
                     .term = raft->term,
                 }
             };
-            raft_send_to_conn(raft, &rpc, conn);
+
+            if (!raft_send_to_conn(raft, &rpc, conn)) {
+                continue;
+            }
 
             raft_record_note(raft, "transfer leadership",
                              "transferring leadership to %s because %s",
@@ -1288,6 +1311,23 @@ raft_transfer_leadership(struct raft *raft, const char *reason)
             break;
         }
     }
+
+    if (n && i == n && threshold) {
+        if (threshold > raft->commit_index) {
+            /* Failed to transfer to servers with the highest 'match_index'.
+             * Try other servers that are not behind the majority. */
+            threshold = raft->commit_index;
+        } else {
+            /* Try any other server.  It is safe, because they either have all
+             * the append requests queued up for them before the leadership
+             * transfer message or their connection is broken and we will not
+             * transfer anyway. */
+            threshold = 0;
+        }
+        goto retry;
+    }
+
+    free(servers);
 }
 
 /* Send a RemoveServerRequest to the rest of the servers in the cluster.
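The 'goto retry' fallback at the end of raft_transfer_leadership() is bounded: after each failed pass, 'threshold' is either lowered (first to 'commit_index', then to 0) or the retry stops because 'threshold' is already 0, so there are at most three passes over the candidates. A small C sketch, assuming every send attempt fails, that replays this sequence; `retry_thresholds()` is a hypothetical helper, not part of ovsdb/raft.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: replays the retry condition from the last hunk,
 * assuming every send attempt fails.  Records the 'threshold' used by each
 * pass over the 'n' candidates in 'out' and returns the number of passes. */
static size_t
retry_thresholds(size_t n, uint64_t max_match, uint64_t commit_index,
                 uint64_t out[3])
{
    uint64_t threshold = max_match;
    size_t passes = 0;

    for (;;) {
        assert(passes < 3);             /* Never more than three passes. */
        out[passes++] = threshold;
        if (!n || !threshold) {         /* Retry only if 'n && threshold'. */
            break;
        }
        threshold = threshold > commit_index ? commit_index : 0;
    }
    return passes;
}
```

With a highest 'match_index' of 10 and a 'commit_index' of 7, the passes use thresholds 10, 7 and 0; when the highest 'match_index' already equals 'commit_index', the middle step is skipped and only two passes remain.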