From patchwork Fri Mar 15 20:14:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Maximets X-Patchwork-Id: 1912693 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TxFqw5yKHz1yWn for ; Sat, 16 Mar 2024 07:15:56 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id D12334181D; Fri, 15 Mar 2024 20:15:51 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id KBVVhNcpSBCZ; Fri, 15 Mar 2024 20:15:47 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.9.56; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org B8589417C3 Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTPS id B8589417C3; Fri, 15 Mar 2024 20:15:47 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0BEDBC0072; Fri, 15 Mar 2024 20:15:47 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by 
lists.linuxfoundation.org (Postfix) with ESMTP id 4C46FC0037 for ; Fri, 15 Mar 2024 20:15:45 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 3BFB080C65 for ; Fri, 15 Mar 2024 20:15:45 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xDwwyLvuMzu7 for ; Fri, 15 Mar 2024 20:15:42 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=217.70.183.198; helo=relay6-d.mail.gandi.net; envelope-from=i.maximets@ovn.org; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp1.osuosl.org ED46D820F8 Authentication-Results: smtp1.osuosl.org; dmarc=none (p=none dis=none) header.from=ovn.org DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org ED46D820F8 Received: from relay6-d.mail.gandi.net (relay6-d.mail.gandi.net [217.70.183.198]) by smtp1.osuosl.org (Postfix) with ESMTPS id ED46D820F8 for ; Fri, 15 Mar 2024 20:15:41 +0000 (UTC) Received: by mail.gandi.net (Postfix) with ESMTPSA id 803CDC0005; Fri, 15 Mar 2024 20:15:39 +0000 (UTC) From: Ilya Maximets To: ovs-dev@openvswitch.org Date: Fri, 15 Mar 2024 21:14:49 +0100 Message-ID: <20240315201614.236523-2-i.maximets@ovn.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240315201614.236523-1-i.maximets@ovn.org> References: <20240315201614.236523-1-i.maximets@ovn.org> MIME-Version: 1.0 X-GND-Sasl: i.maximets@ovn.org Cc: Ilya Maximets , Dumitru Ceara Subject: [ovs-dev] [PATCH 1/5] ovsdb: raft: Randomize leadership transfer. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Each cluster member typically always transfers leadership to the same other member, which is the first in their list of servers. 
This may result in two servers in a 3-node cluster transferring
leadership to each other and never to the third one. Randomize the
selection to distribute the load more evenly. This also makes cluster
failure tests cover more scenarios, as servers will transfer leadership
to servers they didn't before. This is especially important for cluster
joining tests.

Ideally, we would transfer to a random server with the highest apply
index, but this is not implemented for now.

Signed-off-by: Ilya Maximets
--- ovsdb/raft.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/ovsdb/raft.c b/ovsdb/raft.c index f463afcb3..25f462431 100644 --- a/ovsdb/raft.c +++ b/ovsdb/raft.c @@ -1261,8 +1261,12 @@ raft_transfer_leadership(struct raft *raft, const char *reason) return; } + size_t n = hmap_count(&raft->servers) * 3; struct raft_server *s; - HMAP_FOR_EACH (s, hmap_node, &raft->servers) { + + while (n--) { + s = CONTAINER_OF(hmap_random_node(&raft->servers), + struct raft_server, hmap_node); if (!uuid_equals(&raft->sid, &s->sid) && s->phase == RAFT_PHASE_STABLE) { struct raft_conn *conn = raft_find_conn_by_sid(raft, &s->sid);
From patchwork Fri Mar 15 20:14:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Maximets X-Patchwork-Id: 1912695 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id
4TxFr02DTKz1yWn for ; Sat, 16 Mar 2024 07:16:00 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id A8EFA8233F; Fri, 15 Mar 2024 20:15:55 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id W80M10oyG2b4; Fri, 15 Mar 2024 20:15:51 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.9.56; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org D33548233E Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp1.osuosl.org (Postfix) with ESMTPS id D33548233E; Fri, 15 Mar 2024 20:15:48 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 35776C0DCF; Fri, 15 Mar 2024 20:15:48 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 5FBA4C0072 for ; Fri, 15 Mar 2024 20:15:45 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 42E936064A for ; Fri, 15 Mar 2024 20:15:45 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id twjPIrK1Yx2e for ; Fri, 15 Mar 2024 20:15:43 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=217.70.183.198; helo=relay6-d.mail.gandi.net; envelope-from=i.maximets@ovn.org; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp3.osuosl.org 7DCB5605E7 Authentication-Results: smtp3.osuosl.org; dmarc=none (p=none dis=none) header.from=ovn.org DKIM-Filter: OpenDKIM Filter v2.11.0 
smtp3.osuosl.org 7DCB5605E7 Received: from relay6-d.mail.gandi.net (relay6-d.mail.gandi.net [217.70.183.198]) by smtp3.osuosl.org (Postfix) with ESMTPS id 7DCB5605E7 for ; Fri, 15 Mar 2024 20:15:43 +0000 (UTC) Received: by mail.gandi.net (Postfix) with ESMTPSA id 3AAE7C0002; Fri, 15 Mar 2024 20:15:41 +0000 (UTC) From: Ilya Maximets To: ovs-dev@openvswitch.org Date: Fri, 15 Mar 2024 21:14:50 +0100 Message-ID: <20240315201614.236523-3-i.maximets@ovn.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240315201614.236523-1-i.maximets@ovn.org> References: <20240315201614.236523-1-i.maximets@ovn.org> MIME-Version: 1.0 X-GND-Sasl: i.maximets@ovn.org Cc: Ilya Maximets , Dumitru Ceara Subject: [ovs-dev] [PATCH 2/5] ovsdb: raft: Fix time intervals for multitasking while joining. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" While joining, ovsdb-server may not wake up for a duration of a join timer, which is 1 second and is by default 3x larger than a heartbeat timer. This is causing unnecessary warnings from the cooperative multitasking module that thinks that we missed the heartbeat time by a lot. Use join timer (1000) instead while joining. Fixes: d4a15647b917 ("ovsdb: raft: Enable cooperative multitasking.") Signed-off-by: Ilya Maximets Acked-by: Han Zhou --- CC: Frode Nordahl ovsdb/raft.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/ovsdb/raft.c b/ovsdb/raft.c index 25f462431..57e27bf73 100644 --- a/ovsdb/raft.c +++ b/ovsdb/raft.c @@ -2126,10 +2126,11 @@ raft_run(struct raft *raft) raft_reset_ping_timer(raft); } + uint64_t interval = raft->joining + ? 
1000 : RAFT_TIMER_THRESHOLD(raft->election_timer); cooperative_multitasking_set( &raft_run_cb, (void *) raft, time_msec(), - RAFT_TIMER_THRESHOLD(raft->election_timer) - + RAFT_TIMER_THRESHOLD(raft->election_timer) / 10, "raft_run"); + interval + interval / 10, "raft_run"); /* Do this only at the end; if we did it as soon as we set raft->left or * raft->failed in handling the RemoveServerReply, then it could easily From patchwork Fri Mar 15 20:14:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Maximets X-Patchwork-Id: 1912697 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TxFr335gWz23s6 for ; Sat, 16 Mar 2024 07:16:03 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 0268941979; Fri, 15 Mar 2024 20:16:01 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id At6yC0JPSqxS; Fri, 15 Mar 2024 20:15:55 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.9.56; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 2B5834197A Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by 
smtp2.osuosl.org (Postfix) with ESMTPS id 2B5834197A; Fri, 15 Mar 2024 20:15:52 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 00E9FC0072; Fri, 15 Mar 2024 20:15:52 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0F44CC008E for ; Fri, 15 Mar 2024 20:15:51 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 0BF5C4181D for ; Fri, 15 Mar 2024 20:15:51 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vlXzdQ9qC-RR for ; Fri, 15 Mar 2024 20:15:45 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2001:4b98:dc4:8::226; helo=relay6-d.mail.gandi.net; envelope-from=i.maximets@ovn.org; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp4.osuosl.org 68DF44083D Authentication-Results: smtp4.osuosl.org; dmarc=none (p=none dis=none) header.from=ovn.org DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 68DF44083D Received: from relay6-d.mail.gandi.net (relay6-d.mail.gandi.net [IPv6:2001:4b98:dc4:8::226]) by smtp4.osuosl.org (Postfix) with ESMTPS id 68DF44083D for ; Fri, 15 Mar 2024 20:15:45 +0000 (UTC) Received: by mail.gandi.net (Postfix) with ESMTPSA id 3DBAAC0003; Fri, 15 Mar 2024 20:15:43 +0000 (UTC) From: Ilya Maximets To: ovs-dev@openvswitch.org Date: Fri, 15 Mar 2024 21:14:51 +0100 Message-ID: <20240315201614.236523-4-i.maximets@ovn.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240315201614.236523-1-i.maximets@ovn.org> References: <20240315201614.236523-1-i.maximets@ovn.org> MIME-Version: 1.0 X-GND-Sasl: i.maximets@ovn.org Cc: Ilya Maximets , Dumitru Ceara Subject: [ovs-dev] [PATCH 3/5] ovsdb: raft: Fix 
permanent joining state on a cluster member. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev"

Consider the following chain of events:

 1. Have a cluster with 2 members - A and B. A is a leader.
 2. C connects to A, sends a request to join the cluster.
 3. A catches up C, creates an update for the 'servers' list and sends
    it to B and C to apply. This entry is not committed yet.
 4. Before B or C can reply, A loses leadership for some reason.
 5. A sends a joining failure message to C, and C remains in joining
    state.
 6. Both B and C have the new version of 'servers', so they recognize
    each other as valid cluster members.
 7. B initiates a vote, C (or A) replies and B becomes a new leader.
 8. B has a new list of servers. B commits it. C becomes a committed
    cluster member.
 9. A and C receive heartbeats with a new commit index and C is now
    a committed cluster member for all A, B and C.

However, at the end of this process, C is still in joining state as it
never received a successful reply for a join request, and C is still in
a COMMITTING phase for A. So, C skips some parts of the RAFT life cycle
and A will refuse to transfer leadership to C if something happens in
the future.

More interestingly, B can actually transfer leadership to C and vote for
it. A will vote for it just fine as well. After that, C becomes a new
cluster leader while still in joining state. In this state C will not
commit any changes. So, we have a seemingly stable cluster that doesn't
commit any changes!
E.g.:

  s3
  Address: unix:s3.raft
  Status: joining cluster
  Remotes for joining: unix:s3.raft unix:s2.raft unix:s1.raft
  Role: leader
  Term: 4
  Leader: self
  Vote: self
  Last Election started 30095 ms ago, reason: leadership_transfer
  Last Election won: 30093 ms ago
  Election timer: 1000
  Log: [2, 7]
  Entries not yet committed: 2
  Entries not yet applied: 6
  Connections: ->s1 ->s2 <-s1 <-s2
  Disconnections: 0
  Servers:
    s3 (60cf at unix:s3.raft) (self) next_index=7 match_index=6
    s2 (46aa at unix:s2.raft) next_index=7 match_index=6 last msg 58 ms ago
    s1 (28f7 at unix:s1.raft) next_index=7 match_index=6 last msg 59 ms ago

Fix the first scenario by examining server changes in committed log
entries. This way server A can transition C to a STABLE phase and server
C can find itself in the committed list of servers and move out from a
joining state. This is similar to completing commands without receiving
an explicit reply or after the role change from leader to follower.

The second scenario with a leader in a joining state can be fixed when
the joining server becomes leader. The new leader's log gets committed
automatically and all servers transition into STABLE phase for it, but
it should also move on from a joining state, since it leads the cluster
now. It is also possible that B transfers leadership to C before the
list of servers is marked as committed on other servers. In this case C
will commit its own addition to the cluster configuration.

The added test usually triggers both scenarios, but it will trigger at
least one of them.
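
As a rough illustration of the two fixes described above, the following toy model shows a node leaving the joining state either when a committed entry lists it as a server or when it wins an election. This is a sketch under stated assumptions, not the ovsdb/raft.c code: every `toy_*` name, the integer server IDs and the fixed-size array are hypothetical, and the check for an uncommitted configuration is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SERVERS 8

/* Hypothetical stand-ins for the phases mentioned in the text. */
enum toy_phase { TOY_PHASE_STABLE, TOY_PHASE_COMMITTING };

struct toy_server {
    int sid;               /* Stand-in for the server UUID. */
    enum toy_phase phase;  /* How the local node views this server. */
};

struct toy_node {
    int sid;                                     /* Our own ID. */
    bool joining;                                /* No join reply seen yet. */
    struct toy_server servers[TOY_MAX_SERVERS];  /* Local cluster view. */
    size_t n_servers;
};

/* Called when a log entry carrying a server list becomes committed. */
static void
toy_commit_servers_entry(struct toy_node *node, const int *sids, size_t n)
{
    assert(node->n_servers <= TOY_MAX_SERVERS);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < node->n_servers; j++) {
            struct toy_server *s = &node->servers[j];

            /* Another leader committed the addition we had started. */
            if (s->sid == sids[i] && s->phase == TOY_PHASE_COMMITTING) {
                s->phase = TOY_PHASE_STABLE;
            }
        }
        /* We are in the committed configuration: the join succeeded even
         * though no explicit reply arrived. */
        if (node->joining && sids[i] == node->sid) {
            node->joining = false;
        }
    }
}

/* Called when this node wins an election. */
static void
toy_become_leader(struct toy_node *node)
{
    /* A node that leads the cluster cannot still be joining it. */
    node->joining = false;
}
```

In this model, server A would call `toy_commit_servers_entry()` and see C move from COMMITTING to STABLE, while C would call the same function (or `toy_become_leader()`) and drop its joining flag.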
Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.") Signed-off-by: Ilya Maximets Acked-by: Han Zhou --- ovsdb/raft.c | 44 ++++++++++++++++++++++++++++++++++- tests/ovsdb-cluster.at | 53 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 96 insertions(+), 1 deletion(-) diff --git a/ovsdb/raft.c b/ovsdb/raft.c index 57e27bf73..237d7ebf5 100644 --- a/ovsdb/raft.c +++ b/ovsdb/raft.c @@ -385,6 +385,7 @@ static void raft_get_servers_from_log(struct raft *, enum vlog_level); static void raft_get_election_timer_from_log(struct raft *); static bool raft_handle_write_error(struct raft *, struct ovsdb_error *); +static bool raft_has_uncommitted_configuration(const struct raft *); static void raft_run_reconfigure(struct raft *); @@ -2810,6 +2811,18 @@ raft_become_leader(struct raft *raft) raft_reset_election_timer(raft); raft_reset_ping_timer(raft); + if (raft->joining) { + /* It is possible that the server committing this one to the list of + * servers lost leadership before the entry is committed but after + * it was already replicated to majority of servers. In this case + * other servers will recognize this one as a valid cluster member + * and may transfer leadership to it and vote for it. This way + * we're becoming a cluster leader without receiving reply for a + * join request and will commit addition of this server ourselves. 
*/ + VLOG_INFO_RL(&rl, "elected as leader while joining"); + raft->joining = false; + } + struct raft_server *s; HMAP_FOR_EACH (s, hmap_node, &raft->servers) { raft_server_init_leader(raft, s); @@ -2968,12 +2981,12 @@ raft_update_commit_index(struct raft *raft, uint64_t new_commit_index) } while (raft->commit_index < new_commit_index) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); uint64_t index = ++raft->commit_index; const struct raft_entry *e = raft_get_entry(raft, index); if (raft_entry_has_data(e)) { struct raft_command *cmd = raft_find_command_by_eid(raft, &e->eid); - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); if (cmd) { if (!cmd->index && raft->role == RAFT_LEADER) { @@ -3017,6 +3030,35 @@ raft_update_commit_index(struct raft *raft, uint64_t new_commit_index) * reallocate raft->entries, which would invalidate 'e', so * this case must be last, after the one for 'e->data'. */ raft_run_reconfigure(raft); + } else if (e->servers && !raft_has_uncommitted_configuration(raft)) { + struct ovsdb_error *error; + struct raft_server *s; + struct hmap servers; + + error = raft_servers_from_json(e->servers, &servers); + ovs_assert(!error); + HMAP_FOR_EACH (s, hmap_node, &servers) { + struct raft_server *server = raft_find_server(raft, &s->sid); + + if (server && server->phase == RAFT_PHASE_COMMITTING) { + /* This server lost leadership while committing + * server 's', but it was committed later by a + * new leader. */ + server->phase = RAFT_PHASE_STABLE; + } + + if (raft->joining && uuid_equals(&s->sid, &raft->sid)) { + /* Leadership change happened before previous leader + * could commit the change of a servers list, but it + * was replicated and a new leader committed it. 
*/ + VLOG_INFO_RL(&rl, + "added to configuration without reply " + "(eid: "UUID_FMT", commit index: %"PRIu64")", + UUID_ARGS(&e->eid), index); + raft->joining = false; + } + } + raft_servers_destroy(&servers); } } diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at index 481afc08b..482e4e02d 100644 --- a/tests/ovsdb-cluster.at +++ b/tests/ovsdb-cluster.at @@ -473,6 +473,59 @@ done AT_CLEANUP +AT_SETUP([OVSDB cluster - leadership change after replication while joining]) +AT_KEYWORDS([ovsdb server negative unix cluster join]) + +n=5 +AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db dnl + $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr]) +cid=$(ovsdb-tool db-cid s1.db) +schema_name=$(ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema) +for i in $(seq 2 $n); do + AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft]) +done + +on_exit 'kill $(cat *.pid)' +on_exit " + for i in \$(ls $(pwd)/s[[0-$n]]); do + ovs-appctl --timeout 1 -t \$i cluster/status $schema_name; + done +" + +dnl Starting servers one by one asking all existing servers to transfer +dnl leadership after append reply forcing the joining server to try another +dnl one that will also transfer leadership. Since transfer is happening +dnl after the servers update is replicated to other servers, one of the +dnl other servers will actually commit it. It may be a new leader from +dnl one of the old members or the new joining server itself. +for i in $(seq $n); do + dnl Make sure that all already started servers joined the cluster. 
+ for j in $(seq $((i - 1)) ); do + AT_CHECK([ovsdb_client_wait unix:s$j.ovsdb $schema_name connected]) + done + for j in $(seq $((i - 1)) ); do + OVS_WAIT_UNTIL([ovs-appctl -t "$(pwd)"/s$j \ + cluster/failure-test \ + transfer-leadership-after-sending-append-request \ + | grep -q "engaged"]) + done + + AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off \ + --detach --no-chdir --log-file=s$i.log \ + --pidfile=s$i.pid --unixctl=s$i \ + --remote=punix:s$i.ovsdb s$i.db]) +done + +dnl Make sure that all servers joined the cluster. +for i in $(seq $n); do + AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected]) +done + +for i in $(seq $n); do + OVS_APP_EXIT_AND_WAIT_BY_TARGET([$(pwd)/s$i], [s$i.pid]) +done + +AT_CLEANUP OVS_START_SHELL_HELPERS From patchwork Fri Mar 15 20:14:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Maximets X-Patchwork-Id: 1912696 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TxFr30SsTz1yWn for ; Sat, 16 Mar 2024 07:16:03 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 12B9F60F7D; Fri, 15 Mar 2024 20:16:01 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id X2Wfmk7qzhjR; Fri, 15 Mar 
2024 20:15:56 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.9.56; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 544D660FA5 Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp3.osuosl.org (Postfix) with ESMTPS id 544D660FA5; Fri, 15 Mar 2024 20:15:50 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 30BF7C0072; Fri, 15 Mar 2024 20:15:50 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 7DA7FC0DD2 for ; Fri, 15 Mar 2024 20:15:48 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 31809417E4 for ; Fri, 15 Mar 2024 20:15:48 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id JlHpGmIx3Swm for ; Fri, 15 Mar 2024 20:15:47 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2001:4b98:dc4:8::226; helo=relay6-d.mail.gandi.net; envelope-from=i.maximets@ovn.org; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp4.osuosl.org 256FA41714 Authentication-Results: smtp4.osuosl.org; dmarc=none (p=none dis=none) header.from=ovn.org DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 256FA41714 Received: from relay6-d.mail.gandi.net (relay6-d.mail.gandi.net [IPv6:2001:4b98:dc4:8::226]) by smtp4.osuosl.org (Postfix) with ESMTPS id 256FA41714 for ; Fri, 15 Mar 2024 20:15:46 +0000 (UTC) Received: by mail.gandi.net (Postfix) with ESMTPSA id 0897EC0002; Fri, 15 Mar 2024 20:15:44 +0000 (UTC) From: Ilya Maximets To: ovs-dev@openvswitch.org Date: Fri, 15 Mar 2024 21:14:52 +0100 
Message-ID: <20240315201614.236523-5-i.maximets@ovn.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240315201614.236523-1-i.maximets@ovn.org> References: <20240315201614.236523-1-i.maximets@ovn.org> MIME-Version: 1.0 X-GND-Sasl: i.maximets@ovn.org Cc: Ilya Maximets , Dumitru Ceara Subject: [ovs-dev] [PATCH 4/5] ovsdb: raft: Fix assertion when 1-node cluster loses leadership. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev"

Some of the failure tests can make a single-node cluster lose
leadership. In this case the next raft_run() will trigger an election
with pre-vote enabled. This causes an assertion failure when this server
attempts to vote for itself. Fix that by not using pre-voting if there
is only one server.

A new failure test introduced in a later commit triggers this assertion
every time.

Fixes: 85634fd58004 ("ovsdb: raft: Support pre-vote mechanism to deal with disruptive server.")
Signed-off-by: Ilya Maximets
Acked-by: Han Zhou
--- ovsdb/raft.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovsdb/raft.c b/ovsdb/raft.c index 237d7ebf5..c41419052 100644 --- a/ovsdb/raft.c +++ b/ovsdb/raft.c @@ -2083,7 +2083,7 @@ raft_run(struct raft *raft) raft_start_election(raft, true, false); } } else { - raft_start_election(raft, true, false); + raft_start_election(raft, hmap_count(&raft->servers) > 1, false); } }
From patchwork Fri Mar 15 20:14:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Maximets X-Patchwork-Id: 1912698 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org;
envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TxFr63r1Rz1yWn for ; Sat, 16 Mar 2024 07:16:06 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id AA41C41899; Fri, 15 Mar 2024 20:16:04 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id G4_7cZ3FmscN; Fri, 15 Mar 2024 20:16:00 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=140.211.9.56; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org D74ED41825 Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTPS id D74ED41825; Fri, 15 Mar 2024 20:15:58 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id A7958C0072; Fri, 15 Mar 2024 20:15:58 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 107FAC0037 for ; Fri, 15 Mar 2024 20:15:58 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id A3D9D82362 for ; Fri, 15 Mar 2024 20:15:57 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Qgq0zQSgsZkR for ; Fri, 15 Mar 2024 20:15:55 
+0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=217.70.183.198; helo=relay6-d.mail.gandi.net; envelope-from=i.maximets@ovn.org; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp1.osuosl.org DEE9D82499 Authentication-Results: smtp1.osuosl.org; dmarc=none (p=none dis=none) header.from=ovn.org DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org DEE9D82499 Received: from relay6-d.mail.gandi.net (relay6-d.mail.gandi.net [217.70.183.198]) by smtp1.osuosl.org (Postfix) with ESMTPS id DEE9D82499 for ; Fri, 15 Mar 2024 20:15:49 +0000 (UTC) Received: by mail.gandi.net (Postfix) with ESMTPSA id 7EE10C0003; Fri, 15 Mar 2024 20:15:47 +0000 (UTC) From: Ilya Maximets To: ovs-dev@openvswitch.org Date: Fri, 15 Mar 2024 21:14:53 +0100 Message-ID: <20240315201614.236523-6-i.maximets@ovn.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240315201614.236523-1-i.maximets@ovn.org> References: <20240315201614.236523-1-i.maximets@ovn.org> MIME-Version: 1.0 X-GND-Sasl: i.maximets@ovn.org Cc: Ilya Maximets , Dumitru Ceara Subject: [ovs-dev] [PATCH 5/5] ovsdb: raft: Fix inability to join after leadership change round trip. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev"

Consider the following sequence of events:

  1. Cluster with 2 nodes - A and B. A is a leader.
  2. C connects to A and sends a join request.
  3. A sends an append request to C. C is in CATCHUP phase for A.
  4. A loses leadership to B. Sends join failure notification to C.
  5. C sends append reply to A.
  6. A discards append reply (not leader).
  7. B loses leadership back to A.
  8. C sends a new join request to A.
  9. A replies with failure (already in progress).
 10. Go to step 8.
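
The steps above can be replayed with a small toy model. This is only a sketch under stated assumptions, not the ovsdb/raft.c implementation: the `toy_*` names are hypothetical, a single boolean stands in for the pending entry that a leader keeps for a server it is adding, and the `cleanup_on_step_down` switch models one possible remedy, dropping that pending entry when leadership is lost.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_reply {
    TOY_JOIN_STARTED,      /* Leader began catching the new server up. */
    TOY_JOIN_IN_PROGRESS,  /* Leader believes an addition is under way. */
    TOY_JOIN_NOT_LEADER    /* Request reached a non-leader. */
};

struct toy_leader {
    bool is_leader;
    bool pending_add;  /* Stand-in for a pending server in CATCHUP phase. */
};

/* Steps 2-3 and 8-9: a join request arrives at server A. */
static enum toy_reply
toy_handle_join(struct toy_leader *a)
{
    if (!a->is_leader) {
        return TOY_JOIN_NOT_LEADER;
    }
    if (a->pending_add) {
        return TOY_JOIN_IN_PROGRESS;
    }
    a->pending_add = true;
    return TOY_JOIN_STARTED;
}

/* Steps 5-6: returns true only if the append reply was processed. */
static bool
toy_handle_append_reply(struct toy_leader *a)
{
    if (!a->is_leader) {
        return false;  /* Discarded: not a leader. */
    }
    a->pending_add = false;  /* Catch-up finished (commit step elided). */
    return true;
}

/* Step 4: server A loses leadership. */
static void
toy_step_down(struct toy_leader *a, bool cleanup_on_step_down)
{
    a->is_leader = false;
    if (cleanup_on_step_down) {
        a->pending_add = false;
    }
}

/* Replays steps 2-9 and returns A's reply to the second join request. */
static enum toy_reply
toy_replay(bool cleanup_on_step_down)
{
    struct toy_leader a = { .is_leader = true, .pending_add = false };

    enum toy_reply first = toy_handle_join(&a);    /* Steps 2-3. */
    assert(first == TOY_JOIN_STARTED);
    toy_step_down(&a, cleanup_on_step_down);       /* Step 4. */
    bool accepted = toy_handle_append_reply(&a);   /* Steps 5-6. */
    assert(!accepted);
    a.is_leader = true;                            /* Step 7. */
    return toy_handle_join(&a);                    /* Steps 8-9. */
}
```

In this model `toy_replay(false)` ends in the 'in progress' reply of step 9, while `toy_replay(true)` lets the second join request start from scratch.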

At this point A is waiting for an append reply that it already discarded
at step 6 and fails all the new attempts of C to join with an 'already
in progress' verdict. C stays forever in a joining state and in
a CATCHUP phase from A's perspective.

This is a similar case to a sudden disconnect from a leader fixed in
commit 999ba294fb4f ("ovsdb: raft: Fix inability to join the cluster
after interrupted attempt."), but since we're not disconnecting, the
servers are not getting destroyed.

Fix that by destroying all the servers that are not yet part of the
configuration after leadership is lost. This way, server C will be able
to simply re-start the joining process from scratch.

A new failure test command is added in order to simulate leadership
change before we receive the append reply, so it gets discarded.
A new cluster test is added to exercise this scenario.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Reported-at: https://github.com/ovn-org/ovn/issues/235
Signed-off-by: Ilya Maximets
Acked-by: Han Zhou
--- ovsdb/raft.c | 16 ++++++++++++- tests/ovsdb-cluster.at | 53 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/ovsdb/raft.c b/ovsdb/raft.c index c41419052..f9e760a08 100644 --- a/ovsdb/raft.c +++ b/ovsdb/raft.c @@ -81,6 +81,7 @@ enum raft_failure_test { FT_STOP_RAFT_RPC, FT_TRANSFER_LEADERSHIP, FT_TRANSFER_LEADERSHIP_AFTER_SEND_APPEND_REQ, + FT_TRANSFER_LEADERSHIP_AFTER_STARTING_TO_ADD, }; static enum raft_failure_test failure_test; @@ -2702,15 +2703,22 @@ raft_become_follower(struct raft *raft) * new configuration. Our AppendEntries processing will properly update * the server configuration later, if necessary. * + * However, since we're sending replies about a failure to add, those new + * servers have to be cleaned up. Otherwise, they will get stuck in a 'CATCHUP' + * phase in case this server regains leadership before they join through + * the current new leader. 
They are not yet in 'raft->servers', so not
+     * part of the shared configuration.
+     *
      * Also we do not complete commands here, as they can still be completed
      * if their log entries have already been replicated to other servers.
      * If the entries were actually committed according to the new leader, our
      * AppendEntries processing will complete the corresponding commands.
      */
     struct raft_server *s;
-    HMAP_FOR_EACH (s, hmap_node, &raft->add_servers) {
+    HMAP_FOR_EACH_POP (s, hmap_node, &raft->add_servers) {
         raft_send_add_server_reply__(raft, &s->sid, s->address, false,
                                      RAFT_SERVER_LOST_LEADERSHIP);
+        raft_server_destroy(s);
     }
     if (raft->remove_server) {
         raft_send_remove_server_reply__(raft, &raft->remove_server->sid,
@@ -3985,6 +3993,10 @@ raft_handle_add_server_request(struct raft *raft,
               "to cluster "CID_FMT, s->nickname, SID_ARGS(&s->sid),
               rq->address, CID_ARGS(&raft->cid));
     raft_send_append_request(raft, s, 0, "initialize new server");
+
+    if (failure_test == FT_TRANSFER_LEADERSHIP_AFTER_STARTING_TO_ADD) {
+        failure_test = FT_TRANSFER_LEADERSHIP;
+    }
 }
 
 static void
@@ -5110,6 +5122,8 @@ raft_unixctl_failure_test(struct unixctl_conn *conn OVS_UNUSED,
     } else if (!strcmp(test,
                        "transfer-leadership-after-sending-append-request")) {
         failure_test = FT_TRANSFER_LEADERSHIP_AFTER_SEND_APPEND_REQ;
+    } else if (!strcmp(test, "transfer-leadership-after-starting-to-add")) {
+        failure_test = FT_TRANSFER_LEADERSHIP_AFTER_STARTING_TO_ADD;
     } else if (!strcmp(test, "transfer-leadership")) {
         failure_test = FT_TRANSFER_LEADERSHIP;
     } else if (!strcmp(test, "clear")) {
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index 482e4e02d..9d8b4d06a 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -525,6 +525,59 @@ for i in $(seq $n); do
     OVS_APP_EXIT_AND_WAIT_BY_TARGET([$(pwd)/s$i], [s$i.pid])
 done
 
+AT_CLEANUP
+
+AT_SETUP([OVSDB cluster - leadership change before replication while joining])
+AT_KEYWORDS([ovsdb server negative unix cluster join])
+
+n=5
+AT_CHECK([ovsdb-tool '-vPATTERN:console:%c|%p|%m' create-cluster s1.db dnl
+              $abs_srcdir/idltest.ovsschema unix:s1.raft], [0], [], [stderr])
+cid=$(ovsdb-tool db-cid s1.db)
+schema_name=$(ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema)
+for i in $(seq 2 $n); do
+    AT_CHECK([ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft])
+done
+
+on_exit 'kill $(cat *.pid)'
+on_exit "
+  for i in \$(ls $(pwd)/s[[0-$n]]); do
+    ovs-appctl --timeout 1 -t \$i cluster/status $schema_name;
+  done
+"
+
+dnl Starting servers one by one, asking all existing servers to transfer
+dnl leadership right after starting to add a server.  The joining server will
+dnl need to find a new leader that will also transfer leadership.
+dnl This will continue until the same server becomes a leader
+dnl for the second time and is able to add a new server.
+for i in $(seq $n); do
+    dnl Make sure that all already started servers joined the cluster.
+    for j in $(seq $((i - 1)) ); do
+        AT_CHECK([ovsdb_client_wait unix:s$j.ovsdb $schema_name connected])
+    done
+    for j in $(seq $((i - 1)) ); do
+        OVS_WAIT_UNTIL([ovs-appctl -t "$(pwd)"/s$j \
+                          cluster/failure-test \
+                            transfer-leadership-after-starting-to-add \
+                        | grep -q "engaged"])
+    done
+
+    AT_CHECK([ovsdb-server -v -vconsole:off -vsyslog:off \
+                           --detach --no-chdir --log-file=s$i.log \
+                           --pidfile=s$i.pid --unixctl=s$i \
+                           --remote=punix:s$i.ovsdb s$i.db])
+done
+
+dnl Make sure that all servers joined the cluster.
+for i in $(seq $n); do
+    AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected])
+done
+
+for i in $(seq $n); do
+    OVS_APP_EXIT_AND_WAIT_BY_TARGET([$(pwd)/s$i], [s$i.pid])
+done
+
 AT_CLEANUP
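
The sequence of events from the commit message can be replayed with a small standalone model. This is only a sketch under stated assumptions, not code from ovsdb/raft.c: the toy_* names, the reply codes and the bitmask of pending servers are all hypothetical stand-ins for 'raft->add_servers' and the real reply handling. It is meant to show why a stale pending entry keeps answering 'already in progress', and why dropping the pending entries on loss of leadership lets the join start again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reply codes for this toy model; not the real OVS enum. */
enum toy_reply {
    TOY_JOIN_STARTED,       /* Leader began catching the new server up. */
    TOY_JOIN_IN_PROGRESS,   /* A pending entry for this server exists. */
    TOY_NOT_LEADER          /* Request arrived at a follower. */
};

/* Toy stand-in for node A.  'pending' plays the role of the set of servers
 * that are being added but are not yet part of the configuration. */
struct toy_node {
    bool leader;
    bool cleanup_on_step_down;  /* true models the behaviour after the fix. */
    uint32_t pending;           /* Bit i set: server i is in CATCHUP. */
};

/* Handles a join request from server 'id' (assumed to be below 32). */
enum toy_reply
toy_handle_join(struct toy_node *n, unsigned int id)
{
    assert(id < 32);
    if (!n->leader) {
        return TOY_NOT_LEADER;
    }
    if (n->pending & (UINT32_C(1) << id)) {
        return TOY_JOIN_IN_PROGRESS;
    }
    n->pending |= UINT32_C(1) << id;
    return TOY_JOIN_STARTED;
}

/* Models loss of leadership.  With the cleanup enabled, every pending
 * (not yet configured) server is forgotten. */
void
toy_become_follower(struct toy_node *n)
{
    n->leader = false;
    if (n->cleanup_on_step_down) {
        n->pending = 0;
    }
}

/* Replays steps 2-9 and returns A's reply to the second join request. */
enum toy_reply
toy_replay(bool cleanup_on_step_down)
{
    const unsigned int c = 2;       /* Arbitrary id chosen for server C. */
    struct toy_node a = { true, cleanup_on_step_down, 0 };

    toy_handle_join(&a, c);         /* Steps 2-3: C enters CATCHUP. */
    toy_become_follower(&a);        /* Step 4; the reply of 5-6 is dropped. */
    a.leader = true;                /* Step 7: A regains leadership. */
    return toy_handle_join(&a, c);  /* Steps 8-9. */
}
```

In this model, toy_replay(false) returns TOY_JOIN_IN_PROGRESS on every retry, which corresponds to the loop between steps 8 and 10, while toy_replay(true) returns TOY_JOIN_STARTED because the pending entry was discarded when leadership was lost.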