From patchwork Fri Sep 20 12:40:47 2024
X-Patchwork-Submitter: Ilya Maximets
X-Patchwork-Id: 1987898
X-Patchwork-Delegate: i.maximets@samsung.com
From: Ilya Maximets
To: ovs-dev@openvswitch.org
Cc: Dumitru Ceara, Ilya Maximets
Date: Fri, 20 Sep 2024 14:40:47 +0200
Message-ID: <20240920124135.1006001-1-i.maximets@ovn.org>
Subject: [ovs-dev] [PATCH] ofproto-dpif: Improve load balancing in dp_hash select groups.

ovn-kubernetes users observe uneven distribution of traffic among service
backends.  It gets noticeably worse when 9+ backends are configured.
For example:

  Server          connections
  server-6nbg7    634
  server-72br8    326   <-----
  server-7b8rm    605
  server-9tnz7    655
  server-c84hw    596
  server-mjskl    622
  server-ptkcc    336   <-----
  server-rl7pk    592
  server-xd2cb    634

Here we can see that out of 5000 connections total, servers '72br8' and
'ptkcc' received about 2 times fewer connections than any other server.
With 10 backends, 4 servers will receive 2x fewer, etc.  The situation is
common, because kubernetes users frequently scale their applications with
something like 'kubectl scale --replicas=10'.

Services are implemented as OVN load balancers, which are implemented as
OpenFlow select groups with the dp_hash selection method.  Balancing is
implemented by allocating a hash space (the number of hashes is a power
of 2) and distributing hashes between the group buckets using Webster's
method, taking bucket weights into account.

This works well enough when buckets have different weights, because OVS
needs to allocate a larger hash space in order to accommodate the weight
difference.  However, in the case of OVN load balancers, all the buckets
(backends) have the same weight.  This leads to allocation of a hash space
just large enough to fit all the backends (the closest power of 2).  For
example, if we have 10 backends (buckets), then 16 hashes will be
allocated.  After hashes are distributed, we end up with 6 buckets having
2 hashes each and 4 buckets having 1 hash.  So, each of the 6 buckets will
receive on average 2x more traffic than each of the remaining 4.

The problem can be mitigated by expanding the hash space, and this is
already done for cases with a small number of buckets by requiring the
hash space to have at least 16 hashes.  However, that is just not large
enough for 9 or 10 buckets to provide a good distribution.
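To illustrate the arithmetic, here is a minimal standalone C sketch (not
the actual OVS code, just the computation described above) showing the
10-backend case, where 16 hashes are spread as evenly as possible over
10 equal-weight buckets:

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int
  main(void)
  {
      uint64_t n_buckets = 10;  /* Equal-weight backends. */
      uint64_t n_hash = 16;     /* 10 rounded up to a power of 2, minimum 16. */

      /* Spreading 16 hashes over 10 buckets as evenly as possible. */
      uint64_t larger = n_hash % n_buckets;  /* 6 buckets get 2 hashes. */
      uint64_t base = n_hash / n_buckets;    /* The remaining 4 get 1.  */

      printf("%"PRIu64" hashes for %"PRIu64" buckets: "
             "%"PRIu64" buckets x %"PRIu64" hashes, "
             "%"PRIu64" buckets x %"PRIu64" hash\n",
             n_hash, n_buckets, larger, base + 1, n_buckets - larger, base);
      return 0;
  }

It prints "16 hashes for 10 buckets: 6 buckets x 2 hashes, 4 buckets x 1
hash", i.e. the 2x imbalance described above.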
At the same time, blindly increasing the limit is also not a good solution,
as we do not want too large a hash space for a small number of buckets,
simply because each hash is a separate datapath flow and having too many
of them may cause performance issues.

The approach taken in this change is to ensure that the hash space is at
least 4 times larger than the number of buckets, but not larger than the
maximum allowed (256).  This provides a better distribution while not
unnecessarily exploding the number of datapath flows for services with
not that many backends.

Here is some data to demonstrate why 4 was chosen as the coefficient:

   coeff. :    1        2        3        4        5       100
  -------------------------------------------------------------------
  AvgDiff : 43.1 %   27.1 %   18.3 %   15.1 %   13.6 %   10.9 %
  MaxDiff : 50.0 %   33.3 %   25.0 %   20.0 %   20.0 %   20.0 %
  AvgDev  : 24.9 %   13.4 %    8.5 %    6.9 %    6.1 %    4.8 %
  MaxDev  : 35.4 %   20.4 %   14.4 %   11.2 %   11.2 %   11.2 %
  --------+----------------------------------------------------------
       16 :    1        1        1        1        1        -
       32 :   17        9        6        5        4        -
       64 :   33       17       11        9        7        -
      128 :   65       33       22       17       13        1
      256 :  129       65       43       33       26        2
  --------+----------------------------------------------------------
            current                   proposed

The table shows the average and maximum load difference (Diff) between
backends across groups with 1 to 64 equally weighted backends, as well as
the average and maximum standard deviation (Dev) of the load distribution
for the same.  For example, with a coefficient of 2, the maximum difference
between two backends will be 33.3% and the maximum standard deviation will
be 20.4%.  With the current logic (coefficient of 1) the maximum difference
is as high as 50%, as shown in the example at the beginning, with a
standard deviation of 35.4%.

The bottom half of the table shows from how many backends we start to use
a particular number of hashes.  For example, with a coefficient of 3 we
will have 16 hashes for 1 to 5 buckets, 32 hashes for 6 to 10, 64 hashes
for 11 to 21, and so on.

According to the table, 4 is about where we achieve a good enough standard
deviation for the load (11.2%) while still not creating too many hashes
for cases with a low number of backends.  The standard deviation also
doesn't go down that much with a higher coefficient.

The data is aggregated for up to 64 backends because with higher numbers
we're getting close to the cap of 256 hashes, so the deviation increases.
However, the load difference with so many backends should have a lower
impact on the cluster in general.  The change improves the cases with 64+
backends, but to get very good numbers we'll need to consider raising the
upper limit for the hash space, which is a separate change to think about.

The added test checks that the standard deviation of the load distribution
between buckets is below 12.5% for all cases up to 64 buckets.  It's just
a pretty number that I've chosen that should be, IMO, good enough.

Note: when the number of buckets is a power of 2, we get an ideal
distribution with zero deviation, because hashes can be evenly mapped to
buckets.

Note 2: The change doesn't affect the distribution for 4 or fewer
backends, since there is still a minimum of 16 hashes for those groups.

Signed-off-by: Ilya Maximets
Acked-by: Eelco Chaudron
Acked-by: Aaron Conole
---
Not sure if this should be considered a bug fix and be backported.
Thoughts on this are welcome.
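As a quick cross-check of the sizing rule and the "proposed" column of the
table above, here is a standalone C sketch (not the actual OVS code) that
applies the new computation under the assumption of equal bucket weights,
so min_slots equals the number of buckets, and with
MAX_SELECT_GROUP_HASH_VALUES taken to be 256 as stated above;
round_up_pow2() is a stand-in for OVS's ROUND_UP_POW2:

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Stand-in for OVS's ROUND_UP_POW2. */
  static uint64_t
  round_up_pow2(uint64_t x)
  {
      uint64_t p = 1;

      while (p < x) {
          p <<= 1;
      }
      return p;
  }

  int
  main(void)
  {
      const uint64_t max_hashes = 256;  /* MAX_SELECT_GROUP_HASH_VALUES. */
      const uint64_t coeff = 4;         /* Chosen coefficient.           */

      for (uint64_t n_buckets = 1; n_buckets <= 64; n_buckets++) {
          uint64_t min_slots = n_buckets;  /* Equal weights. */
          uint64_t scaled = n_buckets * coeff;
          uint64_t capped = scaled < max_hashes ? scaled : max_hashes;
          uint64_t min_slots2 = min_slots > capped ? min_slots : capped;
          uint64_t n_hash = round_up_pow2(min_slots2);

          if (n_hash < 16) {
              n_hash = 16;
          }
          printf("%2"PRIu64" buckets -> %3"PRIu64" hashes\n",
                 n_buckets, n_hash);
      }
      return 0;
  }

It prints 16 hashes for 1-4 buckets, 32 for 5-8, 64 for 9-16, 128 for
17-32 and 256 for 33 and more, matching the coeff. 4 column of the table.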
 ofproto/ofproto-dpif.c |  6 +++--
 tests/ofproto-dpif.at  | 54 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 4659d8a3b..bf43d5d4b 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -5341,8 +5341,10 @@ group_setup_dp_hash_table(struct group_dpif *group, size_t max_hash)
              min_weight, total_weight);
 
     uint64_t min_slots = DIV_ROUND_UP(total_weight, min_weight);
-    uint64_t min_slots2 = ROUND_UP_POW2(min_slots);
-    uint64_t n_hash = MAX(16, min_slots2);
+    uint64_t min_slots2 =
+        MAX(min_slots, MIN(n_buckets * 4, MAX_SELECT_GROUP_HASH_VALUES));
+    uint64_t min_slots3 = ROUND_UP_POW2(min_slots2);
+    uint64_t n_hash = MAX(16, min_slots3);
     if (n_hash > MAX_SELECT_GROUP_HASH_VALUES ||
         (max_hash != 0 && n_hash > max_hash)) {
         VLOG_DBG("  Too many hash values required: %"PRIu64, n_hash);
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 12cb7f7a6..1c99dc3bb 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -1238,6 +1238,60 @@ bucket3 >= 500
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([ofproto-dpif - select group with dp_hash and equal weights])
+
+OVS_VSWITCHD_START
+add_of_ports br0 1 10
+
+AT_CHECK([ovs-appctl vlog/set ofproto_dpif:file:dbg vconn:file:info])
+
+AT_DATA([stddev.awk], [
+  {
+    # $1 (target) is a mean value, because all weights are the same.
+    # $2 (hits) is an actual number of hashes assigned to this bucket.
+    n_hashes += $2
+    n_buckets++
+    sum_sq_diff += ($2 - $1) * ($2 - $1)
+  }
+  END {
+    mean = n_hashes / n_buckets
+    stddev = sqrt(sum_sq_diff / n_buckets)
+    stddevp = stddev * 100 / mean
+
+    print "hashes:", n_hashes, "buckets:", n_buckets
+    print "mean:", mean, "stddev:", stddev, "(", stddevp, "% )"
+
+    # Make sure that standard deviation of load between buckets is below 12.5%.
+    # Note: it's not a strict requirement, but a good number that passes tests.
+    if (stddevp <= 12.5) { print "PASS" }
+    else { print "FAIL" }
+  }
+])
+
+m4_define([CHECK_DISTRIBUTION], [
+  AT_CHECK([tail -n $1 ovs-vswitchd.log | grep 'ofproto_dpif|DBG|.*Bucket' \
+              | sed 's/.*target=\([[0-9\.]]*\) hits=\([[0-9]]*\)/\1 \2/' \
+              | awk -f stddev.awk], [0], [stdout])
+  AT_CHECK([grep -q "buckets: $2" stdout])
+  AT_CHECK([grep -q 'PASS' stdout])
+])
+
+m4_define([OF_GROUP], [group_id=$1,type=select,selection_method=dp_hash])
+m4_define([OFG_BUCKET], [bucket=weight=$1,output:10])
+
+dnl Test load distribution in groups with up to 64 equally weighted buckets.
+m4_define([OFG_BUCKETS], [OFG_BUCKET(100)])
+m4_for([id], [1], [64], [1], [
+  get_log_next_line_num
+  AT_CHECK([ovs-ofctl -O OpenFlow15 add-group br0 \
+              "OF_GROUP(id),OFG_BUCKETS()"])
+  CHECK_DISTRIBUTION([+$LINENUM], [id])
+  m4_append([OFG_BUCKETS], [,OFG_BUCKET(100)])
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([ofproto-dpif - select group with explicit dp_hash selection method])
 
 OVS_VSWITCHD_START