From patchwork Tue Oct 31 20:09:24 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 1857752
From: Adhemerval Zanella
To: libc-alpha@sourceware.org, Noah Goldstein, "H . J . Lu", Bruce Merry
Subject: [PATCH 3/4] x86: Do not prefer ERMS for memset on Zen3+
Date: Tue, 31 Oct 2023 17:09:24 -0300
Message-Id: <20231031200925.3297456-4-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231031200925.3297456-1-adhemerval.zanella@linaro.org>
References: <20231031200925.3297456-1-adhemerval.zanella@linaro.org>
MIME-Version: 1.0

The REP STOSB usage on memset does not show any performance gain on
Zen3/Zen4 cores compared to the vectorized loops.

Checked on x86_64-linux-gnu.
---
 sysdeps/x86/dl-cacheinfo.h | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index 51e5ba200f..99ba0f776a 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -1018,11 +1018,17 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (tunable_size > minimum_rep_movsb_threshold)
     rep_movsb_threshold = tunable_size;
 
-  /* NB: The default value of the x86_rep_stosb_threshold tunable is the
-     same as the default value of __x86_rep_stosb_threshold and the
-     minimum value is fixed.  */
-  rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
-                                     long int, NULL);
+  /* For AMD Zen3+ architecture, the performance of vectorized loop is
+     slightly better than ERMS.  */
+  if (cpu_features->basic.kind == arch_kind_amd)
+    rep_stosb_threshold = SIZE_MAX;
+
+  if (TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
+    /* NB: The default value of the x86_rep_stosb_threshold tunable is the
+       same as the default value of __x86_rep_stosb_threshold and the
+       minimum value is fixed.  */
+    rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
+                                       long int, NULL);
 
   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);
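
For reviewers who want to reason about the new selection order outside of
glibc, the following is a minimal standalone sketch of the logic in the hunk
above. It is not the glibc code: enum cpu_kind, DEFAULT_REP_STOSB_THRESHOLD,
select_rep_stosb_threshold and uses_rep_stosb are hypothetical names
introduced only for illustration, the 2048 default is an assumption, and the
size comparison is a simplified model of how memset might consult the
threshold.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_features->basic.kind; not the glibc type.  */
enum cpu_kind { KIND_INTEL, KIND_AMD, KIND_OTHER };

/* Assumed built-in default, for illustration only.  */
#define DEFAULT_REP_STOSB_THRESHOLD ((size_t) 2048)

/* Choose the size above which memset would switch to REP STOSB.
   tunable_set models TUNABLE_IS_INITIALIZED: true only when the user
   supplied a value explicitly.  */
size_t
select_rep_stosb_threshold (enum cpu_kind kind, bool tunable_set,
                            size_t tunable_value)
{
  size_t threshold = DEFAULT_REP_STOSB_THRESHOLD;

  /* On AMD, effectively disable REP STOSB: no size exceeds SIZE_MAX.  */
  if (kind == KIND_AMD)
    threshold = SIZE_MAX;

  /* An explicit user setting overrides the CPU-based default.  */
  if (tunable_set)
    threshold = tunable_value;

  return threshold;
}

/* Simplified model of the dispatch: REP STOSB is taken only for sizes
   strictly above the threshold, otherwise the vectorized loop is used.  */
bool
uses_rep_stosb (size_t size, size_t threshold)
{
  return size > threshold;
}
```

With this ordering, an AMD system only uses REP STOSB when the user opts in,
for example through the glibc.cpu.x86_rep_stosb_threshold tunable, while other
vendors keep the previous default behaviour.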