From patchwork Wed Jun 26 02:46:47 2024
X-Patchwork-Submitter: Mayshao-oc
X-Patchwork-Id: 1952323
X-Original-To: libc-alpha@sourceware.org
From: MayShao
Subject: [PATCH 1/3] x86:Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors
Date: Wed, 26 Jun 2024 10:46:47 +0800
Message-ID: <20240626024649.3689-1-MayShao-oc@zhaoxin.com>

Fix code indentation issues under the Zhaoxin branch, and convert the
family 0x7 model checks into a switch statement.

Unaligned AVX loads are slower on the KH-40000 and KX-7000, so disable
AVX_Fast_Unaligned_Load on them.

Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load preferences so
that the sse2_unaligned versions of memset, strcpy and strcat are used.
---
 sysdeps/x86/cpu-features.c | 66 ++++++++++++++++++++++----------------
 1 file changed, 39 insertions(+), 27 deletions(-)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 3d7c2819d7..24fbf699b9 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -1015,7 +1015,7 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
       kind = arch_kind_zhaoxin;

       get_common_indices (cpu_features, &family, &model, &extended_model,
-                          &stepping);
+                          &stepping);

       get_extended_indices (cpu_features);

@@ -1026,38 +1026,50 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
         {
           if (model == 0xf || model == 0x19)
             {
-              CPU_FEATURE_UNSET (cpu_features, AVX);
-              CPU_FEATURE_UNSET (cpu_features, AVX2);
+              CPU_FEATURE_UNSET (cpu_features, AVX);
+              CPU_FEATURE_UNSET (cpu_features, AVX2);

-              cpu_features->preferred[index_arch_Slow_SSE4_2]
-                  |= bit_arch_Slow_SSE4_2;
+              cpu_features->preferred[index_arch_Slow_SSE4_2]
+                  |= bit_arch_Slow_SSE4_2;

-              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-                  &= ~bit_arch_AVX_Fast_Unaligned_Load;
+              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+                  &= ~bit_arch_AVX_Fast_Unaligned_Load;
             }
         }
       else if (family == 0x7)
         {
-          if (model == 0x1b)
-            {
-              CPU_FEATURE_UNSET (cpu_features, AVX);
-              CPU_FEATURE_UNSET (cpu_features, AVX2);
-
-              cpu_features->preferred[index_arch_Slow_SSE4_2]
-                  |= bit_arch_Slow_SSE4_2;
-
-              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-                  &= ~bit_arch_AVX_Fast_Unaligned_Load;
-            }
-          else if (model == 0x3b)
-            {
-              CPU_FEATURE_UNSET (cpu_features, AVX);
-              CPU_FEATURE_UNSET (cpu_features, AVX2);
-
-              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-                  &= ~bit_arch_AVX_Fast_Unaligned_Load;
-            }
-        }
+          switch (model)
+            {
+            case 0x1b:
+              CPU_FEATURE_UNSET (cpu_features, AVX);
+              CPU_FEATURE_UNSET (cpu_features, AVX2);
+
+              cpu_features->preferred[index_arch_Slow_SSE4_2]
+                  |= bit_arch_Slow_SSE4_2;
+
+              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+                  &= ~bit_arch_AVX_Fast_Unaligned_Load;
+              break;
+
+            case 0x3b:
+              CPU_FEATURE_UNSET (cpu_features, AVX);
+              CPU_FEATURE_UNSET (cpu_features, AVX2);
+
+              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+                  &= ~bit_arch_AVX_Fast_Unaligned_Load;
+              break;
+
+            case 0x5b:
+            case 0x6b:
+              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+                  &= ~bit_arch_AVX_Fast_Unaligned_Load;
+
+              cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER]
+                  |= (bit_arch_Prefer_No_VZEROUPPER
+                      | bit_arch_Fast_Unaligned_Load);
+              break;
+            }
+        }
     }
   else
     {

From patchwork Wed Jun 26 02:46:48 2024
X-Patchwork-Submitter: Mayshao-oc
X-Patchwork-Id: 1952324
From: MayShao
Subject: [PATCH 2/3] x86_64: Optimize large size copy in memmove-ssse3
Date: Wed, 26 Jun 2024 10:46:48 +0800
Message-ID: <20240626024649.3689-2-MayShao-oc@zhaoxin.com>
In-Reply-To: <20240626024649.3689-1-MayShao-oc@zhaoxin.com>
References: <20240626024649.3689-1-MayShao-oc@zhaoxin.com>

This patch optimizes large copies by using normal (temporal) stores
instead of non-temporal stores when src > dst and the buffers overlap.
This matches the logic in memmove-vec-unaligned-erms.S.
memmove-ssse3 currently uses '__x86_shared_cache_size_half' as its
non-temporal threshold. This patch changes it to
'__x86_shared_non_temporal_threshold'.

'__x86_shared_non_temporal_threshold' is CPU-specific: different CPUs
get different values based on the corresponding non-temporal benchmark
results. Using '__x86_shared_cache_size_half' in memmove-ssse3 instead
bypasses that per-CPU tuning.

Performance does not change drastically. Overall, the results show
slight improvements and no major regressions.

Results on Zhaoxin KX-7000:
bench-memcpy        geometric_mean(N=20) New / Original: 1.000
bench-memcpy-random geometric_mean(N=20) New / Original: 0.998
bench-memcpy-large  geometric_mean(N=20) New / Original: 0.975
bench-memmove       geometric_mean(N=20) New / Original: 1.001
bench-memmove-large geometric_mean(N=20) New / Original: 0.964

Results on Intel Core i5-6600K:
bench-memcpy        geometric_mean(N=20) New / Original: 1.007
bench-memcpy-random geometric_mean(N=20) New / Original: 1.000
bench-memcpy-large  geometric_mean(N=20) New / Original: 0.998
bench-memmove       geometric_mean(N=20) New / Original: 0.996
bench-memmove-large geometric_mean(N=20) New / Original: 0.941
---
 sysdeps/x86_64/multiarch/memmove-ssse3.S | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memmove-ssse3.S b/sysdeps/x86_64/multiarch/memmove-ssse3.S
index 048d015712..40bf90b2b7 100644
--- a/sysdeps/x86_64/multiarch/memmove-ssse3.S
+++ b/sysdeps/x86_64/multiarch/memmove-ssse3.S
@@ -151,13 +151,11 @@ L(more_2x_vec):
            loop.  */
         movups  %xmm0, (%rdi)

-# ifdef SHARED_CACHE_SIZE_HALF
-        cmp     $SHARED_CACHE_SIZE_HALF, %RDX_LP
-# else
-        cmp     __x86_shared_cache_size_half(%rip), %rdx
-# endif
+        cmp     __x86_shared_non_temporal_threshold(%rip), %rdx
         ja      L(large_memcpy)

+        .p2align 4,, 8
+L(loop_fwd):
         leaq    -64(%rdi, %rdx), %r8
         andq    $-16, %rdi
         movl    $48, %edx
@@ -199,6 +197,10 @@ L(large_memcpy):
         movups  -64(%r9, %rdx), %xmm10
         movups  -80(%r9, %rdx), %xmm11

+        subq    %rdi, %r9
+        cmpq    %r9, %rdx
+        ja      L(loop_fwd)
+
         sall    $5, %ecx
         leal    (%rcx, %rcx, 2), %r8d
         leaq    -96(%rdi, %rdx), %rcx

From patchwork Wed Jun 26 02:46:49 2024
X-Patchwork-Submitter: Mayshao-oc
X-Patchwork-Id: 1952325
From: MayShao
Subject: [PATCH 3/3] x86: Set default non_temporal_threshold for Zhaoxin processors
Date: Wed, 26 Jun 2024 10:46:49 +0800
Message-ID: <20240626024649.3689-3-MayShao-oc@zhaoxin.com>
In-Reply-To: <20240626024649.3689-1-MayShao-oc@zhaoxin.com>
References: <20240626024649.3689-1-MayShao-oc@zhaoxin.com>

On Zhaoxin processors without ERMS, 'non_temporal_threshold' is
currently set to 'non_temporal_threshold_lowbound'.
The default 'non_temporal_threshold_lowbound' is too small for the
KH-40000 and KX-7000 Zhaoxin processors. This patch therefore uses
'shared / cachesize_non_temporal_divisor' on Zhaoxin instead, and sets
'cachesize_non_temporal_divisor' for family 0x7 models 0x5b and 0x6b.
---
 sysdeps/x86/cpu-features.c | 2 ++
 sysdeps/x86/dl-cacheinfo.h | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 24fbf699b9..55dac6a8b2 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -1060,7 +1060,9 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
               break;

             case 0x5b:
+              cpu_features->cachesize_non_temporal_divisor = 2;
             case 0x6b:
+              cpu_features->cachesize_non_temporal_divisor = 4;
               cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
                   &= ~bit_arch_AVX_Fast_Unaligned_Load;

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index 3a6ec4ef9f..438997a707 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -935,7 +935,8 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      a higher risk of actually thrashing the cache as they don't have a HW LRU
      hint. As well, their performance in highly parallel situations is
      noticeably worse.  */
-  if (!CPU_FEATURE_USABLE_P (cpu_features, ERMS))
+  if (!CPU_FEATURE_USABLE_P (cpu_features, ERMS)
+      && cpu_features->basic.kind != arch_kind_zhaoxin)
     non_temporal_threshold = non_temporal_threshold_lowbound;
   /* SIZE_MAX >> 4 because memmove-vec-unaligned-erms right-shifts the value of
      'x86_non_temporal_threshold' by `LOG_4X_MEMCPY_THRESH` (4) and it is best