From patchwork Sat Jun 29 03:58:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mayshao-oc X-Patchwork-Id: 1954202 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W9z8715rPz20Xg for ; Sat, 29 Jun 2024 13:59:18 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E2EB33831391 for ; Sat, 29 Jun 2024 03:59:16 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mx1.zhaoxin.com (MX1.ZHAOXIN.COM [210.0.225.12]) by sourceware.org (Postfix) with ESMTPS id 4DD9A389908B for ; Sat, 29 Jun 2024 03:58:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4DD9A389908B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=zhaoxin.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=zhaoxin.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4DD9A389908B Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=210.0.225.12 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719633532; cv=none; 
b=VzzL86TvdiyK0XUG98zOaHHJ6lh0MauHo7gANdiFgpBgqj984ZqOi7ymu1it+rfUFXVJ6+82jbbCau16pXiqrNU4iWrBxDNsh5XQ9tkX5W8KY8rWljHMR+lcRjMoVsHP/PwuyDTrirwBUxeKF5vO4StYapqM8JnqDm/0aZChsb0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719633532; c=relaxed/simple; bh=X9VqSjoXUiLWWqQXNfxUvHwp4nO8KSMxdLJ7cY9pQjw=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=ofgbl8i388GBv+GEX4H6z7OWNomuW5/AThfbkgx2/Bd4g35aqZm2cuKVCsOfAM5daekoBXZQbLYN6iWDQG+wmFCJYA3BmrX0VonGyLE4N01vSqC5BcCxSejB3jUS3mqOeeI9CEJMqSrCS/tYYfKR1QVQAQt1ZCrmIMRym0mvnY8= ARC-Authentication-Results: i=1; server2.sourceware.org X-ASG-Debug-ID: 1719633515-086e23110713a760001-zm97zT Received: from ZXSHMBX1.zhaoxin.com (ZXSHMBX1.zhaoxin.com [10.28.252.163]) by mx1.zhaoxin.com with ESMTP id qSLo8F8UarBv5E5k (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Sat, 29 Jun 2024 11:58:35 +0800 (CST) X-Barracuda-Envelope-From: Mayshao-oc@zhaoxin.com X-Barracuda-RBL-Trusted-Forwarder: 10.28.252.163 Received: from ZXBJMBX02.zhaoxin.com (10.29.252.6) by ZXSHMBX1.zhaoxin.com (10.28.252.163) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Sat, 29 Jun 2024 11:58:35 +0800 Received: from zhaoxin.com (223.70.179.86) by ZXBJMBX02.zhaoxin.com (10.29.252.6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Sat, 29 Jun 2024 11:58:34 +0800 X-Barracuda-RBL-Trusted-Forwarder: 10.28.252.163 From: MayShao-oc X-Barracuda-RBL-Trusted-Forwarder: 10.29.252.6 To: , , , CC: , , , Subject: [PATCH v2 1/3] x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors Date: Sat, 29 Jun 2024 11:58:26 +0800 X-ASG-Orig-Subj: [PATCH v2 1/3] x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors Message-ID: <20240629035828.4145216-1-MayShao-oc@zhaoxin.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-Originating-IP: [223.70.179.86] X-ClientProxiedBy: 
ZXSHCAS1.zhaoxin.com (10.28.252.161) To ZXBJMBX02.zhaoxin.com (10.29.252.6) X-Barracuda-Connect: ZXSHMBX1.zhaoxin.com[10.28.252.163] X-Barracuda-Start-Time: 1719633515 X-Barracuda-Encrypted: ECDHE-RSA-AES128-GCM-SHA256 X-Barracuda-URL: https://10.28.252.35:4443/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at zhaoxin.com X-Barracuda-Scan-Msg-Size: 2952 X-Barracuda-BRTS-Status: 1 X-Barracuda-Bayes: INNOCENT GLOBAL 0.0000 1.0000 -2.0210 X-Barracuda-Spam-Score: -2.02 X-Barracuda-Spam-Status: No, SCORE=-2.02 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=9.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.126910 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org

Fix code formatting under the Zhaoxin branch and add comments for
different Zhaoxin models.

Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
AVX_Fast_Unaligned_Load.

Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to use
the sse2_unaligned versions of memset, strcpy and strcat.
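The tuning described above amounts to clearing one preference bit and setting two others for the newer models. A minimal C sketch of that bit manipulation follows; the `toy_*` names and `PREF_*` constants are hypothetical stand-ins for glibc's `cpu_features->preferred[]` array and its `index_arch_*` / `bit_arch_*` macros, and a single word is assumed to hold all the bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for glibc's bit_arch_* macros.  */
enum
{
  PREF_AVX_FAST_UNALIGNED_LOAD = 1 << 0,
  PREF_FAST_UNALIGNED_LOAD = 1 << 1,
  PREF_PREFER_NO_VZEROUPPER = 1 << 2
};

/* Hypothetical, simplified view of the "preferred" feature bits.  */
struct toy_cpu_features
{
  uint32_t preferred;
};

/* Apply the family 0x7 tuning for models 0x5b and 0x6b only.
   Every other model is left untouched in this sketch.  */
static void
toy_tune_zhaoxin_family7 (struct toy_cpu_features *f, unsigned int model)
{
  assert (f != NULL);
  switch (model)
    {
    case 0x5b:
    case 0x6b:
      /* Unaligned AVX loads are slower: clear the preference.  */
      f->preferred &= ~(uint32_t) PREF_AVX_FAST_UNALIGNED_LOAD;
      /* Steer selection towards the sse2_unaligned variants.  */
      f->preferred |= (PREF_PREFER_NO_VZEROUPPER
		       | PREF_FAST_UNALIGNED_LOAD);
      break;
    default:
      break;
    }
}

/* Return true if BIT is set in the preferred word.  */
static bool
toy_prefers (const struct toy_cpu_features *f, uint32_t bit)
{
  return (f->preferred & bit) != 0;
}
```

Calling `toy_tune_zhaoxin_family7 (&f, 0x5b)` and then `toy_prefers (&f, PREF_FAST_UNALIGNED_LOAD)` shows the effect on one model; how the real ifunc selectors consume these bits is outside this sketch.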
Reviewed-by: Noah Goldstein
---
 sysdeps/x86/cpu-features.c | 51 ++++++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 3d7c2819d7..1927f65699 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -1023,39 +1023,58 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
 
       model += extended_model;
       if (family == 0x6)
-        {
-          if (model == 0xf || model == 0x19)
-            {
+	{
+	  /* Tuning for older Zhaoxin processors. */
+	  if (model == 0xf || model == 0x19)
+	    {
 	      CPU_FEATURE_UNSET (cpu_features, AVX);
 	      CPU_FEATURE_UNSET (cpu_features, AVX2);
 
-              cpu_features->preferred[index_arch_Slow_SSE4_2]
-                |= bit_arch_Slow_SSE4_2;
+	      cpu_features->preferred[index_arch_Slow_SSE4_2]
+		|= bit_arch_Slow_SSE4_2;
 
+	      /* Unaligned AVX loads are slower. */
 	      cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-                &= ~bit_arch_AVX_Fast_Unaligned_Load;
-            }
-        }
+		&= ~bit_arch_AVX_Fast_Unaligned_Load;
+	    }
+	}
       else if (family == 0x7)
-        {
-          if (model == 0x1b)
+	{
+	  switch (model)
 	    {
+	      /* Wudaokou microarch tuning. */
+	    case 0x1b:
 	      CPU_FEATURE_UNSET (cpu_features, AVX);
 	      CPU_FEATURE_UNSET (cpu_features, AVX2);
 
 	      cpu_features->preferred[index_arch_Slow_SSE4_2]
-                |= bit_arch_Slow_SSE4_2;
+		|= bit_arch_Slow_SSE4_2;
 
 	      cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-                &= ~bit_arch_AVX_Fast_Unaligned_Load;
-            }
-          else if (model == 0x3b)
-            {
+		&= ~bit_arch_AVX_Fast_Unaligned_Load;
+	      break;
+
+	      /* Lujiazui microarch tuning. */
+	    case 0x3b:
 	      CPU_FEATURE_UNSET (cpu_features, AVX);
 	      CPU_FEATURE_UNSET (cpu_features, AVX2);
 
 	      cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-                &= ~bit_arch_AVX_Fast_Unaligned_Load;
+		&= ~bit_arch_AVX_Fast_Unaligned_Load;
+	      break;
+
+	      /* Yongfeng and Shijidadao mircoarch tuning. */
+	    case 0x5b:
+	    case 0x6b:
+	      cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+		&= ~bit_arch_AVX_Fast_Unaligned_Load;
+
+	      /* To use sse2_unaligned versions of memset, strcpy and strcat.
+	       */
+	      cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER]
+		|= (bit_arch_Prefer_No_VZEROUPPER
+		    | bit_arch_Fast_Unaligned_Load);
+	      break;
 	    }
 	}
     }

From patchwork Sat Jun 29 03:58:27 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mayshao-oc
X-Patchwork-Id: 1954201
From: MayShao-oc
Subject: [PATCH v2 2/3] x86_64: Optimize large size copy in memmove-ssse3
Date: Sat, 29 Jun 2024 11:58:27 +0800
Message-ID: <20240629035828.4145216-2-MayShao-oc@zhaoxin.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240629035828.4145216-1-MayShao-oc@zhaoxin.com>
References: <20240629035828.4145216-1-MayShao-oc@zhaoxin.com>

This patch optimizes large copies by using normal (cacheable) stores
when src > dst and the buffers overlap, matching the logic in
memmove-vec-unaligned-erms.S.

memmove-ssse3 currently uses '__x86_shared_cache_size_half' as the
non-temporal threshold; this patch changes it to
'__x86_shared_non_temporal_threshold'.
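The overlap condition described above can be expressed as a single unsigned comparison, which is what the `subq` / `cmpq` / `jb` sequence in the diff does. The following C sketch illustrates the idea under stated assumptions: the `toy_*` function names are hypothetical, addresses are passed as integers, and the real routine makes this decision in assembly after the size has already exceeded the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when SRC lies inside [DST, DST + LEN).  The subtraction is
   unsigned, so DST > SRC wraps to a very large value and compares as
   "no overlap" for this forward-copy check.  */
static bool
toy_forward_overlap (uintptr_t dst, uintptr_t src, size_t len)
{
  return (src - dst) < len;
}

/* Hypothetical selector: use non-temporal stores only for copies
   larger than THRESHOLD whose source does not start inside the
   destination range.  */
static bool
toy_use_non_temporal (uintptr_t dst, uintptr_t src, size_t len,
		      size_t threshold)
{
  assert (threshold > 0);
  if (len <= threshold)
    return false;
  return !toy_forward_overlap (dst, src, len);
}
```

For example, `toy_use_non_temporal (1000, 1016, 4096, 1024)` returns false because the source starts 16 bytes into the destination range, so the sketch falls back to ordinary stores.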
'__x86_shared_non_temporal_threshold' is CPU-specific: different CPUs
get different values based on the related nt-benchmark results. Using
'__x86_shared_cache_size_half' as the non-temporal threshold in
memmove-ssse3 is therefore hard to justify.

Performance does not change drastically, but the results show an
overall improvement, with no major regressions or gains.

Results on Zhaoxin KX-7000:
bench-memcpy         geometric_mean(N=20) New / Original: 0.999
bench-memcpy-random  geometric_mean(N=20) New / Original: 0.999
bench-memcpy-large   geometric_mean(N=20) New / Original: 0.978
bench-memmove        geometric_mean(N=20) New / Original: 1.000
bench-memmove-large  geometric_mean(N=20) New / Original: 0.962

Results on Intel Core i5-6600K:
bench-memcpy         geometric_mean(N=20) New / Original: 1.001
bench-memcpy-random  geometric_mean(N=20) New / Original: 0.999
bench-memcpy-large   geometric_mean(N=20) New / Original: 1.001
bench-memmove        geometric_mean(N=20) New / Original: 0.995
bench-memmove-large  geometric_mean(N=20) New / Original: 0.936
Reviewed-by: Noah Goldstein
---
 sysdeps/x86_64/multiarch/memmove-ssse3.S | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memmove-ssse3.S b/sysdeps/x86_64/multiarch/memmove-ssse3.S
index 048d015712..01008fd981 100644
--- a/sysdeps/x86_64/multiarch/memmove-ssse3.S
+++ b/sysdeps/x86_64/multiarch/memmove-ssse3.S
@@ -151,13 +151,10 @@ L(more_2x_vec):
 	   loop. */
 	movups	%xmm0, (%rdi)
 
-# ifdef SHARED_CACHE_SIZE_HALF
-	cmp	$SHARED_CACHE_SIZE_HALF, %RDX_LP
-# else
-	cmp	__x86_shared_cache_size_half(%rip), %rdx
-# endif
+	cmp	__x86_shared_non_temporal_threshold(%rip), %rdx
 	ja	L(large_memcpy)
 
+L(loop_fwd):
 	leaq	-64(%rdi, %rdx), %r8
 	andq	$-16, %rdi
 	movl	$48, %edx
@@ -199,6 +196,13 @@ L(large_memcpy):
 	movups	-64(%r9, %rdx), %xmm10
 	movups	-80(%r9, %rdx), %xmm11
 
+	/* Check if src and dst overlap. If they do use cacheable
+	   writes to potentially gain positive interference between
+	   the loads during the memmove. */
+	subq	%rdi, %r9
+	cmpq	%rdx, %r9
+	jb	L(loop_fwd)
+
 	sall	$5, %ecx
 	leal	(%rcx, %rcx, 2), %r8d
 	leaq	-96(%rdi, %rdx), %rcx

From patchwork Sat Jun 29 03:58:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mayshao-oc
X-Patchwork-Id: 1954203
From: MayShao-oc
Subject: [PATCH v2 3/3] x86: Set default non_temporal_threshold for Zhaoxin processors
Date: Sat, 29 Jun 2024 11:58:28 +0800
Message-ID: <20240629035828.4145216-3-MayShao-oc@zhaoxin.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240629035828.4145216-1-MayShao-oc@zhaoxin.com>
References: <20240629035828.4145216-1-MayShao-oc@zhaoxin.com>

Currently, 'non_temporal_threshold' is set to
'non_temporal_threshold_lowbound' on Zhaoxin processors without ERMS.
The default 'non_temporal_threshold_lowbound' is too small for the
KH-40000 and KX-7000 Zhaoxin processors, so this patch changes the
value to 'shared / cachesize_non_temporal_divisor'.
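The selection described above can be sketched in C as follows. This is a simplified model, not the actual `dl_init_cacheinfo` code: the function name and parameters are hypothetical, and the clamping to minimum and maximum bounds as well as tunable overrides are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of how the non-temporal threshold is
   picked.  SHARED is the shared cache size in bytes, DIVISOR plays the
   role of cachesize_non_temporal_divisor, and LOWBOUND plays the role
   of non_temporal_threshold_lowbound.  */
static size_t
toy_non_temporal_threshold (size_t shared, size_t divisor,
			    size_t lowbound, bool has_erms,
			    bool is_zhaoxin)
{
  assert (divisor != 0);
  size_t threshold = shared / divisor;
  /* Without ERMS the lower bound is used, except on Zhaoxin
     processors, which keep shared / divisor.  */
  if (!has_erms && !is_zhaoxin)
    threshold = lowbound;
  return threshold;
}
```

With an assumed 32 MiB shared cache and a divisor of 2, a Zhaoxin processor without ERMS would get a 16 MiB threshold in this model, while a non-Zhaoxin processor without ERMS would get whatever lower bound is passed in.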
Reviewed-by: Noah Goldstein
---
 sysdeps/x86/cpu-features.c | 1 +
 sysdeps/x86/dl-cacheinfo.h | 6 ++++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 1927f65699..e501e084ef 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -1065,6 +1065,7 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
 
 	      /* Yongfeng and Shijidadao mircoarch tuning. */
 	    case 0x5b:
+	      cpu_features->cachesize_non_temporal_divisor = 2;
 	    case 0x6b:
 	      cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
 		&= ~bit_arch_AVX_Fast_Unaligned_Load;
diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index 3a6ec4ef9f..5e77345a6e 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -934,8 +934,10 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   /* If no ERMS, we use the per-thread L3 chunking. Normal cacheable stores run
      a higher risk of actually thrashing the cache as they don't have a HW LRU
      hint. As well, their performance in highly parallel situations is
-     noticeably worse. */
-  if (!CPU_FEATURE_USABLE_P (cpu_features, ERMS))
+     noticeably worse. Zhaoxin processors are an exception, the lowbound is not
+     suitable for them based on actual test data. */
+  if (!CPU_FEATURE_USABLE_P (cpu_features, ERMS)
+      && cpu_features->basic.kind != arch_kind_zhaoxin)
     non_temporal_threshold = non_temporal_threshold_lowbound;
   /* SIZE_MAX >> 4 because memmove-vec-unaligned-erms right-shifts the value of
      'x86_non_temporal_threshold' by `LOG_4X_MEMCPY_THRESH` (4) and it is best