From patchwork Sat Jun 29 03:58:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mayshao-oc
X-Patchwork-Id: 1954202
From: MayShao-oc
Subject: [PATCH v2 1/3] x86: Set preferred CPU features on the KH-40000 and
 KX-7000 Zhaoxin processors
Date: Sat, 29 Jun 2024 11:58:26 +0800
Message-ID: <20240629035828.4145216-1-MayShao-oc@zhaoxin.com>
X-Mailer: git-send-email 2.34.1
List-Id: Libc-alpha mailing list

Fix code formatting under the Zhaoxin branch and add comments for the
different Zhaoxin models.

Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
AVX_Fast_Unaligned_Load.

Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to use
the sse2_unaligned versions of memset, strcpy and strcat.

Reviewed-by: Noah Goldstein
---
 sysdeps/x86/cpu-features.c | 51 ++++++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 3d7c2819d7..1927f65699 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -1023,39 +1023,58 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
 
       model += extended_model;
       if (family == 0x6)
-        {
-          if (model == 0xf || model == 0x19)
-            {
+	{
+	  /* Tuning for older Zhaoxin processors.  */
+	  if (model == 0xf || model == 0x19)
+	    {
 	      CPU_FEATURE_UNSET (cpu_features, AVX);
 	      CPU_FEATURE_UNSET (cpu_features, AVX2);
 
-              cpu_features->preferred[index_arch_Slow_SSE4_2]
-                |= bit_arch_Slow_SSE4_2;
+	      cpu_features->preferred[index_arch_Slow_SSE4_2]
+		  |= bit_arch_Slow_SSE4_2;
 
+	      /* Unaligned AVX loads are slower.  */
 	      cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-		&= ~bit_arch_AVX_Fast_Unaligned_Load;
-            }
-        }
+		  &= ~bit_arch_AVX_Fast_Unaligned_Load;
+	    }
+	}
       else if (family == 0x7)
-        {
-	  if (model == 0x1b)
+	{
+	  switch (model)
 	    {
+	      /* Wudaokou microarch tuning.  */
+	    case 0x1b:
 	      CPU_FEATURE_UNSET (cpu_features, AVX);
 	      CPU_FEATURE_UNSET (cpu_features, AVX2);
 
 	      cpu_features->preferred[index_arch_Slow_SSE4_2]
-		|= bit_arch_Slow_SSE4_2;
+		  |= bit_arch_Slow_SSE4_2;
 
 	      cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-		&= ~bit_arch_AVX_Fast_Unaligned_Load;
-	    }
-	  else if (model == 0x3b)
-	    {
+		  &= ~bit_arch_AVX_Fast_Unaligned_Load;
+	      break;
+
+	      /* Lujiazui microarch tuning.  */
+	    case 0x3b:
 	      CPU_FEATURE_UNSET (cpu_features, AVX);
 	      CPU_FEATURE_UNSET (cpu_features, AVX2);
 
 	      cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-		&= ~bit_arch_AVX_Fast_Unaligned_Load;
+		  &= ~bit_arch_AVX_Fast_Unaligned_Load;
+	      break;
+
+	      /* Yongfeng and Shijidadao microarch tuning.  */
+	    case 0x5b:
+	    case 0x6b:
+	      cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+		  &= ~bit_arch_AVX_Fast_Unaligned_Load;
+
+	      /* To use sse2_unaligned versions of memset, strcpy and strcat.
+	       */
+	      cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER]
+		  |= (bit_arch_Prefer_No_VZEROUPPER
+		      | bit_arch_Fast_Unaligned_Load);
+	      break;
 	    }
 	}
     }
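
For anyone who wants to exercise the family/model dispatch outside of
glibc, here is a minimal standalone sketch of the same logic.  It is not
the glibc implementation: the PREF_* masks, struct tuning and
zhaoxin_tune are hypothetical names made up for illustration, the bit
values are arbitrary, and AVX and AVX2 usability is collapsed into a
single flag.  It also assumes all preferred bits fit in one word, whereas
the patch indexes cpu_features->preferred[] per feature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for glibc's bit_arch_* masks.  The values are
   arbitrary and do not match glibc's internal layout.  */
#define PREF_SLOW_SSE4_2             (1u << 0)
#define PREF_AVX_FAST_UNALIGNED_LOAD (1u << 1)
#define PREF_FAST_UNALIGNED_LOAD     (1u << 2)
#define PREF_NO_VZEROUPPER           (1u << 3)

/* Simplified model of the state the patch touches.  */
struct tuning
{
  bool avx_usable;        /* Stands in for CPU_FEATURE_UNSET of AVX/AVX2.  */
  unsigned int preferred; /* Single-word stand-in for preferred[].  */
};

/* Apply the per-model tuning described in the patch.  */
void
zhaoxin_tune (struct tuning *t, unsigned int family, unsigned int model)
{
  if (family == 0x6)
    {
      /* Older Zhaoxin processors.  */
      if (model == 0xf || model == 0x19)
        {
          t->avx_usable = false;
          t->preferred |= PREF_SLOW_SSE4_2;
          t->preferred &= ~PREF_AVX_FAST_UNALIGNED_LOAD;
        }
    }
  else if (family == 0x7)
    {
      switch (model)
        {
        case 0x1b: /* Wudaokou.  */
          t->avx_usable = false;
          t->preferred |= PREF_SLOW_SSE4_2;
          t->preferred &= ~PREF_AVX_FAST_UNALIGNED_LOAD;
          break;

        case 0x3b: /* Lujiazui.  */
          t->avx_usable = false;
          t->preferred &= ~PREF_AVX_FAST_UNALIGNED_LOAD;
          break;

        case 0x5b: /* Yongfeng and Shijidadao.  */
        case 0x6b:
          /* AVX stays usable; only unaligned AVX loads are avoided.  */
          t->preferred &= ~PREF_AVX_FAST_UNALIGNED_LOAD;
          t->preferred |= PREF_NO_VZEROUPPER | PREF_FAST_UNALIGNED_LOAD;
          break;

        default:
          break;
        }
    }
}

/* Usage example: model 0x5b keeps AVX but flips the preference bits.  */
void
zhaoxin_tune_selfcheck (void)
{
  struct tuning t = { true, PREF_AVX_FAST_UNALIGNED_LOAD };

  zhaoxin_tune (&t, 0x7, 0x5b);
  assert (t.avx_usable);
  assert ((t.preferred & PREF_AVX_FAST_UNALIGNED_LOAD) == 0);
  assert ((t.preferred & PREF_NO_VZEROUPPER) != 0);
  assert ((t.preferred & PREF_FAST_UNALIGNED_LOAD) != 0);
}
```

The point the sketch tries to make visible is that 0x5b and 0x6b differ
from the older models in leaving AVX and AVX2 usable while still clearing
the unaligned-load preference.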