From patchwork Mon Aug 12 06:48:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feifei Wang
X-Patchwork-Id: 1971447
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org)
Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Wj4rV6s4Qz1yXl for ; Mon, 12 Aug 2024 16:49:46 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9B95C385840A for ; Mon, 12 Aug 2024 06:49:43 +0000 (GMT)
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mailgw1.hygon.cn (unknown [110.188.70.11]) by sourceware.org (Postfix) with ESMTP id CB6CA3858D39 for ; Mon, 12 Aug 2024 06:48:48 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CB6CA3858D39
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=hygon.cn
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=hygon.cn
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org CB6CA3858D39
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=110.188.70.11
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1723445331; cv=none; b=jPKZme5zhYMVsCFAAxlAmJlTiCcZK8WyXxuwJh7TQwDTQXhRFcbgjDULSQnFIdTzE0blOC5fcY9y0dvoyjPkPPlQJYJjokcHXMmL9IOsexJww7Ba4jDlqhehdUkPcXmgHxLT0bfoBA7pgY6hNq082XEhZXksHwAEUnZaWuU02wU=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1723445331; c=relaxed/simple; bh=+wXHZIgbXhXu+IUi0jsx1DwJCnQurSL1FpkQdA9wvDM=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=bJJf1yqfv21mJI1Bmg16HUn5N7L1S7eV95PI4gSTN0tpNYuamhM4Qv9t7JZa05V6DfbAFitrxG2nbUNxCEvsvPh+QPUsBFpnEy/vs864nKYL9YoRBSOemnMah0Pm7NJupY9JheuXyF7ukiTgxuQU48oWMhRvKSniJKeleYlKRQU=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from maildlp1.hygon.cn (unknown [172.23.18.60]) by mailgw1.hygon.cn (Postfix) with ESMTP id 79202F9A0; Mon, 12 Aug 2024 14:48:38 +0800 (CST)
Received: from cncheex01.Hygon.cn (unknown [172.23.18.10]) by maildlp1.hygon.cn (Postfix) with ESMTPS id 66B711571; Mon, 12 Aug 2024 14:48:38 +0800 (CST)
Received: from trace.hygon.cn (172.23.18.45) by cncheex01.Hygon.cn (172.23.18.10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Mon, 12 Aug 2024 14:48:41 +0800
From: Feifei Wang
To:
CC: , , , , ,
Subject: [RFC PATCH 3/3] x86: Enable non-temporal memset for Hygon processors
Date: Mon, 12 Aug 2024 14:48:25 +0800
Message-ID: <1723445305-99403-4-git-send-email-wangfeifei@hygon.cn>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1723445305-99403-1-git-send-email-wangfeifei@hygon.cn>
References: <1723445305-99403-1-git-send-email-wangfeifei@hygon.cn>
MIME-Version: 1.0
X-Originating-IP: [172.23.18.45]
X-ClientProxiedBy: cncheex01.Hygon.cn (172.23.18.10) To cncheex01.Hygon.cn (172.23.18.10)
X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To:
 libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org

This patch is based on the following new flag patch:
https://patchwork.sourceware.org/project/glibc/patch/20240811055619.2863839-1-goldstein.w.n@gmail.com/

Once the new cpu-flag 'Prefer_Non_Temporal' is added to glibc, this
patch enables the non-temporal memset implementation for Hygon
processors.

Test Results:

thread: 1
memset store value: 0

hygon1 arch
x86_memset_non_temporal_threshold = 8MB
size                            new performance / old performance
128 byte (2x - 4x vec case)     1
256 byte (4x - 8x vec case)     1
512 byte (> 8x loop case)       1
1MB                             0.994
4MB                             0.996
8MB                             0.670
16MB                            0.343
32MB                            0.355

hygon2 arch
x86_memset_non_temporal_threshold = 8MB
size                            new performance / old performance
128 byte (2x - 4x vec case)     1
256 byte (4x - 8x vec case)     0.653
512 byte (> 8x loop case)       0.713
1MB                             1
4MB                             0.887
8MB                             1.312
16MB                            0.822
32MB                            0.830

hygon3 arch
x86_memset_non_temporal_threshold = 8MB
size                            new performance / old performance
128 byte (2x - 4x vec case)     1
256 byte (4x - 8x vec case)     1
512 byte (> 8x loop case)       1
1MB                             1
4MB                             0.990
8MB                             0.737
16MB                            0.390
32MB                            0.401

On Hygon, the extra branch added by this patch causes no performance
degradation in the '2x - 8x branch case'. With this patch,
non-temporal stores improve performance by 20% - 65%.

Signed-off-by: Feifei Wang
Reviewed-by: Jing Li
---
 sysdeps/x86/cpu-features.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 034dc28f64..cae26babc7 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -1098,6 +1098,12 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
       get_extended_indices (cpu_features);
 
       update_active (cpu_features);
+
+      /* Use the Prefer_Non_Temporal flag to select the non-temporal
+         memset implementation because ERMS is disabled on Hygon
+         processors.  */
+      cpu_features->preferred[index_arch_Prefer_Non_Temporal]
+          |= (bit_arch_Prefer_Non_Temporal);
     }
   else
     {
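
For readers unfamiliar with the preferred-bit mechanism, the sketch below
models how a bit set during CPU feature initialization could later gate the
non-temporal path. It is a simplified illustration, not glibc's actual
dispatch code: the struct, the MODEL_* names, and both functions are
hypothetical stand-ins, and the "size >= threshold" comparison is an
assumption of the model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for glibc's internal index_arch_* and
   bit_arch_* definitions; they are not the real macros.  */
enum { MODEL_INDEX_PREFER_NON_TEMPORAL = 0 };
#define MODEL_BIT_PREFER_NON_TEMPORAL (1u << 0)

/* Simplified model of the 'preferred' bit array in cpu_features.  */
struct model_cpu_features
{
  unsigned int preferred[1];
};

/* Mirror the patch: set the preference bit during CPU feature init.  */
static void
model_init_hygon (struct model_cpu_features *cpu_features)
{
  cpu_features->preferred[MODEL_INDEX_PREFER_NON_TEMPORAL]
    |= MODEL_BIT_PREFER_NON_TEMPORAL;
}

/* Return true when a memset of SIZE bytes would take the non-temporal
   path in this model: the bit is set and SIZE reaches THRESHOLD.
   The exact comparison used by glibc is an assumption here.  */
static bool
model_use_non_temporal (const struct model_cpu_features *cpu_features,
                        size_t size, size_t threshold)
{
  bool preferred
    = (cpu_features->preferred[MODEL_INDEX_PREFER_NON_TEMPORAL]
       & MODEL_BIT_PREFER_NON_TEMPORAL) != 0;
  return preferred && size >= threshold;
}
```

In this model, with an 8MB threshold as in the tests above, a 16MB memset
takes the non-temporal path only after model_init_hygon has run, and a 4MB
memset never does.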