From patchwork Wed Aug 16 15:43:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 1821956
Date: Wed, 16 Aug 2023 08:43:01 -0700
From: "H.J. Lu"
Reply-To: "H.J. Lu"
To: GNU C Library
Subject: [RFC][PATCH] : Add initial AVX10 support
List-Id: Libc-alpha mailing list

Hi,

AVX10 CPUID enumeration differs from that of earlier features.  The AVX10
version is stored as a byte value, but CPU_FEATURE_PRESENT and
CPU_FEATURE_ACTIVE return a boolean value, so they can't return the AVX10
version.  This patch adds the AVX10_VERSION and AVX10_VECTOR_SIZE macros
to report the version and the vector size.

Any suggestions?

Thanks.

H.J.
---
Add initial support for Intel Advanced Performance Extensions:

https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

to .

1. Add CPUID_INDEX_24_ECX_0 for CPUID leaf 0x24 to store AVX10 version
and vector size info.
2. Add AVX10_VERSION and AVX10_VECTOR_SIZE for AVX10 version and vector
size.
---
 manual/platform.texi               | 27 +++++++++++++++++++++++
 sysdeps/x86/bits/platform/x86.h    | 10 +++++++--
 sysdeps/x86/cpu-features.c         | 35 ++++++++++++++++++++++++++++++
 sysdeps/x86/include/cpu-features.h |  5 ++++-
 sysdeps/x86/sys/platform/x86.h     | 28 ++++++++++++++++++++++++
 sysdeps/x86/tst-get-cpu-features.c |  8 +++++++
 6 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/manual/platform.texi b/manual/platform.texi
index 2a2d557067..1567fdf255 100644
--- a/manual/platform.texi
+++ b/manual/platform.texi
@@ -222,6 +222,9 @@ Leaf (EAX = 23H).
 @item
 @code{AVX} -- The AVX instruction extensions.
 
+@item
+@code{AVX10} -- The AVX10 instruction extensions.
+
 @item
 @code{AVX2} -- The AVX2 instruction extensions.
 
@@ -760,3 +763,27 @@ avx_active (void)
   return CPU_FEATURE_ACTIVE (AVX);
 @}
 @end smallexample
+
+You could query @code{AVX10} version number with:
+
+@smallexample
+#include <sys/platform/x86.h>
+
+int
+get_avx10_version (void)
+@{
+  return AVX10_VERSION ();
+@}
+@end smallexample
+
+and @code{AVX10} vector size in bits with:
+
+@smallexample
+#include <sys/platform/x86.h>
+
+int
+get_avx10_vector_size (void)
+@{
+  return AVX10_VECTOR_SIZE ();
+@}
+@end smallexample
diff --git a/sysdeps/x86/bits/platform/x86.h b/sysdeps/x86/bits/platform/x86.h
index 88ca071aa7..dbba9c95c3 100644
--- a/sysdeps/x86/bits/platform/x86.h
+++ b/sysdeps/x86/bits/platform/x86.h
@@ -30,7 +30,8 @@ enum
   CPUID_INDEX_80000008,
   CPUID_INDEX_7_ECX_1,
   CPUID_INDEX_19,
-  CPUID_INDEX_14_ECX_0
+  CPUID_INDEX_14_ECX_0,
+  CPUID_INDEX_24_ECX_0
 };
 
 struct cpuid_feature
@@ -312,6 +313,7 @@ enum
   x86_cpu_AVX_NE_CONVERT = x86_cpu_index_7_ecx_1_edx + 5,
   x86_cpu_AMX_COMPLEX = x86_cpu_index_7_ecx_1_edx + 8,
   x86_cpu_PREFETCHI = x86_cpu_index_7_ecx_1_edx + 14,
+  x86_cpu_AVX10 = x86_cpu_index_7_ecx_1_edx + 19,
   x86_cpu_APX_F = x86_cpu_index_7_ecx_1_edx + 21,
 
   x86_cpu_index_19_ebx
@@ -325,5 +327,9 @@ enum
     = (CPUID_INDEX_14_ECX_0 * 8 * 4 * sizeof (unsigned int)
        + cpuid_register_index_ebx * 8 * sizeof (unsigned int)),
 
-  x86_cpu_PTWRITE = x86_cpu_index_14_ecx_0_ebx + 4
+  x86_cpu_PTWRITE = x86_cpu_index_14_ecx_0_ebx + 4,
+
+  x86_cpu_index_24_ecx_0_ebx
+    = (CPUID_INDEX_24_ECX_0 * 8 * 4 * sizeof (unsigned int)
+       + cpuid_register_index_ebx * 8 * sizeof (unsigned int)),
 };
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index badf088874..8dd8392586 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -120,6 +120,14 @@ update_active (struct cpu_features *cpu_features)
     {
       unsigned int xcrlow;
       unsigned int xcrhigh;
+      enum
+        {
+          xmm = 0,
+          ymm,
+          zmm
+        }
+      vector_size = xmm;
+      CPU_FEATURE_SET (cpu_features, AVX10);
       asm ("xgetbv" : "=a" (xcrlow), "=d" (xcrhigh) : "c" (0));
       /* Is YMM and XMM state usable?  */
       if ((xcrlow & (bit_YMM_state | bit_XMM_state))
@@ -128,6 +136,7 @@ update_active (struct cpu_features *cpu_features)
           /* Determine if AVX is usable.  */
           if (CPU_FEATURES_CPU_P (cpu_features, AVX))
             {
+              vector_size = ymm;
               CPU_FEATURE_SET (cpu_features, AVX);
               /* The following features depend on AVX being usable.  */
               /* Determine if AVX2 is usable.  */
@@ -166,6 +175,7 @@ update_active (struct cpu_features *cpu_features)
                          | bit_ZMM16_31_state))
               == (bit_Opmask_state | bit_ZMM0_15_state | bit_ZMM16_31_state))
             {
+              vector_size = zmm;
               /* Determine if AVX512F is usable.  */
               if (CPU_FEATURES_CPU_P (cpu_features, AVX512F))
                 {
@@ -210,6 +220,31 @@ update_active (struct cpu_features *cpu_features)
             }
         }
 
+      if (CPU_FEATURES_CPU_P (cpu_features, AVX10)
+          && cpu_features->basic.max_cpuid >= 0x24)
+        {
+          __cpuid_count (0x24, 0,
+                         cpu_features->features[CPUID_INDEX_24_ECX_0].cpuid.eax,
+                         cpu_features->features[CPUID_INDEX_24_ECX_0].cpuid.ebx,
+                         cpu_features->features[CPUID_INDEX_24_ECX_0].cpuid.ecx,
+                         cpu_features->features[CPUID_INDEX_24_ECX_0].cpuid.edx);
+          switch (vector_size)
+            {
+            case zmm:
+              break;
+            case ymm:
+              /* Clear the ZMM bit.  */
+              cpu_features->features[CPUID_INDEX_24_ECX_0].cpuid.ebx
+                &= ~(1 << 18);
+              break;
+            case xmm:
+              /* Clear the YMM and ZMM bits.  */
+              cpu_features->features[CPUID_INDEX_24_ECX_0].cpuid.ebx
+                &= ~((1 << 17) | (1 << 18));
+              break;
+            }
+        }
+
       /* Are XTILECFG and XTILEDATA states usable?  */
       if ((xcrlow & (bit_XTILECFG_state | bit_XTILEDATA_state))
           == (bit_XTILECFG_state | bit_XTILEDATA_state))
diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h
index eb30d342a6..c1b1811528 100644
--- a/sysdeps/x86/include/cpu-features.h
+++ b/sysdeps/x86/include/cpu-features.h
@@ -29,7 +29,7 @@
 
 enum
 {
-  CPUID_INDEX_MAX = CPUID_INDEX_14_ECX_0 + 1
+  CPUID_INDEX_MAX = CPUID_INDEX_24_ECX_0 + 1
 };
 
 enum
@@ -319,6 +319,7 @@ enum
 #define bit_cpu_AVX_NE_CONVERT  (1u << 5)
 #define bit_cpu_AMX_COMPLEX     (1u << 8)
 #define bit_cpu_PREFETCHI       (1u << 14)
+#define bit_cpu_AVX10           (1u << 19)
 #define bit_cpu_APX_F           (1u << 21)
 
 /* CPUID_INDEX_19.  */
@@ -563,6 +564,7 @@ enum
 #define index_cpu_AVX_NE_CONVERT CPUID_INDEX_7_ECX_1
 #define index_cpu_AMX_COMPLEX   CPUID_INDEX_7_ECX_1
 #define index_cpu_PREFETCHI     CPUID_INDEX_7_ECX_1
+#define index_cpu_AVX10         CPUID_INDEX_7_ECX_1
 #define index_cpu_APX_F         CPUID_INDEX_7_ECX_1
 
 /* CPUID_INDEX_19.  */
@@ -809,6 +811,7 @@ enum
 #define reg_AVX_NE_CONVERT      edx
 #define reg_AMX_COMPLEX         edx
 #define reg_PREFETCHI           edx
+#define reg_AVX10               edx
 #define reg_APX_F               edx
 
 /* CPUID_INDEX_19.  */
diff --git a/sysdeps/x86/sys/platform/x86.h b/sysdeps/x86/sys/platform/x86.h
index 1ea2c5fc0b..11edf4df3e 100644
--- a/sysdeps/x86/sys/platform/x86.h
+++ b/sysdeps/x86/sys/platform/x86.h
@@ -55,10 +55,38 @@ x86_cpu_active (unsigned int __index)
   return __ptr->active_array[__reg] & (1 << __bit);
 }
 
+static __inline__ unsigned int
+x86_cpu_get_avx10_info (unsigned int __index)
+{
+  const struct cpuid_feature *__ptr = __x86_get_cpuid_feature_leaf
+    (__index / (8 * sizeof (unsigned int) * 4));
+  unsigned int __reg
+     = __index & (8 * sizeof (unsigned int) * 4 - 1);
+  __reg /= 8 * sizeof (unsigned int);
+
+  return __ptr->cpuid_array[__reg];
+}
+
+static __inline__ unsigned int
+x86_cpu_get_avx10_vector_size (void)
+{
+  unsigned int ebx = x86_cpu_get_avx10_info (x86_cpu_index_24_ecx_0_ebx);
+  if ((ebx & (1 << 18)) != 0)
+    return 512;
+  if ((ebx & (1 << 17)) != 0)
+    return 256;
+  return 128;
+}
+
 /* CPU_FEATURE_PRESENT evaluates to true if CPU supports the feature.  */
 #define CPU_FEATURE_PRESENT(name) x86_cpu_present (x86_cpu_##name)
 /* CPU_FEATURE_ACTIVE evaluates to true if the feature is active.  */
 #define CPU_FEATURE_ACTIVE(name) x86_cpu_active (x86_cpu_##name)
+/* Get AVX10 version number.  */
+#define AVX10_VERSION() \
+  (x86_cpu_get_avx10_info (x86_cpu_index_24_ecx_0_ebx) & 0xff)
+/* Get AVX10 vector size.  */
+#define AVX10_VECTOR_SIZE() x86_cpu_get_avx10_vector_size ()
 
 __END_DECLS
 
diff --git a/sysdeps/x86/tst-get-cpu-features.c b/sysdeps/x86/tst-get-cpu-features.c
index b27fa7324a..e788f37df2 100644
--- a/sysdeps/x86/tst-get-cpu-features.c
+++ b/sysdeps/x86/tst-get-cpu-features.c
@@ -219,6 +219,7 @@ do_test (void)
   CHECK_CPU_FEATURE_PRESENT (AVX_NE_CONVERT);
   CHECK_CPU_FEATURE_PRESENT (AMX_COMPLEX);
   CHECK_CPU_FEATURE_PRESENT (PREFETCHI);
+  CHECK_CPU_FEATURE_PRESENT (AVX10);
   CHECK_CPU_FEATURE_PRESENT (APX_F);
   CHECK_CPU_FEATURE_PRESENT (AESKLE);
   CHECK_CPU_FEATURE_PRESENT (WIDE_KL);
@@ -391,11 +392,18 @@ do_test (void)
   CHECK_CPU_FEATURE_ACTIVE (AVX_NE_CONVERT);
   CHECK_CPU_FEATURE_ACTIVE (AMX_COMPLEX);
   CHECK_CPU_FEATURE_ACTIVE (PREFETCHI);
+  CHECK_CPU_FEATURE_ACTIVE (AVX10);
   CHECK_CPU_FEATURE_ACTIVE (APX_F);
   CHECK_CPU_FEATURE_ACTIVE (AESKLE);
   CHECK_CPU_FEATURE_ACTIVE (WIDE_KL);
   CHECK_CPU_FEATURE_ACTIVE (PTWRITE);
 
+  if (CPU_FEATURE_ACTIVE (AVX10))
+    {
+      printf ("AVX10 version: %d\n", AVX10_VERSION ());
+      printf ("AVX10 vector size: %d\n", AVX10_VECTOR_SIZE ());
+    }
+
   return 0;
 }
 