From patchwork Mon Jun 27 21:18:57 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 1649114
Received: from gnu-tgl-3..
 (localhost [IPv6:::1]) by gnu-tgl-3.localdomain (Postfix) with ESMTP id 454B5C0351;
 Mon, 27 Jun 2022 14:18:58 -0700 (PDT)
To: libc-alpha@sourceware.org
Subject: [PATCH v2 1/2] x86: Move CPU_FEATURE{S}_{USABLE|ARCH}_P to isa-level.h
Date: Mon, 27 Jun 2022 14:18:57 -0700
Message-Id: <20220627211858.2807553-1-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.36.1
MIME-Version: 1.0
X-Spam-Status: No, score=-3027.6 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-Patchwork-Original-From: "H.J. Lu via Libc-alpha"
From: "H.J. Lu"
Reply-To: "H.J. Lu"
Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org
Sender: "Libc-alpha"

Move X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P to
where MINIMUM_X86_ISA_LEVEL and XXX_X86_ISA_LEVEL are defined.
---
 sysdeps/x86/isa-ifunc-macros.h | 27 ---------------------------
 sysdeps/x86/isa-level.h        | 24 ++++++++++++++++++++++++
 2 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/sysdeps/x86/isa-ifunc-macros.h b/sysdeps/x86/isa-ifunc-macros.h
index d69905689b..f967a1bec6 100644
--- a/sysdeps/x86/isa-ifunc-macros.h
+++ b/sysdeps/x86/isa-ifunc-macros.h
@@ -56,31 +56,4 @@
 # define X86_IFUNC_IMPL_ADD_V1(...)
 #endif
 
-/* Both X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P
-   macros are wrappers for the the respective
-   CPU_FEATURE{S}_{USABLE|ARCH}_P runtime checks. They differ in two
-   ways.
-
-    1. The USABLE_P version is evaluated to true when the feature
-       is enabled.
-
-    2. The ARCH_P version has a third argument `not`. The `not`
-       argument can either be '!' or empty. If the feature is
-       enabled above an ISA level, the third argument should be empty
-       and the expression is evaluated to true when the feature is
-       enabled. If the feature is disabled above an ISA level, the
-       third argument should be `!` and the expression is evaluated
-       to true when the feature is disabled.
- */
-
-#define X86_ISA_CPU_FEATURE_USABLE_P(ptr, name) \
-  (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL) \
-   || CPU_FEATURE_USABLE_P (ptr, name))
-
-
-#define X86_ISA_CPU_FEATURES_ARCH_P(ptr, name, not) \
-  (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL) \
-   || not CPU_FEATURES_ARCH_P (ptr, name))
-
-
 #endif
diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h
index 075e7c6ee1..c6156e7f7a 100644
--- a/sysdeps/x86/isa-level.h
+++ b/sysdeps/x86/isa-level.h
@@ -87,6 +87,30 @@
    when ISA level < 3. */
 #define Prefer_No_VZEROUPPER_X86_ISA_LEVEL 3
 
+/* Both X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P
+   macros are wrappers for the respective CPU_FEATURE{S}_{USABLE|ARCH}_P
+   runtime checks. They differ in two ways.
+
+    1. The USABLE_P version is evaluated to true when the feature
+       is enabled.
+
+    2. The ARCH_P version has a third argument `not`. The `not`
+       argument can either be `!` or empty. If the feature is
+       enabled above an ISA level, the third argument should be empty
+       and the expression is evaluated to true when the feature is
+       enabled. If the feature is disabled above an ISA level, the
+       third argument should be `!` and the expression is evaluated
+       to true when the feature is disabled.
+ */
+
+#define X86_ISA_CPU_FEATURE_USABLE_P(ptr, name) \
+  (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL) \
+   || CPU_FEATURE_USABLE_P (ptr, name))
+
+#define X86_ISA_CPU_FEATURES_ARCH_P(ptr, name, not) \
+  (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL) \
+   || not CPU_FEATURES_ARCH_P (ptr, name))
+
 #define ISA_SHOULD_BUILD(isa_build_level) \
   (MINIMUM_X86_ISA_LEVEL <= (isa_build_level) && IS_IN (libc)) \
   || defined ISA_DEFAULT_IMPL

From patchwork Mon Jun 27 21:18:58 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 1649115
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; secure)
 header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256
 header.s=default header.b=fIxlnk01; dkim-atps=neutral
Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized)
 smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c;
 helo=sourceware.org;
 envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org;
 receiver=)
Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LX0xD2F7vz9sGJ
 for ; Tue, 28 Jun 2022 07:20:00 +1000 (AEST)
Received: from server2.sourceware.org (localhost [IPv6:::1])
 by sourceware.org (Postfix) with ESMTP id D1A063857B99
 for ; Mon, 27 Jun 2022 21:19:57 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D1A063857B99
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
 s=default; t=1656364797; bh=d+Bg1k/z+vRqCHfCKb3M807wVBgXfJsKReRtXnVUe0I=;
 h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe:
 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:
 From;
 b=fIxlnk010QXk6luaINCm80t77rl9v4bQDPcrv+JS7DhUF2CZ8WlAK5klJJJf+hlWo
 Il9/z6iV80uMMuisOuW10TEyescLkpDSm94pk+xwMwYYaM7fOB1l8XbiUGZ7MCtrUw
 WNzekqgLQjdzn76nhl0eqHXQGEoWUOwGuu4MsSS8=
Received: from gnu-tgl-3.localdomain ([172.58.37.230]) by smtp.gmail.com
 with ESMTPSA id bi1-20020a056a02024100b00410702f9bbasm1775300pgb.23.2022.06.27.14.18.59
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Mon, 27 Jun
 2022 14:18:59 -0700 (PDT)
Received: from gnu-tgl-3.. (localhost [IPv6:::1])
 by gnu-tgl-3.localdomain (Postfix) with ESMTP id 4646AC0375;
 Mon, 27 Jun 2022 14:18:58 -0700 (PDT)
To: libc-alpha@sourceware.org
Subject: [PATCH v2 2/2] x86-64: Only define used SSE/AVX/AVX512 run-time resolvers
Date: Mon, 27 Jun 2022 14:18:58 -0700
Message-Id: <20220627211858.2807553-2-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.36.1
In-Reply-To: <20220627211858.2807553-1-hjl.tools@gmail.com>
References: <20220627211858.2807553-1-hjl.tools@gmail.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-3027.6 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-Patchwork-Original-From: "H.J. Lu via Libc-alpha"
From: "H.J. Lu"
Reply-To: "H.J. Lu"
Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org
Sender: "Libc-alpha"

When glibc is built with x86-64 ISA level v3, SSE run-time resolvers
aren't used. For x86-64 ISA level v4 build, both SSE and AVX resolvers
are unused. Check the minimum x86-64 ISA level to exclude the unused
run-time resolvers.
---
 sysdeps/x86/isa-level.h        |  2 ++
 sysdeps/x86_64/dl-machine.h    | 12 ++++---
 sysdeps/x86_64/dl-trampoline.S | 59 ++++++++++++++++++----------------
 3 files changed, 42 insertions(+), 31 deletions(-)

diff --git a/sysdeps/x86/isa-level.h b/sysdeps/x86/isa-level.h
index c6156e7f7a..f293aea906 100644
--- a/sysdeps/x86/isa-level.h
+++ b/sysdeps/x86/isa-level.h
@@ -68,10 +68,12 @@
    compile-time constant.. */
 
 /* ISA level >= 4 guaranteed includes. */
+#define AVX512F_X86_ISA_LEVEL 4
 #define AVX512VL_X86_ISA_LEVEL 4
 #define AVX512BW_X86_ISA_LEVEL 4
 
 /* ISA level >= 3 guaranteed includes. */
+#define AVX_X86_ISA_LEVEL 3
 #define AVX2_X86_ISA_LEVEL 3
 #define BMI2_X86_ISA_LEVEL 3
 
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index 34766325ae..005d089501 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 
 /* Return nonzero iff ELF header is compatible with the running host. */
 static inline int __attribute__ ((unused))
@@ -86,6 +87,8 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
       /* Identify this shared object. */
       *(ElfW(Addr) *) (got + 1) = (ElfW(Addr)) l;
 
+      const struct cpu_features* cpu_features = __get_cpu_features ();
+
       /* The got[2] entry contains the address of a function which gets
          called to get the address of a so far unresolved function and
          jump to it. The profiling extension of the dynamic linker allows
@@ -94,9 +97,9 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
          end in this function. */
       if (__glibc_unlikely (profile))
         {
-          if (CPU_FEATURE_USABLE (AVX512F))
+          if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F))
             *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx512;
-          else if (CPU_FEATURE_USABLE (AVX))
+          else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX))
             *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_avx;
           else
             *(ElfW(Addr) *) (got + 2) = (ElfW(Addr)) &_dl_runtime_profile_sse;
@@ -112,9 +115,10 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
           /* This function will get called to fix up the GOT entry
              indicated by the offset on the stack, and then jump to
              the resolved address. */
-          if (GLRO(dl_x86_cpu_features).xsave_state_size != 0)
+          if (MINIMUM_X86_ISA_LEVEL >= AVX_X86_ISA_LEVEL
+              || GLRO(dl_x86_cpu_features).xsave_state_size != 0)
             *(ElfW(Addr) *) (got + 2)
-              = (CPU_FEATURE_USABLE (XSAVEC)
+              = (CPU_FEATURE_USABLE_P (cpu_features, XSAVEC)
                  ? (ElfW(Addr)) &_dl_runtime_resolve_xsavec
                  : (ElfW(Addr)) &_dl_runtime_resolve_xsave);
           else
diff --git a/sysdeps/x86_64/dl-trampoline.S b/sysdeps/x86_64/dl-trampoline.S
index 831a654713..f669805ac5 100644
--- a/sysdeps/x86_64/dl-trampoline.S
+++ b/sysdeps/x86_64/dl-trampoline.S
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 #ifndef DL_STACK_ALIGNMENT
 /* Due to GCC bug:
@@ -62,35 +63,39 @@
 #undef VMOVA
 #undef VEC_SIZE
 
-#define VEC_SIZE 32
-#define VMOVA vmovdqa
-#define VEC(i) ymm##i
-#define _dl_runtime_profile _dl_runtime_profile_avx
-#include "dl-trampoline.h"
-#undef _dl_runtime_profile
-#undef VEC
-#undef VMOVA
-#undef VEC_SIZE
+#if MINIMUM_X86_ISA_LEVEL <= AVX_X86_ISA_LEVEL
+# define VEC_SIZE 32
+# define VMOVA vmovdqa
+# define VEC(i) ymm##i
+# define _dl_runtime_profile _dl_runtime_profile_avx
+# include "dl-trampoline.h"
+# undef _dl_runtime_profile
+# undef VEC
+# undef VMOVA
+# undef VEC_SIZE
+#endif
 
+#if MINIMUM_X86_ISA_LEVEL < AVX_X86_ISA_LEVEL
 /* movaps/movups is 1-byte shorter. */
-#define VEC_SIZE 16
-#define VMOVA movaps
-#define VEC(i) xmm##i
-#define _dl_runtime_profile _dl_runtime_profile_sse
-#undef RESTORE_AVX
-#include "dl-trampoline.h"
-#undef _dl_runtime_profile
-#undef VEC
-#undef VMOVA
-#undef VEC_SIZE
-
-#define USE_FXSAVE
-#define STATE_SAVE_ALIGNMENT 16
-#define _dl_runtime_resolve _dl_runtime_resolve_fxsave
-#include "dl-trampoline.h"
-#undef _dl_runtime_resolve
-#undef USE_FXSAVE
-#undef STATE_SAVE_ALIGNMENT
+# define VEC_SIZE 16
+# define VMOVA movaps
+# define VEC(i) xmm##i
+# define _dl_runtime_profile _dl_runtime_profile_sse
+# undef RESTORE_AVX
+# include "dl-trampoline.h"
+# undef _dl_runtime_profile
+# undef VEC
+# undef VMOVA
+# undef VEC_SIZE
+
+# define USE_FXSAVE
+# define STATE_SAVE_ALIGNMENT 16
+# define _dl_runtime_resolve _dl_runtime_resolve_fxsave
+# include "dl-trampoline.h"
+# undef _dl_runtime_resolve
+# undef USE_FXSAVE
+# undef STATE_SAVE_ALIGNMENT
+#endif
 
 #define USE_XSAVE
 #define STATE_SAVE_ALIGNMENT 64
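
The way X86_ISA_CPU_FEATURE_USABLE_P folds a feature test into a compile-time constant can be tried outside glibc. What follows is a minimal sketch, not glibc code. struct cpu_features, its usable_* fields, the CPU_FEATURE_USABLE_P body, enum trampoline and pick_profile_trampoline are hypothetical stand-ins, and MINIMUM_X86_ISA_LEVEL defaults to 3 here instead of being derived from the compiler flags. Only the X86_ISA_CPU_FEATURE_USABLE_P definition and the two *_X86_ISA_LEVEL values are taken from the patches above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for glibc's internal struct cpu_features.  */
struct cpu_features
{
  bool usable_AVX;
  bool usable_AVX512F;
};

/* Mock runtime check (assumption: one bool per feature).  */
#define CPU_FEATURE_USABLE_P(ptr, name) ((ptr)->usable_##name)

/* Build-time baseline.  Assumed to be 3 (x86-64-v3) for illustration;
   override with -DMINIMUM_X86_ISA_LEVEL=N to experiment.  */
#ifndef MINIMUM_X86_ISA_LEVEL
# define MINIMUM_X86_ISA_LEVEL 3
#endif

/* Feature levels as added by patch 2/2.  */
#define AVX512F_X86_ISA_LEVEL 4
#define AVX_X86_ISA_LEVEL 3

/* Same shape as the macro moved by patch 1/2: the left operand is a
   constant expression, so the runtime check on the right is only
   evaluated for features above the build baseline.  */
#define X86_ISA_CPU_FEATURE_USABLE_P(ptr, name) \
  (((name##_X86_ISA_LEVEL) <= MINIMUM_X86_ISA_LEVEL) \
   || CPU_FEATURE_USABLE_P (ptr, name))

/* Hypothetical labels for the three profile trampolines.  */
enum trampoline { PROFILE_SSE, PROFILE_AVX, PROFILE_AVX512 };

/* Follows the selection order used in elf_machine_runtime_setup.  */
static enum trampoline
pick_profile_trampoline (const struct cpu_features *cpu_features)
{
  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512F))
    return PROFILE_AVX512;
  else if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX))
    return PROFILE_AVX;
  else
    return PROFILE_SSE;
}
```

With the baseline at 3 the AVX test is constant true, so pick_profile_trampoline can never return PROFILE_SSE, which matches the commit message's point that the SSE resolvers are unused in a v3 build. Compiling the sketch with -DMINIMUM_X86_ISA_LEVEL=1 makes both tests fall through to the mock runtime check again.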