From patchwork Fri Oct 14 18:22:03 2022
From: Noah Goldstein
To: libc-alpha@sourceware.org
Subject: [PATCH v3 1/3] x86: Update evex256/512 vec macros
Date: Fri, 14 Oct 2022 13:22:03 -0500
Message-Id: <20221014182205.115792-1-goldstein.w.n@gmail.com>
In-Reply-To: <20221014164008.1325863-1-goldstein.w.n@gmail.com>
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com>
X-Patchwork-Id: 1690136

1) Only define `SECTION` if there is no previous definition.
2) Add a `VEC_lo` definition that has the proper register width but
   only uses registers in the ymm/zmm0-15 range.

This commit does not change libc.so

Tested build on x86-64
---
 sysdeps/x86_64/multiarch/evex256-vecs.h | 7 +++++--
 sysdeps/x86_64/multiarch/evex512-vecs.h | 7 +++++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/evex256-vecs.h b/sysdeps/x86_64/multiarch/evex256-vecs.h
index 222ba46dc7..4fccabd4b8 100644
--- a/sysdeps/x86_64/multiarch/evex256-vecs.h
+++ b/sysdeps/x86_64/multiarch/evex256-vecs.h
@@ -28,8 +28,11 @@
 #include "evex-vecs-common.h"
 #define USE_WITH_EVEX256 1
-#define SECTION(p) p##.evex
-#define VEC VEC_ymm
+#ifndef SECTION
+# define SECTION(p) p##.evex
+#endif
+#define VEC VEC_ymm
+#define VEC_lo VEC_any_ymm
 #endif
diff --git a/sysdeps/x86_64/multiarch/evex512-vecs.h b/sysdeps/x86_64/multiarch/evex512-vecs.h
index d1784d5368..fecc2d3925 100644
--- a/sysdeps/x86_64/multiarch/evex512-vecs.h
+++ b/sysdeps/x86_64/multiarch/evex512-vecs.h
@@ -28,8 +28,11 @@
 #include "evex-vecs-common.h"
 #define USE_WITH_EVEX512 1
-#define SECTION(p) p##.evex512
-#define VEC VEC_zmm
+#ifndef SECTION
+# define SECTION(p) p##.evex512
+#endif
+#define VEC VEC_zmm
+#define VEC_lo VEC_any_zmm
 #endif
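The two header changes can be modelled in a few lines of Python. This is a hypothetical sketch, not glibc code. `section`, `vec` and `vec_lo` are invented helpers that return the strings the preprocessor would produce, and `predefined` stands in for a `SECTION` macro that a file defines before including the header. The sketch assumes that `VEC(i)` names register 16 + i. That matches the old strlen-evex-base.S code in patch 3/3, where `VMM0` was `zmm16`. It also assumes that `VEC_lo(i)` names register i in the ymm/zmm0-15 range.

```python
# Hypothetical model of evex256-vecs.h / evex512-vecs.h after this patch.
# It returns the strings the preprocessor would produce; it is not glibc code.

DEFAULT_SUFFIX = {"evex256": ".evex", "evex512": ".evex512"}
REG_PREFIX = {"evex256": "ymm", "evex512": "zmm"}


def section(config, p, predefined=None):
    """SECTION(p). With the new #ifndef guard, a SECTION macro defined
    before the header is included (modelled by `predefined`) wins."""
    if predefined is not None:
        return predefined(p)
    return p + DEFAULT_SUFFIX[config]


def vec(config, i):
    """VEC(i): assumed to name the EVEX-only registers 16-31."""
    if not 0 <= i <= 15:
        raise ValueError("index must be 0-15")
    return "{}{}".format(REG_PREFIX[config], 16 + i)


def vec_lo(config, i):
    """VEC_lo(i): same width as VEC(i), but registers 0-15."""
    if not 0 <= i <= 15:
        raise ValueError("index must be 0-15")
    return "{}{}".format(REG_PREFIX[config], i)


print(section("evex512", ".text"))  # .text.evex512
print(section("evex256", ".text", predefined=lambda p: p + ".my_section"))
print(vec("evex256", 0), vec_lo("evex256", 0))  # ymm16 ymm0
```

The `.my_section` suffix is made up. It only shows that an earlier definition is no longer silently replaced by the header's default.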
From: Noah Goldstein
To: libc-alpha@sourceware.org
Subject: [PATCH v3 2/3] x86: Add macros for GPRs / mask insn based on VEC_SIZE
Date: Fri, 14 Oct 2022 13:22:04 -0500
Message-Id: <20221014182205.115792-2-goldstein.w.n@gmail.com>
In-Reply-To: <20221014182205.115792-1-goldstein.w.n@gmail.com>
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com>
 <20221014182205.115792-1-goldstein.w.n@gmail.com>
X-Patchwork-Id: 1690137

This is to make it easier to do things like:
```
vpcmpb %VEC(0), %VEC(1), %k0
kmov{d|q} %k0, %{eax|rax}
test %{eax|rax}
```

It adds macros such that any GPR can get the proper width with:
    `V{upper_case_GPR_name}`

and any mask insn can get the proper width with its upper-case name
without the width suffix, e.g. `KMOV` or `KORTEST`.

This commit does not change libc.so

Tested build on x86-64
---
 sysdeps/x86_64/multiarch/reg-macros.h | 351 ++++++++++++++++++
 .../multiarch/scripts/gen-reg-macros.py | 112 ++++++
 2 files changed, 463 insertions(+)
 create mode 100644 sysdeps/x86_64/multiarch/reg-macros.h
 create mode 100644 sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py

diff --git a/sysdeps/x86_64/multiarch/reg-macros.h b/sysdeps/x86_64/multiarch/reg-macros.h
new file mode 100644
index 0000000000..2b6bf417d1
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/reg-macros.h
@@ -0,0 +1,351 @@
+/* This file was generated by: gen-reg-macros.py.
+
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#ifndef _REG_MACROS_H
+#define _REG_MACROS_H 1
+
+#define rax_8 al
+#define eax_8 al
+#define ax_8 al
+#define al_8 al
+#define rax_16 ax
+#define eax_16 ax
+#define ax_16 ax
+#define al_16 ax
+#define rax_32 eax
+#define eax_32 eax
+#define ax_32 eax
+#define al_32 eax
+#define rax_64 rax
+#define eax_64 rax
+#define ax_64 rax
+#define al_64 rax
+#define rbx_8 bl
+#define ebx_8 bl
+#define bx_8 bl
+#define bl_8 bl
+#define rbx_16 bx
+#define ebx_16 bx
+#define bx_16 bx
+#define bl_16 bx
+#define rbx_32 ebx
+#define ebx_32 ebx
+#define bx_32 ebx
+#define bl_32 ebx
+#define rbx_64 rbx
+#define ebx_64 rbx
+#define bx_64 rbx
+#define bl_64 rbx
+#define rcx_8 cl
+#define ecx_8 cl
+#define cx_8 cl
+#define cl_8 cl
+#define rcx_16 cx
+#define ecx_16 cx
+#define cx_16 cx
+#define cl_16 cx
+#define rcx_32 ecx
+#define ecx_32 ecx
+#define cx_32 ecx
+#define cl_32 ecx
+#define rcx_64 rcx
+#define ecx_64 rcx
+#define cx_64 rcx
+#define cl_64 rcx
+#define rdx_8 dl
+#define edx_8 dl
+#define dx_8 dl
+#define dl_8 dl
+#define rdx_16 dx
+#define edx_16 dx
+#define dx_16 dx
+#define dl_16 dx
+#define rdx_32 edx
+#define edx_32 edx
+#define dx_32 edx
+#define dl_32 edx
+#define rdx_64 rdx
+#define edx_64 rdx
+#define dx_64 rdx
+#define dl_64 rdx
+#define rbp_8 bpl
+#define ebp_8 bpl
+#define bp_8 bpl
+#define bpl_8 bpl
+#define rbp_16 bp
+#define ebp_16 bp
+#define bp_16 bp
+#define bpl_16 bp
+#define rbp_32 ebp
+#define ebp_32 ebp
+#define bp_32 ebp
+#define bpl_32 ebp
+#define rbp_64 rbp
+#define ebp_64 rbp
+#define bp_64 rbp
+#define bpl_64 rbp
+#define rsp_8 spl
+#define esp_8 spl
+#define sp_8 spl
+#define spl_8 spl
+#define rsp_16 sp
+#define esp_16 sp
+#define sp_16 sp
+#define spl_16 sp
+#define rsp_32 esp
+#define esp_32 esp
+#define sp_32 esp
+#define spl_32 esp
+#define rsp_64 rsp
+#define esp_64 rsp
+#define sp_64 rsp
+#define spl_64 rsp
+#define rsi_8 sil
+#define esi_8 sil
+#define si_8 sil
+#define sil_8 sil
+#define rsi_16 si
+#define esi_16 si
+#define si_16 si
+#define sil_16 si
+#define rsi_32 esi
+#define esi_32 esi
+#define si_32 esi
+#define sil_32 esi
+#define rsi_64 rsi
+#define esi_64 rsi
+#define si_64 rsi
+#define sil_64 rsi
+#define rdi_8 dil
+#define edi_8 dil
+#define di_8 dil
+#define dil_8 dil
+#define rdi_16 di
+#define edi_16 di
+#define di_16 di
+#define dil_16 di
+#define rdi_32 edi
+#define edi_32 edi
+#define di_32 edi
+#define dil_32 edi
+#define rdi_64 rdi
+#define edi_64 rdi
+#define di_64 rdi
+#define dil_64 rdi
+#define r8_8 r8b
+#define r8d_8 r8b
+#define r8w_8 r8b
+#define r8b_8 r8b
+#define r8_16 r8w
+#define r8d_16 r8w
+#define r8w_16 r8w
+#define r8b_16 r8w
+#define r8_32 r8d
+#define r8d_32 r8d
+#define r8w_32 r8d
+#define r8b_32 r8d
+#define r8_64 r8
+#define r8d_64 r8
+#define r8w_64 r8
+#define r8b_64 r8
+#define r9_8 r9b
+#define r9d_8 r9b
+#define r9w_8 r9b
+#define r9b_8 r9b
+#define r9_16 r9w
+#define r9d_16 r9w
+#define r9w_16 r9w
+#define r9b_16 r9w
+#define r9_32 r9d
+#define r9d_32 r9d
+#define r9w_32 r9d
+#define r9b_32 r9d
+#define r9_64 r9
+#define r9d_64 r9
+#define r9w_64 r9
+#define r9b_64 r9
+#define r10_8 r10b
+#define r10d_8 r10b
+#define r10w_8 r10b
+#define r10b_8 r10b
+#define r10_16 r10w
+#define r10d_16 r10w
+#define r10w_16 r10w
+#define r10b_16 r10w
+#define r10_32 r10d
+#define r10d_32 r10d
+#define r10w_32 r10d
+#define r10b_32 r10d
+#define r10_64 r10
+#define r10d_64 r10
+#define r10w_64 r10
+#define r10b_64 r10
+#define r11_8 r11b
+#define r11d_8 r11b
+#define r11w_8 r11b
+#define r11b_8 r11b
+#define r11_16 r11w
+#define r11d_16 r11w
+#define r11w_16 r11w
+#define r11b_16 r11w
+#define r11_32 r11d
+#define r11d_32 r11d
+#define r11w_32 r11d
+#define r11b_32 r11d
+#define r11_64 r11
+#define r11d_64 r11
+#define r11w_64 r11
+#define r11b_64 r11
+#define r12_8 r12b
+#define r12d_8 r12b
+#define r12w_8 r12b
+#define r12b_8 r12b
+#define r12_16 r12w
+#define r12d_16 r12w
+#define r12w_16 r12w
+#define r12b_16 r12w
+#define r12_32 r12d
+#define r12d_32 r12d
+#define r12w_32 r12d
+#define r12b_32 r12d
+#define r12_64 r12
+#define r12d_64 r12
+#define r12w_64 r12
+#define r12b_64 r12
+#define r13_8 r13b
+#define r13d_8 r13b
+#define r13w_8 r13b
+#define r13b_8 r13b
+#define r13_16 r13w
+#define r13d_16 r13w
+#define r13w_16 r13w
+#define r13b_16 r13w
+#define r13_32 r13d
+#define r13d_32 r13d
+#define r13w_32 r13d
+#define r13b_32 r13d
+#define r13_64 r13
+#define r13d_64 r13
+#define r13w_64 r13
+#define r13b_64 r13
+#define r14_8 r14b
+#define r14d_8 r14b
+#define r14w_8 r14b
+#define r14b_8 r14b
+#define r14_16 r14w
+#define r14d_16 r14w
+#define r14w_16 r14w
+#define r14b_16 r14w
+#define r14_32 r14d
+#define r14d_32 r14d
+#define r14w_32 r14d
+#define r14b_32 r14d
+#define r14_64 r14
+#define r14d_64 r14
+#define r14w_64 r14
+#define r14b_64 r14
+#define r15_8 r15b
+#define r15d_8 r15b
+#define r15w_8 r15b
+#define r15b_8 r15b
+#define r15_16 r15w
+#define r15d_16 r15w
+#define r15w_16 r15w
+#define r15b_16 r15w
+#define r15_32 r15d
+#define r15d_32 r15d
+#define r15w_32 r15d
+#define r15b_32 r15d
+#define r15_64 r15
+#define r15d_64 r15
+#define r15w_64 r15
+#define r15b_64 r15
+
+#define VRAX VGPR(rax)
+#define VRBX VGPR(rbx)
+#define VRCX VGPR(rcx)
+#define VRDX VGPR(rdx)
+#define VRBP VGPR(rbp)
+#define VRSP VGPR(rsp)
+#define VRSI VGPR(rsi)
+#define VRDI VGPR(rdi)
+#define VR8 VGPR(r8)
+#define VR9 VGPR(r9)
+#define VR10 VGPR(r10)
+#define VR11 VGPR(r11)
+#define VR12 VGPR(r12)
+#define VR13 VGPR(r13)
+#define VR14 VGPR(r14)
+#define VR15 VGPR(r15)
+
+#define kmov_8 kmovb
+#define kmov_16 kmovw
+#define kmov_32 kmovd
+#define kmov_64 kmovq
+#define kortest_8 kortestb
+#define kortest_16 kortestw
+#define kortest_32 kortestd
+#define kortest_64 kortestq
+#define kor_8 korb
+#define kor_16 korw
+#define kor_32 kord
+#define kor_64 korq
+#define ktest_8 ktestb
+#define ktest_16 ktestw
+#define ktest_32 ktestd
+#define ktest_64 ktestq
+#define kand_8 kandb
+#define kand_16 kandw
+#define kand_32 kandd
+#define kand_64 kandq
+#define kxor_8 kxorb
+#define kxor_16 kxorw
+#define kxor_32 kxord
+#define kxor_64 kxorq
+#define knot_8 knotb
+#define knot_16 knotw
+#define knot_32 knotd
+#define knot_64 knotq
+#define kxnor_8 kxnorb
+#define kxnor_16 kxnorw
+#define kxnor_32 kxnord
+#define kxnor_64 kxnorq
+#define kunpack_8 kunpckbw
+#define kunpack_16 kunpckwd
+#define kunpack_32 kunpckdq
+
+#define KMOV VKINSN_SZ(kmov, REG_WIDTH)
+#define KORTEST VKINSN_SZ(kortest, REG_WIDTH)
+#define KOR VKINSN_SZ(kor, REG_WIDTH)
+#define KTEST VKINSN_SZ(ktest, REG_WIDTH)
+#define KAND VKINSN_SZ(kand, REG_WIDTH)
+#define KXOR VKINSN_SZ(kxor, REG_WIDTH)
+#define KNOT VKINSN_SZ(knot, REG_WIDTH)
+#define KXNOR VKINSN_SZ(kxnor, REG_WIDTH)
+#define KUNPACK VKINSN_SZ(kunpack, REG_WIDTH)
+
+#ifndef REG_WIDTH
+# define REG_WIDTH VEC_SIZE
+#endif
+#define PRIM_VGPR_SZ(reg_name, reg_size) reg_name##_##reg_size
+#define VGPR_SZ(reg_name, reg_size) PRIM_VGPR_SZ(reg_name, reg_size)
+#define VGPR(reg_name) VGPR_SZ(reg_name, REG_WIDTH)
+#define VKINSN_SZ(insn, reg_size) PRIM_VGPR_SZ(insn, reg_size)
+
+#endif
diff --git a/sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py b/sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py
new file mode 100644
index 0000000000..cf65c9fb8d
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py
@@ -0,0 +1,112 @@
+#!/usr/bin/python3
+# Copyright (C) 2022 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+#
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# .
+"""Generate macros for getting GPR name of a certain size
+
+Inputs: None
+Output: Prints header file to stdout
+
+API:
+    VGPR(reg_name)
+        - Get register name VEC_SIZE component of `reg_name`
+    VGPR_SZ(reg_name, reg_size)
+        - Get register name `reg_size` component of `reg_name`
+"""
+
+import sys
+from datetime import datetime
+
+registers = [["rax", "eax", "ax", "al"], ["rbx", "ebx", "bx", "bl"],
+             ["rcx", "ecx", "cx", "cl"], ["rdx", "edx", "dx", "dl"],
+             ["rbp", "ebp", "bp", "bpl"], ["rsp", "esp", "sp", "spl"],
+             ["rsi", "esi", "si", "sil"], ["rdi", "edi", "di", "dil"],
+             ["r8", "r8d", "r8w", "r8b"], ["r9", "r9d", "r9w", "r9b"],
+             ["r10", "r10d", "r10w", "r10b"], ["r11", "r11d", "r11w", "r11b"],
+             ["r12", "r12d", "r12w", "r12b"], ["r13", "r13d", "r13w", "r13b"],
+             ["r14", "r14d", "r14w", "r14b"], ["r15", "r15d", "r15w", "r15b"]]
+
+mask_insns = [
+    "kmov",
+    "kortest",
+    "kor",
+    "ktest",
+    "kand",
+    "kxor",
+    "knot",
+    "kxnor",
+]
+mask_insns_ext = ["b", "w", "d", "q"]
+
+cr = """
+   Copyright (C) {} Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+"""
+
+print("/* This file was generated by: {}.".format(sys.argv[0]))
+print(cr.format(datetime.today().year))
+
+print("#ifndef _REG_MACROS_H")
+print("#define _REG_MACROS_H\t1\n")
+for reg in registers:
+    for i in range(0, 4):
+        for j in range(0, 4):
+            print("#define {}_{}\t{}".format(reg[j], 8 << i, reg[3 - i]))
+
+print("")
+for reg in registers:
+    print("#define V{}\tVGPR({})".format(reg[0].upper(), reg[0]))
+
+print("")
+for mask_insn in mask_insns:
+    for i in range(0, 4):
+        print("#define {}_{}\t{}{}".format(mask_insn, 8 << i, mask_insn,
+                                           mask_insns_ext[i]))
+for i in range(0, 3):
+    print("#define kunpack_{}\tkunpck{}{}".format(8 << i, mask_insns_ext[i],
+                                                  mask_insns_ext[i + 1]))
+mask_insns.append("kunpack")
+
+print("")
+
+for mask_insn in mask_insns:
+    print("#define {} \tVKINSN_SZ({}, REG_WIDTH)".format(
+        mask_insn.upper(), mask_insn))
+print("")
+
+print("#ifndef REG_WIDTH")
+print("# define REG_WIDTH VEC_SIZE")
+print("#endif")
+print("#define PRIM_VGPR_SZ(reg_name, reg_size)\treg_name##_##reg_size")
+print("#define VGPR_SZ(reg_name, reg_size)\tPRIM_VGPR_SZ(reg_name, reg_size)")
+print("#define VGPR(reg_name)\tVGPR_SZ(reg_name, REG_WIDTH)")
+print("#define VKINSN_SZ(insn, reg_size)\tPRIM_VGPR_SZ(insn, reg_size)")
+
+print("\n#endif")
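The expansion chain can be traced with a small Python model. This is a sketch, not part of the patch. `TABLE`, `prim_vgpr_sz` and `expand` are hypothetical names, and only a few registers and mask instructions are included.

In the real header, `VRAX` becomes `VGPR(rax)` and then `VGPR_SZ(rax, REG_WIDTH)`. The extra `VGPR_SZ` level matters because the C preprocessor does not macro-expand an argument that sits next to `##`. Routing through `VGPR_SZ` therefore lets `REG_WIDTH` (default `VEC_SIZE`) become a plain number before `PRIM_VGPR_SZ` pastes it into a name such as `rax_64`, which then expands to `rax`.

Using the byte count `VEC_SIZE` as a bit width works because a byte-wise compare of a `VEC_SIZE`-byte vector yields one mask bit per byte.

```python
# Hypothetical model of how the preprocessor resolves reg-macros.h.
# Only a subset of registers / mask instructions is included.

REGISTERS = [["rax", "eax", "ax", "al"], ["rcx", "ecx", "cx", "cl"],
             ["rdx", "edx", "dx", "dl"], ["r8", "r8d", "r8w", "r8b"]]
MASK_INSNS = ["kmov", "kortest", "ktest"]
MASK_EXT = ["b", "w", "d", "q"]

# Same loops as gen-reg-macros.py: "<name>_<bits>" -> register/insn name.
TABLE = {}
for reg in REGISTERS:
    for i in range(4):
        for j in range(4):
            TABLE["{}_{}".format(reg[j], 8 << i)] = reg[3 - i]
for insn in MASK_INSNS:
    for i in range(4):
        TABLE["{}_{}".format(insn, 8 << i)] = insn + MASK_EXT[i]


def prim_vgpr_sz(name, size):
    """PRIM_VGPR_SZ: paste name##_##size, then look up the result."""
    return TABLE["{}_{}".format(name, size)]


def expand(macro, vec_size, reg_width=None):
    """Expand 'V<GPR>' (e.g. VRAX) or a mask macro (e.g. KMOV).

    REG_WIDTH defaults to VEC_SIZE, mirroring '#ifndef REG_WIDTH'.
    """
    width = vec_size if reg_width is None else reg_width
    if macro.startswith("V"):  # VRAX -> VGPR(rax)
        return prim_vgpr_sz(macro[1:].lower(), width)
    return prim_vgpr_sz(macro.lower(), width)  # KMOV -> VKINSN_SZ(kmov, ...)


for vs in (32, 64):
    print(vs, expand("KMOV", vs), expand("VRAX", vs))
# 32 kmovd eax
# 64 kmovq rax
```

A file can also define `REG_WIDTH` itself before including the header. The model expresses that with `reg_width`; for example, `expand("VRCX", 64, reg_width=32)` gives `ecx`.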
From: Noah Goldstein
To: libc-alpha@sourceware.org
Subject: [PATCH v3 3/3] x86: Update strlen-evex-base to use new reg/vec macros.
Date: Fri, 14 Oct 2022 13:22:05 -0500
Message-Id: <20221014182205.115792-3-goldstein.w.n@gmail.com>
In-Reply-To: <20221014182205.115792-1-goldstein.w.n@gmail.com>
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com>
 <20221014182205.115792-1-goldstein.w.n@gmail.com>
X-Patchwork-Id: 1690138

To avoid duplicating the VMM / GPR / mask insn macros in all incoming
evex512 files, use the macros defined in 'reg-macros.h' and
'{vec}-vecs.h' (e.g. 'evex512-vecs.h').

This commit does not change libc.so

Tested build on x86-64
---
 sysdeps/x86_64/multiarch/strlen-evex-base.S | 116 +++++++-------------
 sysdeps/x86_64/multiarch/strlen-evex512.S | 4 +-
 2 files changed, 44 insertions(+), 76 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/strlen-evex-base.S b/sysdeps/x86_64/multiarch/strlen-evex-base.S
index 418e9f8411..8af9791e92 100644
--- a/sysdeps/x86_64/multiarch/strlen-evex-base.S
+++ b/sysdeps/x86_64/multiarch/strlen-evex-base.S
@@ -36,42 +36,10 @@
 # define CHAR_SIZE 1
 # endif
-# define XMM0 xmm16
 # define PAGE_SIZE 4096
 # define CHAR_PER_VEC (VEC_SIZE / CHAR_SIZE)
-# if VEC_SIZE == 64
-# define KMOV kmovq
-# define KORTEST kortestq
-# define RAX rax
-# define RCX rcx
-# define RDX rdx
-# define SHR shrq
-# define TEXTSUFFIX evex512
-# define VMM0 zmm16
-# define VMM1 zmm17
-# define VMM2 zmm18
-# define VMM3 zmm19
-# define VMM4 zmm20
-# define VMOVA vmovdqa64
-# elif VEC_SIZE == 32
-/* Currently Unused.  */
-# define KMOV kmovd
-# define KORTEST kortestd
-# define RAX eax
-# define RCX ecx
-# define RDX edx
-# define SHR shrl
-# define TEXTSUFFIX evex256
-# define VMM0 ymm16
-# define VMM1 ymm17
-# define VMM2 ymm18
-# define VMM3 ymm19
-# define VMM4 ymm20
-# define VMOVA vmovdqa32
-# endif
-
-	.section .text.TEXTSUFFIX, "ax", @progbits
+	.section SECTION(.text),"ax",@progbits
 /* Aligning entry point to 64 byte, provides better performance for one vector length string.  */
 ENTRY_P2ALIGN (STRLEN, 6)
@@ -86,18 +54,18 @@ ENTRY_P2ALIGN (STRLEN, 6)
 # endif
 	movl %edi, %eax
-	vpxorq %XMM0, %XMM0, %XMM0
+	vpxorq %VEC_xmm(0), %VEC_xmm(0), %VEC_xmm(0)
 	andl $(PAGE_SIZE - 1), %eax
 	cmpl $(PAGE_SIZE - VEC_SIZE), %eax
 	ja L(page_cross)
 	/* Compare [w]char for null, mask bit will be set for match.  */
-	VPCMP $0, (%rdi), %VMM0, %k0
-	KMOV %k0, %RAX
-	test %RAX, %RAX
+	VPCMP $0, (%rdi), %VEC(0), %k0
+	KMOV %k0, %VRAX
+	test %VRAX, %VRAX
 	jz L(align_more)
-	bsf %RAX, %RAX
+	bsf %VRAX, %VRAX
 # ifdef USE_AS_STRNLEN
 	cmpq %rsi, %rax
 	cmovnb %rsi, %rax
@@ -120,7 +88,7 @@ L(align_more):
 	movq %rax, %rdx
 	subq %rdi, %rdx
 # ifdef USE_AS_WCSLEN
-	SHR $2, %RDX
+	shr $2, %VRDX
 # endif
 	/* At this point rdx contains [w]chars already compared.  */
 	subq %rsi, %rdx
@@ -131,9 +99,9 @@ L(align_more):
 # endif
 	/* Loop unroll 4 times for 4 vector loop.  */
-	VPCMP $0, (%rax), %VMM0, %k0
-	KMOV %k0, %RCX
-	test %RCX, %RCX
+	VPCMP $0, (%rax), %VEC(0), %k0
+	KMOV %k0, %VRCX
+	test %VRCX, %VRCX
 	jnz L(ret_vec_x1)
 # ifdef USE_AS_STRNLEN
@@ -141,9 +109,9 @@ L(align_more):
 	jbe L(ret_max)
 # endif
-	VPCMP $0, VEC_SIZE(%rax), %VMM0, %k0
-	KMOV %k0, %RCX
-	test %RCX, %RCX
+	VPCMP $0, VEC_SIZE(%rax), %VEC(0), %k0
+	KMOV %k0, %VRCX
+	test %VRCX, %VRCX
 	jnz L(ret_vec_x2)
 # ifdef USE_AS_STRNLEN
@@ -151,9 +119,9 @@ L(align_more):
 	jbe L(ret_max)
 # endif
-	VPCMP $0, (VEC_SIZE * 2)(%rax), %VMM0, %k0
-	KMOV %k0, %RCX
-	test %RCX, %RCX
+	VPCMP $0, (VEC_SIZE * 2)(%rax), %VEC(0), %k0
+	KMOV %k0, %VRCX
+	test %VRCX, %VRCX
 	jnz L(ret_vec_x3)
 # ifdef USE_AS_STRNLEN
@@ -161,9 +129,9 @@ L(align_more):
 	jbe L(ret_max)
 # endif
-	VPCMP $0, (VEC_SIZE * 3)(%rax), %VMM0, %k0
-	KMOV %k0, %RCX
-	test %RCX, %RCX
+	VPCMP $0, (VEC_SIZE * 3)(%rax), %VEC(0), %k0
+	KMOV %k0, %VRCX
+	test %VRCX, %VRCX
 	jnz L(ret_vec_x4)
 # ifdef USE_AS_STRNLEN
@@ -179,7 +147,7 @@ L(align_more):
 # ifdef USE_AS_STRNLEN
 	subq %rax, %rcx
 # ifdef USE_AS_WCSLEN
-	SHR $2, %RCX
+	shr $2, %VRCX
 # endif
 	/* rcx contains number of [w]char will be recompared due to alignment fixes.  rdx must be incremented by rcx to offset
@@ -199,42 +167,42 @@ L(loop_entry):
 # endif
 	/* VPMINU and VPCMP combination provide better performance as compared to alternative combinations.  */
-	VMOVA (VEC_SIZE * 4)(%rax), %VMM1
-	VPMINU (VEC_SIZE * 5)(%rax), %VMM1, %VMM2
-	VMOVA (VEC_SIZE * 6)(%rax), %VMM3
-	VPMINU (VEC_SIZE * 7)(%rax), %VMM3, %VMM4
+	VMOVA (VEC_SIZE * 4)(%rax), %VEC(1)
+	VPMINU (VEC_SIZE * 5)(%rax), %VEC(1), %VEC(2)
+	VMOVA (VEC_SIZE * 6)(%rax), %VEC(3)
+	VPMINU (VEC_SIZE * 7)(%rax), %VEC(3), %VEC(4)
-	VPTESTN %VMM2, %VMM2, %k0
-	VPTESTN %VMM4, %VMM4, %k1
+	VPTESTN %VEC(2), %VEC(2), %k0
+	VPTESTN %VEC(4), %VEC(4), %k1
 	subq $-(VEC_SIZE * 4), %rax
 	KORTEST %k0, %k1
 	jz L(loop)
-	VPTESTN %VMM1, %VMM1, %k2
-	KMOV %k2, %RCX
-	test %RCX, %RCX
+	VPTESTN %VEC(1), %VEC(1), %k2
+	KMOV %k2, %VRCX
+	test %VRCX, %VRCX
 	jnz L(ret_vec_x1)
-	KMOV %k0, %RCX
+	KMOV %k0, %VRCX
 	/* At this point, if k0 is non zero, null char must be in the second vector.  */
-	test %RCX, %RCX
+	test %VRCX, %VRCX
 	jnz L(ret_vec_x2)
-	VPTESTN %VMM3, %VMM3, %k3
-	KMOV %k3, %RCX
-	test %RCX, %RCX
+	VPTESTN %VEC(3), %VEC(3), %k3
+	KMOV %k3, %VRCX
+	test %VRCX, %VRCX
 	jnz L(ret_vec_x3)
 	/* At this point null [w]char must be in the fourth vector so no need to check.  */
-	KMOV %k1, %RCX
+	KMOV %k1, %VRCX
 	/* Fourth, third, second vector terminating are pretty much same, implemented this way to avoid branching and reuse code from pre loop exit condition.  */
 L(ret_vec_x4):
-	bsf %RCX, %RCX
+	bsf %VRCX, %VRCX
 	subq %rdi, %rax
 # ifdef USE_AS_WCSLEN
 	subq $-(VEC_SIZE * 3), %rax
@@ -250,7 +218,7 @@ L(ret_vec_x4):
 	ret
 L(ret_vec_x3):
-	bsf %RCX, %RCX
+	bsf %VRCX, %VRCX
 	subq %rdi, %rax
 # ifdef USE_AS_WCSLEN
 	subq $-(VEC_SIZE * 2), %rax
@@ -268,7 +236,7 @@ L(ret_vec_x3):
 L(ret_vec_x2):
 	subq $-VEC_SIZE, %rax
 L(ret_vec_x1):
-	bsf %RCX, %RCX
+	bsf %VRCX, %VRCX
 	subq %rdi, %rax
 # ifdef USE_AS_WCSLEN
 	shrq $2, %rax
@@ -289,13 +257,13 @@ L(page_cross):
 	/* ecx contains number of w[char] to be skipped as a result of address alignment.  */
 	xorq %rdi, %rax
-	VPCMP $0, (PAGE_SIZE - VEC_SIZE)(%rax), %VMM0, %k0
-	KMOV %k0, %RAX
+	VPCMP $0, (PAGE_SIZE - VEC_SIZE)(%rax), %VEC(0), %k0
+	KMOV %k0, %VRAX
 	/* Ignore number of character for alignment adjustment.  */
-	SHR %cl, %RAX
+	shr %cl, %VRAX
 	jz L(align_more)
-	bsf %RAX, %RAX
+	bsf %VRAX, %VRAX
 # ifdef USE_AS_STRNLEN
 	cmpq %rsi, %rax
 	cmovnb %rsi, %rax
diff --git a/sysdeps/x86_64/multiarch/strlen-evex512.S b/sysdeps/x86_64/multiarch/strlen-evex512.S
index 116f8981c8..dfd0a7821b 100644
--- a/sysdeps/x86_64/multiarch/strlen-evex512.S
+++ b/sysdeps/x86_64/multiarch/strlen-evex512.S
@@ -2,6 +2,6 @@
 # define STRLEN __strlen_evex512
 #endif
-#define VEC_SIZE 64
-
+#include "evex512-vecs.h"
+#include "reg-macros.h"
 #include "strlen-evex-base.S"
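The new macros only change how registers and instructions are spelled; the algorithm is unchanged. A minimal Python model of the entry check, including the `L(page_cross)` path, may help when reviewing the hunks above.

Part of that code is outside the diff, so the model rests on these assumptions:
- byte characters (plain strlen, not wcslen or strnlen);
- a `VEC_SIZE` of 64;
- `%rax` holding the page base at the page-cross load;
- `%cl` holding the number of bytes between the start of that load and the string.

`null_mask`, `bsf` and `first_vec_len` are hypothetical helpers.

```python
# Hypothetical model of strlen's first-vector check (byte chars only).
VEC_SIZE = 64       # assumption: the evex512 build
PAGE_SIZE = 4096


def null_mask(block):
    """VPCMP $0 + KMOV: bit i is set iff block[i] is a null byte."""
    mask = 0
    for i, byte in enumerate(block):
        if byte == 0:
            mask |= 1 << i
    return mask


def bsf(mask):
    """Index of the lowest set bit; mask must be non-zero."""
    return (mask & -mask).bit_length() - 1


def first_vec_len(mem, addr):
    """Return the length if the first check finds the null, else None
    (the real code then continues at L(align_more))."""
    offset = addr & (PAGE_SIZE - 1)
    if offset <= PAGE_SIZE - VEC_SIZE:
        # A full VEC_SIZE load at addr stays inside the page.
        mask = null_mask(mem[addr:addr + VEC_SIZE])
    else:
        # L(page_cross): load the last VEC_SIZE bytes of the page, then
        # shift out the bytes that come before addr ("shr %cl").
        start = addr - offset + PAGE_SIZE - VEC_SIZE
        mask = null_mask(mem[start:start + VEC_SIZE]) >> (addr - start)
    return bsf(mask) if mask else None


mem = bytearray(b"x" * (2 * PAGE_SIZE))
mem[100:106] = b"hello\x00"
print(first_vec_len(mem, 100))  # 5
```

The page-cross branch never reads past the end of the page. This is why the real code loads the last `VEC_SIZE` bytes of the page and discards leading bits instead of loading at the string address.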