From patchwork Fri Jun 28 05:27:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1953703 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=fWUPwdxt; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W9PCB6Qwwz20Zy for ; Fri, 28 Jun 2024 15:29:57 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B6431382EF00 for ; Fri, 28 Jun 2024 05:29:48 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) by sourceware.org (Postfix) with ESMTPS id C6D363830B5C for ; Fri, 28 Jun 2024 05:29:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C6D363830B5C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C6D363830B5C Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719552565; cv=none; b=mm33jwXMF1zyhIjxuTg8WxgQQaW7E1uoXWBkU/M+zgu2hDLn01yF+QxcYQg5mEYl2fZLDwzk2fXIylSwaucbOmsZOww+AqIsEc3CIFbd3pk4Eo3clbrBKKHCo+R4gXVcb5DKrdyExDqfXIWe6kf1WK01/xnPvstGcBlKSMYw1Ew= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719552565; c=relaxed/simple; bh=WZJxUFhGDZsby0U9YncGHrbg8cAzjGhabID4O+3q/PU=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=TE8k04KDJb2OqORxccygAqvXQJTQ6fI0t8MD8KM5zJUHXMHuPu4SaKZA5191b5MUFULA98xOMJOr0jWIXn4sbmLknjc2NntbewcF7Ze2juRP8fs+yp/UFoiVe0ZQ+yKUGq8SLUPvq8/Liy/IQ5fTzASO1kzhoIOUDY6vRTCsyhU= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1719552564; x=1751088564; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=WZJxUFhGDZsby0U9YncGHrbg8cAzjGhabID4O+3q/PU=; b=fWUPwdxtTwqjYTr9VNXLRgzZJGiK848httwyqSOwfPGKbHJKIpNGS42A 9kNLxYu5hUpHF/yHH69BanttOkpKD4ygQmodLa3is2/pjX35dK5Z6qAp3 F9WN/k7TOYT1v1gf55sFWYyOqbYMCPqKz8Pc7z4k25LzqC2mmz/Y/VPN7 +5UmMlYErKSqi1p96OkZy7SqLZkimlMfIADHFpUIqVmyeunVOG20hiMW7 w1ZS9DxzMe3yAj6dLL9+Oems/FMi8VH7DwQ0Dd7vO7LEpNgUWtN4aUxDE 9nDTIrFDQn14YjXUziR654c3sbpmtr87oXbfYa4E0u0Nodej9hr07R7uP w==; X-CSE-ConnectionGUID: EZd89HT5RseBe8OV5ieT1Q== X-CSE-MsgGUID: vtZdP/sSTIiEBeRhoZQPZQ== X-IronPort-AV: E=McAfee;i="6700,10204,11116"; a="27341263" X-IronPort-AV: E=Sophos;i="6.09,168,1716274800"; d="scan'208";a="27341263" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jun 2024 22:29:22 -0700 X-CSE-ConnectionGUID: 9oc05r5rQ++cYkkfaipn7w== X-CSE-MsgGUID: YqNiNc5URlmy8ngDH9Orcg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,168,1716274800"; d="scan'208";a="49218815" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmviesa004.fm.intel.com with ESMTP; 27 Jun 2024 22:29:20 -0700 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 17FEA100568F; Fri, 28 Jun 2024 13:29:20 +0800 (CST) From: liuhongt To: gcc-patches@gcc.gnu.org Cc: ubizjak@gmail.com Subject: [PATCH 1/3] [avx512 testsuite] Define mask as extern instead of uninitialized local variables. Date: Fri, 28 Jun 2024 13:27:17 +0800 Message-Id: <20240628052719.330451-2-hongtao.liu@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20240628052719.330451-1-hongtao.liu@intel.com> References: <20240628052719.330451-1-hongtao.liu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org The testcases are supposed to scan for vpopcnt{b,w,d,q} operations with k mask, but mask is defined as uninitialized local variable which will be set as 0 at rtl expand phase. And it's further simplified off by late_combine which caused scan assembly failure. Move the definition of mask outside to make the testcases more stable. gcc/testsuite/ChangeLog: PR target/115610 * gcc.target/i386/avx512bitalg-vpopcntb.c: Define mask as extern instead of uninitialized local variables. * gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto. * gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto. * gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto. * gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Ditto. * gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Ditto. --- gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c | 3 +-- gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c | 4 ++-- gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c | 2 +- gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c | 4 ++-- gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c | 5 +++-- gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c | 2 +- 6 files changed, 10 insertions(+), 10 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c index 44b82c0519d..66d24107c26 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c @@ -7,10 +7,9 @@ #include extern __m512i z, z1; - +extern __mmask16 msk; int foo () { - __mmask16 msk; __m512i c = _mm512_popcnt_epi8 (z); asm volatile ("" : "+v" (c)); c = _mm512_mask_popcnt_epi8 (z1, msk, z); diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c index 8c2dfaba9c6..8ab05653f7c 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c @@ -11,11 +11,11 @@ extern __m256i y, y_1; extern __m128i x, x_1; +extern __mmask32 msk32; +extern __mmask16 msk16; int foo () { - __mmask32 msk32; - __mmask16 msk16; __m256i c256 = _mm256_popcnt_epi8 (y); asm volatile ("" : "+v" (c256)); c256 = _mm256_mask_popcnt_epi8 (y_1, msk32, y); diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c index 2ef8589f6c1..c741bf48a51 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c @@ -7,10 +7,10 @@ #include extern __m512i z, z1; +extern __mmask16 msk; int foo () { - __mmask16 msk; __m512i c = _mm512_popcnt_epi16 (z); asm volatile ("" : "+v" (c)); c = _mm512_mask_popcnt_epi16 (z1, msk, z); diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c index c976461b12e..79bb3c31e85 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c @@ -11,11 +11,11 @@ extern __m256i y, y_1; extern __m128i x, x_1; +extern __mmask16 msk16; +extern __mmask8 msk8; int foo () { - __mmask16 msk16; - __mmask8 msk8; __m256i c256 = _mm256_popcnt_epi16 (y); asm volatile ("" : "+v" (c256)); c256 = _mm256_mask_popcnt_epi16 (y_1, msk16, y); diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c index b4d82f97032..776a4753d8e 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c @@ -15,11 +15,12 @@ extern __m128i x, x_1; extern __m256i y, y_1; extern __m512i z, z_1; +extern __mmask16 msk; +extern __mmask8 msk8; + int foo () { - __mmask16 msk; - __mmask8 msk8; __m128i a = _mm_popcnt_epi32 (x); asm volatile ("" : "+v" (a)); a = _mm_mask_popcnt_epi32 (x_1, msk8, x); diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c index e87d6c999b6..c6314ac5deb 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c @@ -15,10 +15,10 @@ extern __m128i x, x_1; extern __m256i y, y_1; extern __m512i z, z_1; +extern __mmask8 msk; int foo () { - __mmask8 msk; __m128i a = _mm_popcnt_epi64 (x); asm volatile ("" : "+v" (a)); a = _mm_mask_popcnt_epi64 (x_1, msk, x); From patchwork Fri Jun 28 05:27:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1953706 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=ef7T05vo; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W9PD73JmWz1yhT for ; Fri, 28 Jun 2024 15:30:47 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A6D1E3882640 for ; Fri, 28 Jun 2024 05:30:45 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) by sourceware.org (Postfix) with ESMTPS id D4501382FADF for ; Fri, 28 Jun 2024 05:29:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D4501382FADF Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D4501382FADF Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719552567; cv=none; b=WlHc7Cvt5ni/9XrBhyNc825qalCr55fayWvdiqeix0cDgBL616/V36vgAH38c5PpwD0s36iVuV3EEXMlPQtrTWvWhXWGNwCpcZv3dK1vx1eE1mr1luobxsoPa4mIlJybIQn1JoySFuITTWTC5nXxyOu6UKSpVdYrZVKHgBmEFbE= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719552567; c=relaxed/simple; bh=JcXGuWpEk3oNwpJGR+J2z+bN87kH0r1ugr1efCZ1Tk8=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=tEt9/voE03vCRtCTnYZiQWE/nFMsMGpWaFb7REDryhzpF0E1d+dWlWtjs2wvgbdD4Y+vWK97jT6c89Mlklxo9BpkzmUlnB1UDIN4CsiqzvEdikSJY2F4W4juwJxZ5iVO/29BbaL4U/M9F0Kt9eRkzXK1jdFiDK54FLwqLUF8vGo= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1719552566; x=1751088566; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JcXGuWpEk3oNwpJGR+J2z+bN87kH0r1ugr1efCZ1Tk8=; b=ef7T05vouXmR4BnS4XGETpI3Su0MO3ZwPnimzl87muBNCuZYUNzYTOJ/ zMVpHQC0QN0q+h6+K9GJY1KwAPf9RhUaI+9yll48EYYFthhx1LYwYtUar AkL9Lw/2XQNXeIJBKG2uzQHtXBhS5VDejyPLn+9s2q13+ylw++fEg3TA8 5K2dwEeFHm8Y/xF78uQfc8eZ8HaP2Jgv3DE2ZRvMwqLVhp6bjdx320Egk q1ZBPNkYnOv5dc/O5r8jVWkIrK5Is9wN5+dm9aCUH0NF9eLQ1ouduCwVH CGKSzde+eECf1dbYsiFm0cNQDNuae14YHX2zF6PYYW2++3vf9gEPYJ/E+ Q==; X-CSE-ConnectionGUID: 9E9oJKQXTuiM1sICESPkuQ== X-CSE-MsgGUID: BuiGS0d2RXa8nufYqGeLfg== X-IronPort-AV: E=McAfee;i="6700,10204,11116"; a="27341265" X-IronPort-AV: E=Sophos;i="6.09,168,1716274800"; d="scan'208";a="27341265" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jun 2024 22:29:23 -0700 X-CSE-ConnectionGUID: DRE5HADKQ5CJ7vyeGbXoxw== X-CSE-MsgGUID: WkBhdIJbSCGWkTWpPSvjyw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,168,1716274800"; d="scan'208";a="49218814" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmviesa004.fm.intel.com with ESMTP; 27 Jun 2024 22:29:20 -0700 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 1A0211005690; Fri, 28 Jun 2024 13:29:20 +0800 (CST) From: liuhongt To: gcc-patches@gcc.gnu.org Cc: ubizjak@gmail.com Subject: [PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative. Date: Fri, 28 Jun 2024 13:27:18 +0800 Message-Id: <20240628052719.330451-3-hongtao.liu@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20240628052719.330451-1-hongtao.liu@intel.com> References: <20240628052719.330451-1-hongtao.liu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org late_combine will combine lshift + zero into *lshifrtsi3_1_zext which cause extra mov between gpr and kmask, add ?k to the pattern. gcc/ChangeLog: PR target/115610 * config/i386/i386.md (<*insnsi3_zext): Add alternative ?k, enable it only for lshiftrt and under avx512bw. * config/i386/sse.md (*klshrsi3_1_zext): New define_insn, and add corresponding define_split after it. --- gcc/config/i386/i386.md | 19 +++++++++++++------ gcc/config/i386/sse.md | 28 ++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index fd48e764469..57a10c1af48 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -16836,10 +16836,10 @@ (define_insn "*bmi2_si3_1_zext" (set_attr "mode" "SI")]) (define_insn "*si3_1_zext" - [(set (match_operand:DI 0 "register_operand" "=r,r,r") + [(set (match_operand:DI 0 "register_operand" "=r,r,r,?k") (zero_extend:DI - (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm") - (match_operand:QI 2 "nonmemory_operand" "cI,r,cI")))) + (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm,k") + (match_operand:QI 2 "nonmemory_operand" "cI,r,cI,I")))) (clobber (reg:CC FLAGS_REG))] "TARGET_64BIT && ix86_binary_operator_ok (, SImode, operands, TARGET_APX_NDD)" @@ -16850,6 +16850,8 @@ (define_insn "*si3_1_zext" case TYPE_ISHIFTX: return "#"; + case TYPE_MSKLOG: + return "#"; default: if (operands[2] == const1_rtx && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)) @@ -16860,8 +16862,8 @@ (define_insn "*si3_1_zext" : "{l}\t{%2, %k0|%k0, %2}"; } } - [(set_attr "isa" "*,bmi2,apx_ndd") - (set_attr "type" "ishift,ishiftx,ishift") + [(set_attr "isa" "*,bmi2,apx_ndd,avx512bw") + (set_attr "type" "ishift,ishiftx,ishift,msklog") (set (attr "length_immediate") (if_then_else (and (match_operand 2 "const1_operand") @@ -16869,7 +16871,12 @@ (define_insn "*si3_1_zext" (match_test "optimize_function_for_size_p (cfun)"))) (const_string "0") (const_string "*"))) - (set_attr "mode" "SI")]) + (set_attr "mode" "SI") + (set (attr "enabled") + (if_then_else + (eq_attr "alternative" "3") + (symbol_ref " == LSHIFTRT && TARGET_AVX512BW") + (const_string "*")))]) ;; Convert shift to the shiftx pattern to avoid flags dependency. (define_split diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 0be2dcd8891..20665a6f097 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2179,6 +2179,34 @@ (define_split (match_dup 2))) (unspec [(const_int 0)] UNSPEC_MASKOP)])]) +(define_insn "*klshrsi3_1_zext" + [(set (match_operand:DI 0 "register_operand" "=k") + (zero_extend:DI + (lshiftrt:SI (match_operand:SI 1 "register_operand" "k") + (match_operand 2 "const_0_to_31_operand" "I")))) + (unspec [(const_int 0)] UNSPEC_MASKOP)] + "TARGET_AVX512BW" + "kshiftrd\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "msklog") + (set_attr "prefix" "vex") + (set_attr "mode" "SI")]) + +(define_split + [(set (match_operand:DI 0 "mask_reg_operand") + (zero_extend:DI + (lshiftrt:SI + (match_operand:SI 1 "mask_reg_operand") + (match_operand 2 "const_0_to_31_operand")))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_AVX512BW && reload_completed" + [(parallel + [(set (match_dup 0) + (zero_extend:DI + (lshiftrt:SI + (match_dup 1) + (match_dup 2)))) + (unspec [(const_int 0)] UNSPEC_MASKOP)])]) + (define_insn "ktest" [(set (reg:CC FLAGS_REG) (unspec:CC From patchwork Fri Jun 28 05:27:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1953704 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=Y44+MgdU; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4W9PCC0l2wz20b0 for ; Fri, 28 Jun 2024 15:29:59 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3A375382EF02 for ; Fri, 28 Jun 2024 05:29:57 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) by sourceware.org (Postfix) with ESMTPS id 312463882044 for ; Fri, 28 Jun 2024 05:29:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 312463882044 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 312463882044 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719552572; cv=none; b=iuhAS5BQ6Jr2hIjbumdPhWNNSzy+Phh30FLGHzt9LbutCsmNuAr8QHBMFcibE4XY2/K9vu1HOrMSD20a+kfVxDETEZKGmaXYIZO2HFt33BSexFpmVTww8Be6th1k7B7Y02WJFNAAos+zXiTtk8vSVox2n30K6Bg+3g8Gjb8Bo+c= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1719552572; c=relaxed/simple; bh=Qq3I7XbNStKWnRI0eg8DY0fgUwpOBsch1CMTIbSHB6A=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=LUede1/wAsF/zmM3sgs8sT/azDD4HCeRmAk44JIOUZBvbHpxd4jk3n15l0FDjD0rs5zsggMltZ4NjJ6nZq1Op1hY25KXoUg20+VIGGgk9q9uglzVRS44WUbUqEOqXMP9UnG0nM9XXiF4FjO7OjToAel7FcwbjJ9sBqvwsAtEgJ8= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1719552566; x=1751088566; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Qq3I7XbNStKWnRI0eg8DY0fgUwpOBsch1CMTIbSHB6A=; b=Y44+MgdU/wDlY3QZUUmRyGu36uAvld8Egf1v7cfG42dlnPEr/ig47Q4J AJvKLhpDv151sk+SxKHjlVGzTeidVYsUMew+eHv4hogPBeoKL9nZGJk4/ lBImbYOwhRuDmYna7E333jaiWFuhw2uxCZ1TpnV1Q5Ks/ZTEuthf/2f/d Mp2OpnVe+oJ0QdjXzo9CkrjMVrA1s+592yllzb8Jfl7fuPWiRZk8sZo6R ABWnn2aTF9+pbbpZGhKkvGJxlLplh6QLHREVarIvD9lVByhhY+6Wl5w2C 2VTgDtcp3F9rVcGB8oAVwjDGlBqlyvf089ZWv7/oeunhH/a3ag77yMoD7 w==; X-CSE-ConnectionGUID: sfDZIiy2SOSu0xsdmiMkuQ== X-CSE-MsgGUID: 922rBZ4PRV+Uwrde/VrCQg== X-IronPort-AV: E=McAfee;i="6700,10204,11116"; a="27341268" X-IronPort-AV: E=Sophos;i="6.09,168,1716274800"; d="scan'208";a="27341268" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jun 2024 22:29:23 -0700 X-CSE-ConnectionGUID: k5vpxA9rTDWIfkxk7Ge1bQ== X-CSE-MsgGUID: Q2Mpna4PTmSD6tZIf+QTKg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,168,1716274800"; d="scan'208";a="49218816" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmviesa004.fm.intel.com with ESMTP; 27 Jun 2024 22:29:20 -0700 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 1BECF1005691; Fri, 28 Jun 2024 13:29:20 +0800 (CST) From: liuhongt To: gcc-patches@gcc.gnu.org Cc: ubizjak@gmail.com Subject: [PATCH 3/3] [x86] Enable flate-combine. Date: Fri, 28 Jun 2024 13:27:19 +0800 Message-Id: <20240628052719.330451-4-hongtao.liu@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20240628052719.330451-1-hongtao.liu@intel.com> References: <20240628052719.330451-1-hongtao.liu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also define target_insn_cost to prevent post_reload pass_late_combine to revert the optimziation did in pass_rpad. Adjust testcases since pass_late_combine generates better code but break scan assembly. .i.e Under 32-bit target, gcc used to generate broadcast from stack and then do the real operation. After flate_combine, they're combined into embeded broadcast operations. gcc/ChangeLog: * config/i386/i386-features.cc (ix86_rpad_gate): New function. * config/i386/i386-options.cc (ix86_override_options_after_change): Don't disable flate_combine. * config/i386/i386-passes.def: Move pass_stv2 and pass_rpad after pre_reload pas_late_combine. * config/i386/i386-protos.h (ix86_rpad_gate): New declare. * config/i386/i386.cc (ix86_insn_cost): New function. (TARGET_INSN_COST): Define. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Adjus testcase. * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Ditto. * gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Ditto. * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Ditto. * gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Ditto. * gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Ditto. * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Ditto. * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Ditto. * gcc.target/i386/pr91333.c: Ditto. * gcc.target/i386/vect-strided-4.c: Ditto. --- gcc/config/i386/i386-features.cc | 16 +++++++++++----- gcc/config/i386/i386-options.cc | 4 ---- gcc/config/i386/i386-passes.def | 4 ++-- gcc/config/i386/i386-protos.h | 1 + gcc/config/i386/i386.cc | 18 ++++++++++++++++++ .../i386/avx512f-broadcast-pr87767-1.c | 4 ++-- .../i386/avx512f-broadcast-pr87767-5.c | 1 - .../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c | 2 +- .../gcc.target/i386/avx512f-fmsub-sf-zmm-7.c | 2 +- .../gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c | 2 +- .../gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c | 2 +- .../i386/avx512vl-broadcast-pr87767-1.c | 4 ++-- .../i386/avx512vl-broadcast-pr87767-5.c | 2 -- gcc/testsuite/gcc.target/i386/pr91333.c | 2 +- gcc/testsuite/gcc.target/i386/vect-strided-4.c | 2 +- 15 files changed, 42 insertions(+), 24 deletions(-) diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index 607d1991460..fc224ed06b0 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -2995,6 +2995,16 @@ make_pass_insert_endbr_and_patchable_area (gcc::context *ctxt) return new pass_insert_endbr_and_patchable_area (ctxt); } +bool +ix86_rpad_gate () +{ + return (TARGET_AVX + && TARGET_SSE_PARTIAL_REG_DEPENDENCY + && TARGET_SSE_MATH + && optimize + && optimize_function_for_speed_p (cfun)); +} + /* At entry of the nearest common dominator for basic blocks with conversions/rcp/sqrt/rsqrt/round, generate a single vxorps %xmmN, %xmmN, %xmmN @@ -3232,11 +3242,7 @@ public: /* opt_pass methods: */ bool gate (function *) final override { - return (TARGET_AVX - && TARGET_SSE_PARTIAL_REG_DEPENDENCY - && TARGET_SSE_MATH - && optimize - && optimize_function_for_speed_p (cfun)); + return ix86_rpad_gate (); } unsigned int execute (function *) final override diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index 9c12d498928..1ef2c71a7a2 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -1944,10 +1944,6 @@ ix86_override_options_after_change (void) flag_cunroll_grow_size = flag_peel_loops || optimize >= 3; } - /* Late combine tends to undo some of the effects of STV and RPAD, - by combining instructions back to their original form. */ - if (!OPTION_SET_P (flag_late_combine_instructions)) - flag_late_combine_instructions = 0; } /* Clear stack slot assignments remembered from previous functions. diff --git a/gcc/config/i386/i386-passes.def b/gcc/config/i386/i386-passes.def index 7d96766f7b9..2d29f65da88 100644 --- a/gcc/config/i386/i386-passes.def +++ b/gcc/config/i386/i386-passes.def @@ -25,11 +25,11 @@ along with GCC; see the file COPYING3. If not see */ INSERT_PASS_AFTER (pass_postreload_cse, 1, pass_insert_vzeroupper); - INSERT_PASS_AFTER (pass_combine, 1, pass_stv, false /* timode_p */); + INSERT_PASS_AFTER (pass_late_combine, 1, pass_stv, false /* timode_p */); /* Run the 64-bit STV pass before the CSE pass so that CONST0_RTX and CONSTM1_RTX generated by the STV pass can be CSEed. */ INSERT_PASS_BEFORE (pass_cse2, 1, pass_stv, true /* timode_p */); INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_endbr_and_patchable_area); - INSERT_PASS_AFTER (pass_combine, 1, pass_remove_partial_avx_dependency); + INSERT_PASS_AFTER (pass_late_combine, 1, pass_remove_partial_avx_dependency); diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 4f48dc0bf75..3dbd18dc70b 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -422,6 +422,7 @@ extern rtl_opt_pass *make_pass_remove_partial_avx_dependency (gcc::context *); extern bool ix86_has_no_direct_extern_access; +extern bool ix86_rpad_gate (); /* In i386-expand.cc. */ bool ix86_check_builtin_isa_match (unsigned int, HOST_WIDE_INT*, diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 1f71ed04be6..9d2b7d1f174 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -21371,6 +21371,22 @@ ix86_shift_rotate_cost (const struct processor_costs *cost, } } +static int +ix86_insn_cost (rtx_insn *insn, bool speed) +{ + int insn_cost = 0; + /* Add extra cost to avoid post_reload late_combine revert + the optimization did in pass_rpad. */ + if (reload_completed + && ix86_rpad_gate () + && recog_memoized (insn) >= 0 + && get_attr_avx_partial_xmm_update (insn) + == AVX_PARTIAL_XMM_UPDATE_TRUE) + insn_cost += COSTS_N_INSNS (3); + + return insn_cost + pattern_cost (PATTERN (insn), speed); +} + /* Compute a (partial) cost for rtx X. Return true if the complete cost has been computed, and false if subexpressions should be scanned. In either case, *TOTAL contains the cost result. */ @@ -26514,6 +26530,8 @@ static const scoped_attribute_specs *const ix86_attribute_table[] = #define TARGET_MEMORY_MOVE_COST ix86_memory_move_cost #undef TARGET_RTX_COSTS #define TARGET_RTX_COSTS ix86_rtx_costs +#undef TARGET_INSN_COST +#define TARGET_INSN_COST ix86_insn_cost #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST ix86_address_cost diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c index 138dbb4c973..3a50749e610 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c @@ -3,8 +3,8 @@ /* { dg-options "-O2 -mavx512f -mavx512dq" } */ /* { dg-additional-options "-fno-PIE" { target ia32 } } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 3 } } */ /* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 3 { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c index d22251bc2a3..ea2f64861d0 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c @@ -3,7 +3,6 @@ /* { dg-options "-O2 -mavx512f" } */ /* { dg-additional-options "-fno-PIE" { target ia32 } } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to8\\\}" { target ia32 } } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 4 } } */ /* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 4 { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c index 8c117207efa..bbcc5ed0bec 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-mavx512f -O2" } */ -/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */ /* { dg-final { scan-assembler-times "vfmadd...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */ #define type __m512 diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c index cc705af8ea5..fc72dd6e557 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-mavx512f -O2" } */ -/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */ /* { dg-final { scan-assembler-times "vfmsub...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */ #define type __m512 diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c index db5c34678c0..342de482da8 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-mavx512f -O2" } */ -/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */ /* { dg-final { scan-assembler-times "vfnmadd...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */ #define type __m512 diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c index 7815251b82d..f56a3f8acc4 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-options "-mavx512f -O2" } */ -/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */ /* { dg-final { scan-assembler-times "vfnmsub...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */ #define type __m512 diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c index e6df4d25f36..08898445be5 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c @@ -3,8 +3,8 @@ /* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */ /* { dg-additional-options "-fno-PIE" { target ia32 } } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 } } */ -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target { ! ia32 } } } } */ /* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 3 } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 3 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c index ebdc3619d8e..c57a2e29767 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c @@ -3,8 +3,6 @@ /* { dg-options "-O2 -mavx512f -mavx512vl" } */ /* { dg-additional-options "-fno-PIE" { target ia32 } } */ /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } } -/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to2\\\}" { target ia32 } } } */ -/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to4\\\}" { target ia32 } } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 4 } } */ /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 4 } } */ /* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %xmm\[0-9\]+" 4 { target { ! ia32 } } } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr91333.c b/gcc/testsuite/gcc.target/i386/pr91333.c index 2bdff871024..b4940b5c9ec 100644 --- a/gcc/testsuite/gcc.target/i386/pr91333.c +++ b/gcc/testsuite/gcc.target/i386/pr91333.c @@ -1,6 +1,6 @@ /* { dg-do compile { target { ! ia32 } } } */ /* { dg-options "-O2 -mavx" } */ -/* { dg-final { scan-assembler-times "vmovapd|vmovsd" 3 } } */ +/* { dg-final { scan-assembler-times "vmovapd|vmovsd" 2 } } */ static inline double g (double x){ asm volatile ("" : "+x" (x)); diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-4.c b/gcc/testsuite/gcc.target/i386/vect-strided-4.c index dd922926a2a..3fb9f07886e 100644 --- a/gcc/testsuite/gcc.target/i386/vect-strided-4.c +++ b/gcc/testsuite/gcc.target/i386/vect-strided-4.c @@ -15,6 +15,6 @@ void foo (int * __restrict a, int * __restrict b, int *c, int s) /* Vectorization factor two, two two-element stores to a using movq and two two-element stores to b via pextrq/movhps of the high part. */ -/* { dg-final { scan-assembler-times "movq" 2 } } */ +/* { dg-final { scan-assembler-times "movq\[\t ]+%xmm\[0-9]" 2 } } */ /* { dg-final { scan-assembler-times "pextrq" 2 { target { ! ia32 } } } } */ /* { dg-final { scan-assembler-times "movhps" 2 { target { ia32 } } } } */