From patchwork Sat Oct 12 07:30:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hongtao Liu X-Patchwork-Id: 1175676 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-510827-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="GCiqCpEM"; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="qPkaV85G"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 46qxJ320YCz9sP4 for ; Sat, 12 Oct 2019 18:28:16 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:cc:content-type; q=dns; s=default; b=gz1BLsHU30CnAPFvs4QRZeV3qFaRejgSLqAZDK41scQ A9dyUgvYCMqSSVowisSXepeVzCMwOMxpeZ7FRJMcv/a8v3weEBZ5GtFvw2JFtZ7F D56BJEa42WBSat2aa7ciqUHHJ3GSkYKrlWQsFxq3euf4gMXg4JpEeumFWnTN1Rwk = DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:cc:content-type; s=default; bh=Erh9dyl5GwOGuopqAG7RYcoVDbU=; b=GCiqCpEME4W7YK4uE Bx6VQCAIIfk4uw2DXaWM4O+hgU1FsfdD099hz/+CdHoL6Bg4iuSn+93ddzppaayF zQBn9s+vYaBHSuZtxkIDSRPv8dCfGJdhAWkvP2J4eZbM7dB6q3rPXCnsdWBMRyRd 528AjuE8NZ1Z/OpZC7EjCUhUXU= Received: (qmail 28849 invoked by alias); 12 Oct 2019 07:28:07 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 28633 invoked by uid 89); 12 Oct 2019 07:28:07 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-17.7 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.1 spammy=HX-Received:Sat, Renamed, ds, 3653 X-HELO: mail-oi1-f177.google.com Received: from mail-oi1-f177.google.com (HELO mail-oi1-f177.google.com) (209.85.167.177) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 12 Oct 2019 07:28:03 +0000 Received: by mail-oi1-f177.google.com with SMTP id x3so9933785oig.2 for ; Sat, 12 Oct 2019 00:28:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to:cc; bh=/xPanp1JUh4h2lK4zIg2e2cEqZEnnqVzwanhKMyDGgo=; b=qPkaV85GryiebVPEJn6t3oU9ncuW9Rrf4b3NsVniY7b4xApRpcRjCMpa4CzYQgHWKz OFyqm/jhhme1d4oDWiafiI3bdsBXp7GCG+aK+edpvq/SdFvXValNDtE8I0/D8ciF7XYM VHl/nJLvpex6r0BGNYAFKHSVHpZAqAremD3hOXneRivjDgaz9kF0picz6ydCIGDVTWdr HkcmZE2d4JcCvumjQF/PpZ+okaLtYu3mF7wwINZx+yBVkvcanQdWCTdd2jf0OnO2uFae a10XJRngXhuvyqyCNva7fOgq9I+U23GZXLxwQeFFWVgQLeUYW0neP1UU1gfDETMj8iMT 9cKg== MIME-Version: 1.0 From: Hongtao Liu Date: Sat, 12 Oct 2019 15:30:57 +0800 Message-ID: Subject: [PATCH target/92035] Add missing avx512f intrinsics To: Jakub Jelinek , Jeff Law , Uros Bizjak , GCC Patches Cc: "H. J. Lu" , Lili Cui , "Zhang, Annita" , wwwhhhyyy333@gmail.com X-IsSubscribed: yes Hi: This patch is enabling missing avx512f intrinsics listed as _mm_mask_roundscale_sd _mm_mask_roundscale_round_sd _mm_maskz_roundscale_sd _mm_maskz_roundscale_round_sd _mm_mask_roundscale_ss _mm_mask_roundscale_round_ss _mm_maskz_roundscale_ss _mm_maskz_roundscale_round_ss Bootstrap ok, regression tests for i386/x86 ok. ChangeLog gcc/ * config/i386/avx512fintrin.h (_mm_mask_roundscale_ss, _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss, _mm_maskz_roundscale_round_ss, _mm_mask_roundscale_sd, _mm_maskz_roundscale_sd, _mm_mask_roundscale_round_sd, _mm_maskz_roundscale_round_sd): New intrinsics. (_mm_roundscale_ss, _mm_roundscale_round_ss): Fix. * config/i386/i386-builtin.def (__builtin_ia32_rndscaless_round, __builtin_ia32_rndscalesd_round): Remove. (__builtin_ia32_rndscalesd_mask_round, __builtin_ia32_rndscalesd_mask_round): New intrinsics. * config/i386/sse.md (avx512f_rndscale): Renamed to ... (avx512f_rndscale): ... this. ((match_operand:VF_128 2 "" "")): Changed to ... ((match_operand:VF_128 2 "" "")): ... this. ("vrndscale\t{%3, %2, %1, %0|%0, %1, %2, %3}"): Changed to ... ("vrndscale\t{%3,%2, %1, %0|%0, %1, %2, %3}"): ... this. gcc/testsuite/ * gcc.target/i386/avx512f-vrndscaless-1.c (_mm_mask_roundscale_ss, _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss, _mm_maskz_roundscale_round_ss): Test new intrinsics. * gcc.target/i386/avx512f-vrndscaless-2.c (_mm_mask_roundscale_ss, _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss, _mm_maskz_roundscale_round_ss): Test new intrinsics. * gcc.target/i386/avx512f-vrndscalesd-1.c (_mm_mask_roundscale_sd, _mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd, _mm_maskz_roundscale_round_sd): Test new intrinsics. * gcc.target/i386/avx512f-vrndscalesd-2.c (_mm_mask_roundscale_sd, _mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd, _mm_maskz_roundscale_round_sd): Test new intrinsics. * gcc.target/i386/avx-1.c (__builtin_ia32_rndscalefss_round, __builtin_ia32_rndscalefsd_round): Remove builtin. (__builtin_ia32_rndscalefss_mask_round, __builtin_ia32_rndscalefsd_mask_round): Test new builtin. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. From 39a2547f73c63493d502384c45b38b3dc54005c8 Mon Sep 17 00:00:00 2001 From: "Wang, Hongyu" Date: Sat, 12 Oct 2019 00:07:01 -0700 Subject: [PATCH] PR target/92035 Add missing mask[z]_roundscale_[round]_s[d,s] intrinsics gcc/ * config/i386/avx512fintrin.h (_mm_mask_roundscale_ss, _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss, _mm_maskz_roundscale_round_ss, _mm_mask_roundscale_sd, _mm_maskz_roundscale_sd, _mm_mask_roundscale_round_sd, _mm_maskz_roundscale_round_sd): New intrinsics. (_mm_roundscale_ss, _mm_roundscale_round_ss): Fix. * config/i386/i386-builtin.def (__builtin_ia32_rndscaless_round, __builtin_ia32_rndscalesd_round): Remove. (__builtin_ia32_rndscalesd_mask_round, __builtin_ia32_rndscalesd_mask_round): New intrinsics. * config/i386/sse.md (avx512f_rndscale): Renamed to ... (avx512f_rndscale): ... this. ((match_operand:VF_128 2 "" "")): Changed to ... ((match_operand:VF_128 2 "" "")): ... this. ("vrndscale\t{%3, %2, %1, %0|%0, %1, %2, %3}"): Changed to ... ("vrndscale\t{%3,%2, %1, %0|%0, %1, %2, %3}"): ... this. gcc/testsuite/ * gcc.target/i386/avx512f-vrndscaless-1.c (_mm_mask_roundscale_ss, _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss, _mm_maskz_roundscale_round_ss): Test new intrinsics. * gcc.target/i386/avx512f-vrndscaless-2.c (_mm_mask_roundscale_ss, _mm_maskz_roundscale_ss, _mm_maskz_roundscale_round_ss, _mm_maskz_roundscale_round_ss): Test new intrinsics. * gcc.target/i386/avx512f-vrndscalesd-1.c (_mm_mask_roundscale_sd, _mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd, _mm_maskz_roundscale_round_sd): Test new intrinsics. * gcc.target/i386/avx512f-vrndscalesd-2.c (_mm_mask_roundscale_sd, _mm_maskz_roundscale_sd, _mm_maskz_roundscale_round_sd, _mm_maskz_roundscale_round_sd): Test new intrinsics. * gcc.target/i386/avx-1.c (__builtin_ia32_rndscalefss_round, __builtin_ia32_rndscalefsd_round): Remove builtin. (__builtin_ia32_rndscalefss_mask_round, __builtin_ia32_rndscalefsd_mask_round): Test new builtin. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. --- gcc/config/i386/avx512fintrin.h | 234 ++++++++++++++++-- gcc/config/i386/i386-builtin.def | 4 +- gcc/config/i386/sse.md | 9 +- gcc/testsuite/gcc.target/i386/avx-1.c | 4 +- .../gcc.target/i386/avx512f-vrndscalesd-1.c | 12 +- .../gcc.target/i386/avx512f-vrndscalesd-2.c | 42 +++- .../gcc.target/i386/avx512f-vrndscaless-1.c | 12 +- .../gcc.target/i386/avx512f-vrndscaless-2.c | 41 ++- gcc/testsuite/gcc.target/i386/sse-13.c | 4 +- gcc/testsuite/gcc.target/i386/sse-23.c | 4 +- 10 files changed, 324 insertions(+), 42 deletions(-) diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h index c2ca4e15acd..5773ac74360 100644 --- a/gcc/config/i386/avx512fintrin.h +++ b/gcc/config/i386/avx512fintrin.h @@ -9169,10 +9169,40 @@ _mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B, extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm, const int __R) +_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm, + const int __R) { - return (__m128) __builtin_ia32_rndscaless_round ((__v4sf) __A, - (__v4sf) __B, __imm, __R); + return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __A, + (__v4sf) __B, __imm, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) -1, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_roundscale_round_ss (__m128 __A, __mmask8 __B, __m128 __C, + __m128 __D, const int __imm, const int __R) +{ + return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __C, + (__v4sf) __D, __imm, + (__v4sf) __A, + (__mmask8) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_roundscale_round_ss (__mmask8 __A, __m128 __B, __m128 __C, + const int __imm, const int __R) +{ + return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __B, + (__v4sf) __C, __imm, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) __A, + __R); } extern __inline __m128d @@ -9180,8 +9210,37 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm, const int __R) { - return (__m128d) __builtin_ia32_rndscalesd_round ((__v2df) __A, - (__v2df) __B, __imm, __R); + return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __A, + (__v2df) __B, __imm, + (__v2df) + _mm_setzero_pd (), + (__mmask8) -1, + __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_roundscale_round_sd (__m128d __A, __mmask8 __B, __m128d __C, __m128d __D, + const int __imm, const int __R) +{ + return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __C, + (__v2df) __D, __imm, + (__v2df) __A, + (__mmask8) __B, + __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_roundscale_round_sd (__mmask8 __A, __m128d __B, __m128d __C, + const int __imm, const int __R) +{ + return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __B, + (__v2df) __C, __imm, + (__v2df) + _mm_setzero_pd (), + (__mmask8) __A, + __R); } #else @@ -9211,12 +9270,48 @@ _mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm, (int)(C), \ (__v8df)_mm512_setzero_pd(),\ (__mmask8)(A), R)) -#define _mm_roundscale_round_ss(A, B, C, R) \ - ((__m128) __builtin_ia32_rndscaless_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), R)) -#define _mm_roundscale_round_sd(A, B, C, R) \ - ((__m128d) __builtin_ia32_rndscalesd_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), R)) +#define _mm_roundscale_round_ss(A, B, I, R) \ + ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), \ + (int)(I), \ + (__v4sf)_mm_setzero_ps(),\ + (__mmask8)(-1), \ + (int)(R))) +#define _mm_mask_roundscale_round_ss(A, U, B, C, I, R) \ + ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(B), \ + (__v4sf)(__m128)(C), \ + (int)(I), \ + (__v4sf)(__m128)(A), \ + (__mmask8)(U), \ + (int)(R))) +#define _mm_maskz_roundscale_round_ss(U, A, B, I, R) \ + ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), \ + (int)(I), \ + (__v4sf)_mm_setzero_ps(),\ + (__mmask8)(U), \ + (int)(R))) +#define _mm_roundscale_round_sd(A, B, I, R) \ + ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(A),\ + (__v2df)(__m128d)(B),\ + (int)(I), \ + (__v2df)_mm_setzero_pd(),\ + (__mmask8)(-1), \ + (int)(R))) +#define _mm_mask_roundscale_round_sd(A, U, B, C, I, R) \ + ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(B),\ + (__v2df)(__m128d)(C),\ + (int)(I), \ + (__v2df)(__m128d)(A),\ + (__mmask8)(U), \ + (int)(R))) +#define _mm_maskz_roundscale_round_sd(U, A, B, I, R) \ + ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(A),\ + (__v2df)(__m128d)(B),\ + (int)(I), \ + (__v2df)_mm_setzero_pd(),\ + (__mmask8)(U), \ + (int)(R))) #endif extern __inline __m512 @@ -14812,18 +14907,75 @@ extern __inline __m128 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_roundscale_ss (__m128 __A, __m128 __B, const int __imm) { - return (__m128) __builtin_ia32_rndscaless_round ((__v4sf) __A, - (__v4sf) __B, __imm, - _MM_FROUND_CUR_DIRECTION); + return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __A, + (__v4sf) __B, __imm, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_roundscale_ss (__m128 __A, __mmask8 __B, __m128 __C, __m128 __D, + const int __imm) +{ + return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __C, + (__v4sf) __D, __imm, + (__v4sf) __A, + (__mmask8) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_roundscale_ss (__mmask8 __A, __m128 __B, __m128 __C, + const int __imm) +{ + return (__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf) __B, + (__v4sf) __C, __imm, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) __A, + _MM_FROUND_CUR_DIRECTION); } extern __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm) { - return (__m128d) __builtin_ia32_rndscalesd_round ((__v2df) __A, - (__v2df) __B, __imm, - _MM_FROUND_CUR_DIRECTION); + return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __A, + (__v2df) __B, __imm, + (__v2df) + _mm_setzero_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_roundscale_sd (__m128d __A, __mmask8 __B, __m128d __C, __m128d __D, + const int __imm) +{ + return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __C, + (__v2df) __D, __imm, + (__v2df) __A, + (__mmask8) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_roundscale_sd (__mmask8 __A, __m128d __B, __m128d __C, + const int __imm) +{ + return (__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df) __B, + (__v2df) __C, __imm, + (__v2df) + _mm_setzero_pd (), + (__mmask8) __A, + _MM_FROUND_CUR_DIRECTION); } #else @@ -14853,12 +15005,48 @@ _mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm) (int)(C), \ (__v8df)_mm512_setzero_pd(),\ (__mmask8)(A), _MM_FROUND_CUR_DIRECTION)) -#define _mm_roundscale_ss(A, B, C) \ - ((__m128) __builtin_ia32_rndscaless_round ((__v4sf)(__m128)(A), \ - (__v4sf)(__m128)(B), (int)(C), _MM_FROUND_CUR_DIRECTION)) -#define _mm_roundscale_sd(A, B, C) \ - ((__m128d) __builtin_ia32_rndscalesd_round ((__v2df)(__m128d)(A), \ - (__v2df)(__m128d)(B), (int)(C), _MM_FROUND_CUR_DIRECTION)) +#define _mm_roundscale_ss(A, B, I) \ + ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), \ + (int)(I), \ + (__v4sf)_mm_setzero_ps(),\ + (__mmask8)(-1), \ + _MM_FROUND_CUR_DIRECTION)) +#define _mm_mask_roundscale_ss(A, U, B, C, I) \ + ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(B), \ + (__v4sf)(__m128)(C), \ + (int)(I), \ + (__v4sf)(__m128)(A), \ + (__mmask8)(U), \ + _MM_FROUND_CUR_DIRECTION)) +#define _mm_maskz_roundscale_ss(U, A, B, I) \ + ((__m128) __builtin_ia32_rndscaless_mask_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), \ + (int)(I), \ + (__v4sf)_mm_setzero_ps(),\ + (__mmask8)(U), \ + _MM_FROUND_CUR_DIRECTION)) +#define _mm_roundscale_sd(A, B, I) \ + ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B),\ + (int)(I), \ + (__v2df)_mm_setzero_pd(),\ + (__mmask8)(-1), \ + _MM_FROUND_CUR_DIRECTION)) +#define _mm_mask_roundscale_sd(A, U, B, C, I) \ + ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(B), \ + (__v2df)(__m128d)(C),\ + (int)(I), \ + (__v2df)(__m128d)(A),\ + (__mmask8)(U), \ + _MM_FROUND_CUR_DIRECTION)) +#define _mm_maskz_roundscale_sd(U, A, B, I) \ + ((__m128d) __builtin_ia32_rndscalesd_mask_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B),\ + (int)(I), \ + (__v2df)_mm_setzero_pd(),\ + (__mmask8)(U), \ + _MM_FROUND_CUR_DIRECTION)) #endif #ifdef __OPTIMIZE__ diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 6ac820eb897..11028331cda 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2828,8 +2828,8 @@ BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmmulv4sf3_round, "__builtin_ia3 BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_sse_vmmulv4sf3_mask_round, "__builtin_ia32_mulss_mask_round", IX86_BUILTIN_MULSS_MASK_ROUND, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF_UQI_INT) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev8df_mask_round, "__builtin_ia32_rndscalepd_mask", IX86_BUILTIN_RNDSCALEPD, UNKNOWN, (int) V8DF_FTYPE_V8DF_INT_V8DF_QI_INT) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev16sf_mask_round, "__builtin_ia32_rndscaleps_mask", IX86_BUILTIN_RNDSCALEPS, UNKNOWN, (int) V16SF_FTYPE_V16SF_INT_V16SF_HI_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev2df_round, "__builtin_ia32_rndscalesd_round", IX86_BUILTIN_RNDSCALESD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT) -BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev4sf_round, "__builtin_ia32_rndscaless_round", IX86_BUILTIN_RNDSCALESS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT) +BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev2df_mask_round, "__builtin_ia32_rndscalesd_mask_round", IX86_BUILTIN_RNDSCALESD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT) +BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_rndscalev4sf_mask_round, "__builtin_ia32_rndscaless_mask_round", IX86_BUILTIN_RNDSCALESS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_scalefv8df_mask_round, "__builtin_ia32_scalefpd512_mask", IX86_BUILTIN_SCALEFPD512, UNKNOWN, (int) V8DF_FTYPE_V8DF_V8DF_V8DF_UQI_INT) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_scalefv16sf_mask_round, "__builtin_ia32_scalefps512_mask", IX86_BUILTIN_SCALEFPS512, UNKNOWN, (int) V16SF_FTYPE_V16SF_V16SF_V16SF_HI_INT) BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_avx512f_vmscalefv2df_mask_round, "__builtin_ia32_scalefsd_mask_round", IX86_BUILTIN_SCALEFSD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DF_UQI_INT) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 07922a1bf97..f474eed1c4e 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -9694,18 +9694,17 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_rndscale" +(define_insn "avx512f_rndscale" [(set (match_operand:VF_128 0 "register_operand" "=v") (vec_merge:VF_128 (unspec:VF_128 - [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:VF_128 2 "" "") + [(match_operand:VF_128 2 "" "") (match_operand:SI 3 "const_0_to_255_operand")] UNSPEC_ROUND) - (match_dup 1) + (match_operand:VF_128 1 "register_operand" "v") (const_int 1)))] "TARGET_AVX512F" - "vrndscale\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vrndscale\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "")]) diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 741b3c4f8e3..3600a7abe91 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -283,8 +283,8 @@ #define __builtin_ia32_pternlogq512_maskz(A, B, C, F, E) __builtin_ia32_pternlogq512_maskz(A, B, C, 1, E) #define __builtin_ia32_rndscalepd_mask(A, F, C, D, E) __builtin_ia32_rndscalepd_mask(A, 1, C, D, 8) #define __builtin_ia32_rndscaleps_mask(A, F, C, D, E) __builtin_ia32_rndscaleps_mask(A, 1, C, D, 8) -#define __builtin_ia32_rndscalesd_round(A, B, C, D) __builtin_ia32_rndscalesd_round(A, B, 1, 4) -#define __builtin_ia32_rndscaless_round(A, B, C, D) __builtin_ia32_rndscaless_round(A, B, 1, 4) +#define __builtin_ia32_rndscalesd_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscalesd_mask_round(A, B, 1, D, E, 4) +#define __builtin_ia32_rndscaless_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscaless_mask_round(A, B, 1, D, E, 4) #define __builtin_ia32_scalefpd512_mask(A, B, C, D, E) __builtin_ia32_scalefpd512_mask(A, B, C, D, 8) #define __builtin_ia32_scalefps512_mask(A, B, C, D, E) __builtin_ia32_scalefps512_mask(A, B, C, D, 8) #define __builtin_ia32_scalefsd_mask_round(A, B, C, D, E) __builtin_ia32_scalefsd_mask_round(A, B, C, D, 8) diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c index 255b384d565..f95d4709607 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-1.c @@ -1,14 +1,24 @@ /* { dg-do compile } */ /* { dg-options "-mavx512f -O2" } */ -/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\\S*,\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscalesd\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ #include volatile __m128d x1, x2; +volatile __mmask8 m; void extern avx512f_test (void) { x1 = _mm_roundscale_sd (x1, x2, 0x42); x1 = _mm_roundscale_round_sd (x1, x2, 0x42, _MM_FROUND_NO_EXC); + x1 = _mm_mask_roundscale_sd (x1, m, x1, x2, 0x42); + x1 = _mm_mask_roundscale_round_sd (x1, m, x1, x2, 0x42, _MM_FROUND_NO_EXC); + x1 = _mm_maskz_roundscale_sd (m, x1, x2, 0x42); + x1 = _mm_maskz_roundscale_round_sd (m, x1, x2, 0x42, _MM_FROUND_NO_EXC); } diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c index b96aa462790..83b940d9636 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscalesd-2.c @@ -6,6 +6,7 @@ #include #include "avx512f-check.h" +#include "avx512f-mask-type.h" static void compute_rndscalesd (double *s1, double *s2, double *r, int imm) @@ -33,17 +34,54 @@ compute_rndscalesd (double *s1, double *s2, double *r, int imm) static void avx512f_test (void) { - int imm = _MM_FROUND_FLOOR | (7 << 4); - union128d s1, s2, res1; + int i, imm; + union128d s1, s2, res1, res2, res3, res4, res5, res6; double res_ref[SIZE]; + + MASK_TYPE mask = MASK_VALUE; + + imm = _MM_FROUND_FLOOR | (7 << 4); s1.x = _mm_set_pd (4.05084, -1.23162); s2.x = _mm_set_pd (-3.53222, 7.33527); + for(i = 0; i < SIZE; i++) + { + res2.a[i] = DEFAULT_VALUE; + res5.a[i] = DEFAULT_VALUE; + } + res1.x = _mm_roundscale_sd (s1.x, s2.x, imm); + res2.x = _mm_mask_roundscale_sd (res2.x, mask, s1.x, s2.x, imm); + res3.x = _mm_maskz_roundscale_sd (mask, s1.x, s2.x, imm); + res4.x = _mm_roundscale_round_sd (s1.x, s2.x, imm, _MM_FROUND_NO_EXC); + res5.x = _mm_mask_roundscale_round_sd (res5.x, mask, s1.x, s2.x, imm, _MM_FROUND_NO_EXC); + res6.x = _mm_maskz_roundscale_round_sd (mask, s1.x, s2.x, imm, _MM_FROUND_NO_EXC); compute_rndscalesd (s1.a, s2.a, res_ref, imm); if (check_union128d (res1, res_ref)) abort (); + + MASK_MERGE (d) (res_ref, mask, 1); + if (check_union128d (res2, res_ref)) + abort (); + + MASK_ZERO (d) (res_ref, mask, 1); + if (check_union128d (res3, res_ref)) + abort (); + + compute_rndscalesd (s1.a, s2.a, res_ref, imm); + + if (check_union128d (res4, res_ref)) + abort (); + + MASK_MERGE (d) (res_ref, mask, 1); + if (check_union128d (res5, res_ref)) + abort (); + + MASK_ZERO (d) (res_ref, mask, 1); + if (check_union128d (res6, res_ref)) + abort (); + } diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c index dbd6e21b762..19e3a973fa4 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-1.c @@ -1,14 +1,24 @@ /* { dg-do compile } */ /* { dg-options "-mavx512f -O2" } */ -/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\\S*,\[ \\t\]+\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vrndscaless\[ \\t\]+\[^\n\]*\{sae\}\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ #include volatile __m128 x1, x2; +volatile __mmask8 m; void extern avx512f_test (void) { x1 = _mm_roundscale_ss (x1, x2, 0x42); x1 = _mm_roundscale_round_ss (x1, x2, 0x42, _MM_FROUND_NO_EXC); + x1 = _mm_mask_roundscale_ss (x1, m, x1, x2, 0x42); + x1 = _mm_mask_roundscale_round_ss (x1, m, x1, x2, 0x42, _MM_FROUND_NO_EXC); + x1 = _mm_maskz_roundscale_ss (m, x1, x2, 0x42); + x1 = _mm_maskz_roundscale_round_ss (m, x1, x2, 0x42, _MM_FROUND_NO_EXC); } diff --git a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c index 42dd645ab87..6906880d362 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-vrndscaless-2.c @@ -6,6 +6,7 @@ #include #include "avx512f-check.h" +#include "avx512f-mask-type.h" static void compute_rndscaless (float *s1, float *s2, float *r, int imm) @@ -35,17 +36,53 @@ compute_rndscaless (float *s1, float *s2, float *r, int imm) static void avx512f_test (void) { - int imm = _MM_FROUND_FLOOR | (7 << 4); - union128 s1, s2, res1; + int i, imm; + union128 s1, s2, res1, res2, res3, res4, res5, res6; float res_ref[SIZE]; + + MASK_TYPE mask = MASK_VALUE; + imm = _MM_FROUND_FLOOR | (7 << 4); + s1.x = _mm_set_ps (4.05084, -1.23162, 2.00231, -6.22103); s2.x = _mm_set_ps (-4.19319, -3.53222, 7.33527, 5.57655); + + for(i = 0; i < SIZE; i++) + { + res2.a[i] = DEFAULT_VALUE; + res5.a[i] = DEFAULT_VALUE; + } res1.x = _mm_roundscale_ss (s1.x, s2.x, imm); + res2.x = _mm_mask_roundscale_ss (res2.x, mask, s1.x, s2.x, imm); + res3.x = _mm_maskz_roundscale_ss (mask, s1.x, s2.x, imm); + res4.x = _mm_roundscale_round_ss (s1.x, s2.x, imm, _MM_FROUND_NO_EXC); + res5.x = _mm_mask_roundscale_round_ss (res5.x, mask, s1.x, s2.x, imm, _MM_FROUND_NO_EXC); + res6.x = _mm_maskz_roundscale_round_ss (mask, s1.x, s2.x, imm, _MM_FROUND_NO_EXC); compute_rndscaless (s1.a, s2.a, res_ref, imm); if (check_union128 (res1, res_ref)) abort (); + + MASK_MERGE () (res_ref, mask, 1); + if (check_union128 (res2, res_ref)) + abort (); + + MASK_ZERO () (res_ref, mask, 1); + if (check_union128 (res3, res_ref)) + abort (); + + compute_rndscaless (s1.a, s2.a, res_ref, imm); + + if (check_union128 (res4, res_ref)) + abort (); + + MASK_MERGE () (res_ref, mask, 1); + if (check_union128 (res5, res_ref)) + abort (); + + MASK_ZERO () (res_ref, mask, 1); + if (check_union128 (res6, res_ref)) + abort (); } diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index 39b2d31578c..45c1c285c57 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -300,8 +300,8 @@ #define __builtin_ia32_pternlogq512_maskz(A, B, C, F, E) __builtin_ia32_pternlogq512_maskz(A, B, C, 1, E) #define __builtin_ia32_rndscalepd_mask(A, F, C, D, E) __builtin_ia32_rndscalepd_mask(A, 1, C, D, 8) #define __builtin_ia32_rndscaleps_mask(A, F, C, D, E) __builtin_ia32_rndscaleps_mask(A, 1, C, D, 8) -#define __builtin_ia32_rndscalesd_round(A, B, C, D) __builtin_ia32_rndscalesd_round(A, B, 1, 4) -#define __builtin_ia32_rndscaless_round(A, B, C, D) __builtin_ia32_rndscaless_round(A, B, 1, 4) +#define __builtin_ia32_rndscalesd_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscalesd_mask_round(A, B, 1, D, E, 4) +#define __builtin_ia32_rndscaless_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscaless_mask_round(A, B, 1, D, E, 4) #define __builtin_ia32_scalefpd512_mask(A, B, C, D, E) __builtin_ia32_scalefpd512_mask(A, B, C, D, 8) #define __builtin_ia32_scalefps512_mask(A, B, C, D, E) __builtin_ia32_scalefps512_mask(A, B, C, D, 8) #define __builtin_ia32_scalefsd_mask_round(A, B, C, D, E) __builtin_ia32_scalefsd_mask_round(A, B, C, D, 8) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 7ea665de747..e98c7693ef7 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -302,8 +302,8 @@ #define __builtin_ia32_pternlogq512_maskz(A, B, C, F, E) __builtin_ia32_pternlogq512_maskz(A, B, C, 1, E) #define __builtin_ia32_rndscalepd_mask(A, F, C, D, E) __builtin_ia32_rndscalepd_mask(A, 1, C, D, 8) #define __builtin_ia32_rndscaleps_mask(A, F, C, D, E) __builtin_ia32_rndscaleps_mask(A, 1, C, D, 8) -#define __builtin_ia32_rndscalesd_round(A, B, C, D) __builtin_ia32_rndscalesd_round(A, B, 1, 4) -#define __builtin_ia32_rndscaless_round(A, B, C, D) __builtin_ia32_rndscaless_round(A, B, 1, 4) +#define __builtin_ia32_rndscalesd_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscalesd_mask_round(A, B, 1, D, E, 4) +#define __builtin_ia32_rndscaless_mask_round(A, B, C, D, E, F) __builtin_ia32_rndscaless_mask_round(A, B, 1, D, E, 4) #define __builtin_ia32_scalefpd512_mask(A, B, C, D, E) __builtin_ia32_scalefpd512_mask(A, B, C, D, 8) #define __builtin_ia32_scalefps512_mask(A, B, C, D, E) __builtin_ia32_scalefps512_mask(A, B, C, D, 8) #define __builtin_ia32_scalefsd_mask_round(A, B, C, D, E) __builtin_ia32_scalefsd_mask_round(A, B, C, D, 8) -- 2.17.1