From patchwork Sat Jul 13 07:42:15 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hongyu Wang
X-Patchwork-Id: 1960083
From: Hongyu Wang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, admin@levyhsu.com, ubizjak@gmail.com
Subject: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]
Date: Sat, 13 Jul 2024 15:42:15 +0800
Message-Id: <20240713074215.2151225-1-hongyu.wang@intel.com>
X-Mailer: git-send-email 2.31.1
List-Id: Gcc-patches mailing list

Hi,

According to the AVX512BF16 instruction specification, the conversion
from float to BF16 is not a simple truncation. It has special handling
for denormals and NaNs, and even for a normal float it adds a rounding
bias that depends on the least significant bit of the resulting bf16
value. This means we cannot use vcvtne2ps2bf16 for any bf16 vector
shuffle.

The optimization introduced in r15-1368 added a dedicated split that
converts an HImode permutation into this instruction, so remove it and
treat BFmode permutations the same as HFmode ones.

Bootstrapped & regtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:

	PR target/115889
	* config/i386/predicates.md (vcvtne2ps2bf_parallel): Remove.
	* config/i386/sse.md (hi_cvt_bf): Remove.
	(HI_CVT_BF): Likewise.
	(vpermt2_sepcial_bf16_shuffle_): Likewise.

gcc/testsuite/ChangeLog:

	PR target/115889
	* gcc.target/i386/vpermt2-special-bf16-shufflue.c: Adjust option
	and output scan.
---
 gcc/config/i386/predicates.md                 | 11 ------
 gcc/config/i386/sse.md                        | 35 -------------------
 .../i386/vpermt2-special-bf16-shufflue.c      |  5 ++-
 3 files changed, 2 insertions(+), 49 deletions(-)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index a894847adaf..5d0bb1e0f54 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -2327,14 +2327,3 @@ (define_predicate "apx_ndd_add_memory_operand"
 
   return true;
 })
-
-;; Check that each element is odd and incrementally increasing from 1
-(define_predicate "vcvtne2ps2bf_parallel"
-  (and (match_code "const_vector")
-       (match_code "const_int" "a"))
-{
-  for (int i = 0; i < XVECLEN (op, 0); ++i)
-    if (INTVAL (XVECEXP (op, 0, i)) != (2 * i + 1))
-      return false;
-  return true;
-})
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b3b4697924b..c134494cd20 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -31460,38 +31460,3 @@ (define_insn "vpdp_"
   "TARGET_AVXVNNIINT16"
   "vpdp\t{%3, %2, %0|%0, %2, %3}"
    [(set_attr "prefix" "vex")])
-
-(define_mode_attr hi_cvt_bf
-  [(V8HI "v8bf") (V16HI "v16bf") (V32HI "v32bf")])
-
-(define_mode_attr HI_CVT_BF
-  [(V8HI "V8BF") (V16HI "V16BF") (V32HI "V32BF")])
-
-(define_insn_and_split "vpermt2_sepcial_bf16_shuffle_"
-  [(set (match_operand:VI2_AVX512F 0 "register_operand")
-        (unspec:VI2_AVX512F
-          [(match_operand:VI2_AVX512F 1 "vcvtne2ps2bf_parallel")
-           (match_operand:VI2_AVX512F 2 "register_operand")
-           (match_operand:VI2_AVX512F 3 "nonimmediate_operand")]
-           UNSPEC_VPERMT2))]
-  "TARGET_AVX512VL && TARGET_AVX512BF16 && ix86_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  rtx op0 = gen_reg_rtx (mode);
-  operands[2] = lowpart_subreg (mode,
-                                force_reg (mode, operands[2]),
-                                mode);
-  operands[3] = lowpart_subreg (mode,
-                                force_reg (mode, operands[3]),
-                                mode);
-
-  emit_insn (gen_avx512f_cvtne2ps2bf16_(op0,
-                                        operands[3],
-                                        operands[2]));
-  emit_move_insn (operands[0], lowpart_subreg (mode, op0,
-                                               mode));
-  DONE;
-}
-[(set_attr "mode" "")])
diff --git a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
index 5c65f2a9884..4cbc85735de 100755
--- a/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
+++ b/gcc/testsuite/gcc.target/i386/vpermt2-special-bf16-shufflue.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mavx512bf16 -mavx512vl" } */
-/* { dg-final { scan-assembler-not "vpermi2b" } } */
-/* { dg-final { scan-assembler-times "vcvtne2ps2bf16" 3 } } */
+/* { dg-options "-O2 -mavx512vbmi -mavx512vl" } */
+/* { dg-final { scan-assembler-times "vpermi2w" 3 } } */
 
 typedef __bf16 v8bf __attribute__((vector_size(16)));
 typedef __bf16 v16bf __attribute__((vector_size(32)));
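
For illustration only (not part of the patch): the removed split matched a permutation that selects every odd 16-bit element, i.e. the upper half of each 32-bit lane, and replaced it with vcvtne2ps2bf16. The scalar sketch below shows why the two are not equivalent. It assumes the per-element behaviour summarised in the commit message (denormal inputs flushed to signed zero, NaNs quieted, round-to-nearest-even bias for normal values); the function names bf16_truncate and bf16_convert_ne are hypothetical and do not exist in GCC.

```c
#include <stdint.h>

/* What the permutation asks for: the upper 16 bits of a 32-bit lane,
   copied bit for bit.  */
static inline uint16_t
bf16_truncate (uint32_t lane)
{
  return (uint16_t) (lane >> 16);
}

/* Hypothetical scalar model of one float -> bf16 conversion element.
   Assumption: zero/denormal inputs become a signed zero, NaNs are
   quieted, and normal values are rounded to nearest even by adding
   0x7fff plus the lowest kept bit.  */
static inline uint16_t
bf16_convert_ne (uint32_t lane)
{
  uint32_t exponent = (lane >> 23) & 0xff;
  uint32_t mantissa = lane & 0x7fffff;

  if (exponent == 0)
    /* Zero or denormal: only the sign bit survives.  */
    return (uint16_t) ((lane >> 16) & 0x8000);

  if (exponent == 0xff)
    {
      /* Infinity is kept as is; NaN gets its quiet bit forced on.  */
      uint16_t hi = (uint16_t) (lane >> 16);
      return mantissa ? (uint16_t) (hi | 0x0040) : hi;
    }

  /* Normal value: round to nearest, ties to even.  */
  uint32_t lsb = (lane >> 16) & 1;
  return (uint16_t) ((lane + 0x7fffu + lsb) >> 16);
}
```

Under this model, a lane holding 0x12358000 is copied as 0x1235 by the permutation but becomes 0x1236 after the conversion, and a lane holding 0x00123456 (a denormal when viewed as float) is copied as 0x0012 but converted to 0x0000. Any lane whose bits differ between the two functions would be miscompiled by the removed split.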