From patchwork Wed Nov 15 09:47:03 2023
X-Patchwork-Submitter: Hongyu Wang
X-Patchwork-Id: 1864125
From: Hongyu Wang
To: gcc-patches@gcc.gnu.org
Cc: ubizjak@gmail.com, hongtao.liu@intel.com
Subject: [PATCH 14/16] [APX NDD] Support APX NDD for rotate insns
Date: Wed, 15 Nov 2023 17:47:03 +0800
Message-Id: <20231115094705.3976553-15-hongyu.wang@intel.com>
In-Reply-To: <20231115094705.3976553-1-hongyu.wang@intel.com>
References: <20231115094705.3976553-1-hongyu.wang@intel.com>

gcc/ChangeLog:

	* config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add ROTATE and
	ROTATERT.
	* config/i386/i386.md (*3_1): Extend with a new alternative to
	support NDD for SI/DI rotate, and adjust output template.
	(*si3_1_zext): Likewise.
	(*3_1): Likewise for QI/HI modes.
	(rcrsi2): Likewise.
	(rcrdi2): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/apx-ndd.c: Add test for left/right rotate.
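The semantics of the operations touched here can be sketched in plain C. The helpers below (ror32, rol32, rcr1_32, rcr1_64 and ror32_inplace are hypothetical names, not GCC code) model the any_rotate patterns and the RTL of rcrsi2/rcrdi2. The in-place helper stands for the legacy alternative, whose source operand is tied to the destination (constraint "0"); the value-returning helpers correspond to the new apx_ndd alternative, where the result is written to a separate destination and the source is left unchanged. This is a value-level model only; it says nothing about register allocation or flag handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ROTATERT / ROTATE on a 32-bit value.  Masking the count mirrors the
   hardware, which only uses the low 5 bits for 32-bit rotates, and the
   "& 31" on the complementary shift keeps n == 0 well defined.  */
static uint32_t
ror32 (uint32_t src, unsigned n)
{
  n &= 31;
  return (src >> n) | (src << ((32 - n) & 31));
}

static uint32_t
rol32 (uint32_t src, unsigned n)
{
  n &= 31;
  return (src << n) | (src >> ((32 - n) & 31));
}

/* rcrsi2 / rcrdi2: (plus (lshiftrt src 1) (ashift (ltu CF 0) 31 or 63)).
   Like the RTL, this does not model the carry flag written back, since
   the patterns only clobber FLAGS_REG.  */
static uint32_t
rcr1_32 (uint32_t src, bool carry_in)
{
  return (src >> 1) | ((uint32_t) carry_in << 31);
}

static uint64_t
rcr1_64 (uint64_t src, bool carry_in)
{
  return (src >> 1) | ((uint64_t) carry_in << 63);
}

/* Legacy alternative: operand 1 is tied to operand 0 (constraint "0"),
   so the register being rotated is overwritten with the result.  */
static void
ror32_inplace (uint32_t *reg, unsigned n)
{
  assert (reg != 0);
  *reg = ror32 (*reg, n);
}
```

A related detail in the new output templates: for a rotate by 1 under TARGET_SHIFT1 or when optimizing for size, the NDD alternative prints the short form {%1, %0}, except when operand 1 is in the CX register. Presumably this is because, at least for byte operands, `rorb %cl, %al` already reads as the legacy "rotate %al by %cl", so the explicit-count three-operand form is used to avoid the ambiguity.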
---
 gcc/config/i386/i386-expand.cc          |  2 +
 gcc/config/i386/i386.md                 | 91 ++++++++++++++++---------
 gcc/testsuite/gcc.target/i386/apx-ndd.c | 20 ++++++
 3 files changed, 80 insertions(+), 33 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 8e040346fbb..ab6f14485d6 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1279,6 +1279,8 @@ bool ix86_can_use_ndd_p (enum rtx_code code)
     case ASHIFT:
     case ASHIFTRT:
     case LSHIFTRT:
+    case ROTATE:
+    case ROTATERT:
       return true;
     default:
       return false;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3ff333d4a41..760c0d32f4d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16362,13 +16362,15 @@ (define_insn "*bmi2_rorx3_1"
    (set_attr "mode" "")])

 (define_insn "*3_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
         (any_rotate:SWI48
-          (match_operand:SWI48 1 "nonimmediate_operand" "0,rm")
-          (match_operand:QI 2 "nonmemory_operand" "c,")))
+          (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
+          (match_operand:QI 2 "nonmemory_operand" "c,,c")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+                            ix86_can_use_ndd_p ())"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
     {
     case TYPE_ROTATEX:
@@ -16376,14 +16378,18 @@ (define_insn "*3_1"

     default:
       if (operands[2] == const1_rtx
-          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-        return "{}\t%0";
+          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+          && !(use_ndd && REG_P (operands[1])
+               && REGNO (operands[1]) == CX_REG))
+        return use_ndd ? "{}\t{%1, %0|%0, %1}"
+                       : "{}\t%0";
       else
-        return "{}\t{%2, %0|%0, %2}";
+        return use_ndd ? "{}\t{%2, %1, %0|%0, %1, %2}"
+                       : "{}\t{%2, %0|%0, %2}";
     }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "rotate,rotatex")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "rotate,rotatex,rotate")
    (set (attr "preferred_for_size")
      (cond [(eq_attr "alternative" "0")
               (symbol_ref "true")]
@@ -16433,13 +16439,14 @@ (define_insn "*bmi2_rorxsi3_1_zext"
    (set_attr "mode" "SI")])

 (define_insn "*si3_1_zext"
-  [(set (match_operand:DI 0 "register_operand" "=r,r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,r")
         (zero_extend:DI
-          (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm")
-                         (match_operand:QI 2 "nonmemory_operand" "cI,I"))))
+          (any_rotate:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm")
+                         (match_operand:QI 2 "nonmemory_operand" "cI,I,cI"))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && ix86_binary_operator_ok (, SImode, operands)"
 {
+  bool use_ndd = (which_alternative == 2);
   switch (get_attr_type (insn))
     {
     case TYPE_ROTATEX:
@@ -16447,14 +16454,18 @@ (define_insn "*si3_1_zext"

     default:
       if (operands[2] == const1_rtx
-          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-        return "{l}\t%k0";
+          && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+          && !(use_ndd && REG_P (operands[1])
+               && REGNO (operands[1]) == CX_REG))
+        return use_ndd ? "{l}\t{%1, %k0|%k0, %1}"
+                       : "{l}\t%k0";
       else
-        return "{l}\t{%2, %k0|%k0, %2}";
+        return use_ndd ? "{l}\t{%2, %1, %k0|%k0, %1, %2}"
+                       : "{l}\t{%2, %k0|%k0, %2}";
     }
 }
-  [(set_attr "isa" "*,bmi2")
-   (set_attr "type" "rotate,rotatex")
+  [(set_attr "isa" "*,bmi2,apx_ndd")
+   (set_attr "type" "rotate,rotatex,rotate")
    (set (attr "preferred_for_size")
      (cond [(eq_attr "alternative" "0")
               (symbol_ref "true")]
@@ -16498,19 +16509,27 @@ (define_split
            (zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2))))])

 (define_insn "*3_1"
-  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m")
-        (any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0")
-                          (match_operand:QI 2 "nonmemory_operand" "c")))
+  [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m,r")
+        (any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0,rm")
+                          (match_operand:QI 2 "nonmemory_operand" "c,c")))
    (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands)"
+  "ix86_binary_operator_ok (, mode, operands,
+                            ix86_can_use_ndd_p ())"
 {
   if (operands[2] == const1_rtx
-      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
-    return "{}\t%0";
+      && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
+      && !(which_alternative && REG_P (operands[1])
+           && REGNO (operands[1]) == CX_REG))
+    return which_alternative
+           ? "{}\t{%1, %0|%0, %1}"
+           : "{}\t%0";
   else
-    return "{}\t{%2, %0|%0, %2}";
+    return which_alternative
+           ? "{}\t{%2, %1, %0|%0, %1, %2}"
+           : "{}\t{%2, %0|%0, %2}";
 }
-  [(set_attr "type" "rotate")
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "rotate")
    (set (attr "length_immediate")
      (if_then_else
        (and (match_operand 2 "const1_operand")
@@ -16567,31 +16586,37 @@ (define_split

 ;; Rotations through carry flag
 (define_insn "rcrsi2"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
         (plus:SI
-          (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+          (lshiftrt:SI (match_operand:SI 1 "register_operand" "0,r")
                        (const_int 1))
           (ashift:SI (ltu:SI (reg:CCC FLAGS_REG) (const_int 0))
                      (const_int 31))))
    (clobber (reg:CC FLAGS_REG))]
   ""
-  "rcr{l}\t%0"
-  [(set_attr "type" "ishift1")
+  "@
+   rcr{l}\t%0
+   rcr{l}\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "ishift1")
    (set_attr "memory" "none")
    (set_attr "length_immediate" "0")
    (set_attr "mode" "SI")])

 (define_insn "rcrdi2"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
         (plus:DI
-          (lshiftrt:DI (match_operand:DI 1 "register_operand" "0")
+          (lshiftrt:DI (match_operand:DI 1 "register_operand" "0,r")
                        (const_int 1))
           (ashift:DI (ltu:DI (reg:CCC FLAGS_REG) (const_int 0))
                      (const_int 63))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT"
-  "rcr{q}\t%0"
-  [(set_attr "type" "ishift1")
+  "@
+   rcr{q}\t%0
+   rcr{q}\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "*,apx_ndd")
+   (set_attr "type" "ishift1")
    (set_attr "length_immediate" "0")
    (set_attr "mode" "DI")])

diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
index 28c0df72988..b8b70511023 100644
--- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
+++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
@@ -40,6 +40,14 @@ foo3_##OP_NAME##_##TYPE (TYPE a) \
   return b;                                                     \
 }

+#define FOO4(TYPE, OP_NAME, OP1, OP2, IMM1)                    \
+TYPE                                                            \
+__attribute__ ((noipa))                                         \
+foo4_##OP_NAME##_##TYPE (TYPE a)                                \
+{                                                               \
+  TYPE b = (a OP1 IMM1 | a OP2 (8 * sizeof(TYPE) - IMM1));      \
+  return b;                                                     \
+}

 #define F(TYPE, OP_NAME, OP) \
 TYPE \
@@ -152,6 +160,16 @@ FOO3 (uint32_t, shr, >>, 7)
 FOO (uint64_t, shr, >>)
 FOO3 (uint64_t, shr, >>, 7)

+FOO4 (uint8_t, ror, >>, <<, 1)
+FOO4 (uint16_t, ror, >>, <<, 1)
+FOO4 (uint32_t, ror, >>, <<, 1)
+FOO4 (uint64_t, ror, >>, <<, 1)
+
+FOO4 (uint8_t, rol, <<, >>, 1)
+FOO4 (uint16_t, rol, <<, >>, 1)
+FOO4 (uint32_t, rol, <<, >>, 1)
+FOO4 (uint64_t, rol, <<, >>, 1)
+
 /* { dg-final { scan-assembler-times "add(?:l|w|q)\[^\n\r]*1, \\(%rdi\\), %(?:|r|e)ax" 4 } } */
 /* { dg-final { scan-assembler-times "lea(?:l|q)\[^\n\r]\\(%r(?:d|s)i,%r(?:d|s)i\\), %(?:|r|e)ax" 4 } } */
 /* { dg-final { scan-assembler-times "add(?:l|w|q)\[^\n\r]%(?:|r|e)si, \\(%rdi\\), %(?:|r|e)ax" 4 } } */
@@ -180,3 +198,5 @@ FOO3 (uint64_t, shr, >>, 7)
 /* { dg-final { scan-assembler-times "sar(?:b|l|w|q)\[^\n\r]*7, %(?:|r|e)di(?:|l), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "shr(?:b|l|w|q)\[^\n\r]\\(%rdi\\), %(?:|r|e)a(?:x|l)" 4 } } */
 /* { dg-final { scan-assembler-times "shr(?:b|l|w|q)\[^\n\r]*7, %(?:|r|e)di(?:|l), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "ror(?:b|l|w|q)\[^\n\r]%(?:|r|e)di(?:|l), %(?:|r|e)a(?:x|l)" 4 } } */
+/* { dg-final { scan-assembler-times "rol(?:b|l|w|q)\[^\n\r]%(?:|r|e)di(?:|l), %(?:|r|e)a(?:x|l)" 4 } } */
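The new FOO4 tests depend on GCC recognising the shift/or pair as a rotate. As a sketch, the same macro can be expanded outside the testsuite (here without the noipa attribute, which is a GCC extension) and cross-checked against a straightforward reference rotate by one; ref_ror1, ref_rol1 and width_mask are hypothetical helpers introduced only for this check. In the test itself, the two new scan-assembler-times patterns expect four matches each of the two-register NDD spelling, for example `rorl %edi, %eax` (rotate %edi by one and write the result to %eax).

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the FOO4 macro added to apx-ndd.c, minus noipa:
   (a OP1 IMM1) | (a OP2 (bits - IMM1)) is the usual rotate idiom.  */
#define FOO4(TYPE, OP_NAME, OP1, OP2, IMM1)                         \
  static TYPE foo4_##OP_NAME##_##TYPE (TYPE a)                      \
  {                                                                 \
    TYPE b = (a OP1 IMM1 | a OP2 (8 * sizeof (TYPE) - IMM1));       \
    return b;                                                       \
  }

FOO4 (uint8_t, ror, >>, <<, 1)
FOO4 (uint16_t, ror, >>, <<, 1)
FOO4 (uint32_t, ror, >>, <<, 1)
FOO4 (uint64_t, ror, >>, <<, 1)

FOO4 (uint8_t, rol, <<, >>, 1)
FOO4 (uint16_t, rol, <<, >>, 1)
FOO4 (uint32_t, rol, <<, >>, 1)
FOO4 (uint64_t, rol, <<, >>, 1)

/* All-ones mask for a WIDTH-bit value held in a uint64_t.  */
static uint64_t
width_mask (unsigned width)
{
  assert (width >= 1 && width <= 64);
  return width == 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
}

/* Reference rotate right by one: the low bit moves to the top.  */
static uint64_t
ref_ror1 (uint64_t x, unsigned width)
{
  uint64_t mask = width_mask (width);
  x &= mask;
  return ((x >> 1) | ((x & 1) << (width - 1))) & mask;
}

/* Reference rotate left by one: the top bit moves to the bottom.  */
static uint64_t
ref_rol1 (uint64_t x, unsigned width)
{
  uint64_t mask = width_mask (width);
  x &= mask;
  return ((x << 1) | ((x >> (width - 1)) & 1)) & mask;
}
```

The references work on a uint64_t plus an explicit width so that one pair of helpers covers all four operand sizes exercised by the test.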