From patchwork Tue Sep 20 12:40:11 2016
X-Patchwork-Submitter: Dominik Vogt <vogt@linux.vnet.ibm.com>
X-Patchwork-Id: 672247
Date: Tue, 20 Sep 2016 13:40:11 +0100
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
To: gcc-patches@gcc.gnu.org, Andreas Krebbel
Subject: Re: [PATCH 3/3] S/390: Improved risbg usage.
Reply-To: vogt@linux.vnet.ibm.com
References: <20160920123718.GA26830@linux.vnet.ibm.com>
In-Reply-To: <20160920123718.GA26830@linux.vnet.ibm.com>
Message-Id: <20160920124011.GC28823@linux.vnet.ibm.com>

On Tue, Sep 20, 2016 at 01:37:18PM +0100, Dominik Vogt wrote:
> The following series of patches improves usage of the risbg and
> risbgn instructions on s390/s390x.  The patches have been
> regression tested on s390 and s390x and pass the Spec2006
> testsuite without any negative effects.
>
> Patch 3 adds new patterns and tests based on the first two patches
> to make better use of the risbg and risbgn instructions.

Ciao

Dominik ^_^  ^_^
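As a reading aid for the insn templates below (which all print their
operands in the order dst, src, start, end, shift), risbg/risbgn can be
modelled roughly by the following C sketch.  This is only an
illustration of the z/Architecture rotate-then-insert-selected-bits
operation; the function name and parameters are my own and it is not
part of the patch:

#include <stdint.h>

/* Rough model of RISBG/RISBGN dst,src,start,end,shift.  Bit positions
   use the IBM convention: 0 is the most significant bit, 63 the least
   significant.  The "128+" seen in the templates sets the high bit of
   the end position, which tells the instruction to zero all unselected
   bits instead of preserving them from dst.  */
static uint64_t
risbg_model (uint64_t dst, uint64_t src, unsigned start, unsigned end,
             unsigned shift, int zero_unselected)
{
  uint64_t rot, mask;

  shift &= 63;
  /* Rotate src left by the shift amount.  */
  rot = shift ? (src << shift) | (src >> (64 - shift)) : src;
  /* Selection mask for IBM bit positions start..end, wrapping around
     when start > end.  */
  if (start <= end)
    mask = (~0ULL >> start) & (~0ULL << (63 - end));
  else
    mask = (~0ULL >> start) | (~0ULL << (63 - end));
  return (rot & mask) | ((zero_unselected ? 0 : dst) & ~mask);
}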
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 2848b41..9de442f 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -606,6 +606,7 @@
 ;; same template.
 (define_mode_iterator INT [(DI "TARGET_ZARCH") SI HI QI])
 (define_mode_iterator DINT [(TI "TARGET_ZARCH") DI SI HI QI])
+(define_mode_iterator SINT [SI HI QI])
 
 ;; This iterator allows some 'ashift' and 'lshiftrt' pattern to be defined from
 ;; the same template.
@@ -3750,25 +3751,93 @@
     }
 })
 
-(define_insn "*extzv_zEC12"
+(define_insn "*extzv<mode><clobbercc_or_nocc>"
   [(set (match_operand:GPR 0 "register_operand" "=d")
 	(zero_extract:GPR
 	  (match_operand:GPR 1 "register_operand" "d")
 	  (match_operand 2 "const_int_operand" "")   ; size
-	  (match_operand 3 "const_int_operand" "")))] ; start
-  "TARGET_ZEC12"
-  "risbgn\t%0,%1,64-%2,128+63,%3+%2" ; dst, src, start, end, shift
-  [(set_attr "op_type" "RIE")])
+	  (match_operand 3 "const_int_operand" ""))) ; start
+  ]
+  "<z10_or_zEC12_cond>"
+  "<risbg_n>\t%0,%1,64-%2,128+63,%3+%2" ; dst, src, start, end, shift
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
 
-(define_insn "*extzv_z10"
-  [(set (match_operand:GPR 0 "register_operand" "=d")
-	(zero_extract:GPR
-	  (match_operand:GPR 1 "register_operand" "d")
-	  (match_operand 2 "const_int_operand" "")   ; size
-	  (match_operand 3 "const_int_operand" ""))) ; start
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_Z10"
-  "risbg\t%0,%1,64-%2,128+63,+%3+%2" ; dst, src, start, end, shift
+; 64 bit: (a & -16) | ((b >> 8) & 15)
+(define_insn "*extzvdi<clobbercc_or_nocc>_lshiftrt"
+  [(set (zero_extract:DI (match_operand:DI 0 "register_operand" "+d")
+			 (match_operand 1 "const_int_operand" ""))  ; size
+			 (match_operand 2 "const_int_operand" ""))  ; start
+	(lshiftrt:DI (match_operand:DI 3 "register_operand" "d")
+		     (match_operand:DI 4 "nonzero_shift_count_operand" "")))]
+  "<z10_or_zEC12_cond>
+   && 64 - UINTVAL (operands[4]) >= UINTVAL (operands[1])"
+  "<risbg_n>\t%0,%3,%2,%2+%1-1,128-%2-%1-%4"
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
+; 32 bit: (a & -16) | ((b >> 8) & 15)
+(define_insn "*<risbg_n>_ior_and_sr_ze"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+	(ior:SI (and:SI
+		  (match_operand:SI 1 "register_operand" "0")
+		  (match_operand:SI 2 "const_int_operand" ""))
+		(subreg:SI
+		  (zero_extract:DI
+		    (match_operand:DI 3 "register_operand" "d")
+		    (match_operand 4 "const_int_operand" "")  ; size
+		    (match_operand 5 "const_int_operand" "")) ; start
+		  4)))]
+  "<z10_or_zEC12_cond>
+   && UINTVAL (operands[2]) == (~(0ULL) << UINTVAL (operands[4]))"
+  "<risbg_n>\t%0,%3,64-%4,63,%4+%5"
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
+; ((int)foo >> 10) & 1;
+(define_insn "*extract1bitdi<clobbercc_or_nocc>"
+  [(set (match_operand:DI 0 "register_operand" "=d")
+	(ne:DI (zero_extract:DI
+		 (match_operand:DI 1 "register_operand" "d")
+		 (const_int 1)  ; size
+		 (match_operand 2 "const_int_operand" "")) ; start
+	       (const_int 0)))]
+  "<z10_or_zEC12_cond>"
+  "<risbg_n>\t%0,%1,64-1,128+63,%2+1" ; dst, src, start, end, shift
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
+(define_insn "*<risbg_n>_and_subregdi_rotr"
+  [(set (match_operand:DI 0 "register_operand" "=d")
+	(and:DI (subreg:DI
+		  (rotate:SINT (match_operand:SINT 1 "register_operand" "d")
+			       (match_operand:SINT 2 "const_int_operand" "")) 0)
+		(match_operand:DI 3 "contiguous_bitmask_operand" "")))]
+  "<z10_or_zEC12_cond>
+   && UINTVAL (operands[3]) < (1ULL << (UINTVAL (operands[2]) & 0x3f))"
+  "<risbg_n>\t%0,%1,%s3,128+%e3,%2" ; dst, src, start, end, shift
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
+(define_insn "*<risbg_n>_and_subregdi_rotl"
+  [(set (match_operand:DI 0 "register_operand" "=d")
+	(and:DI (subreg:DI
+		  (rotate:SINT (match_operand:SINT 1 "register_operand" "d")
+			       (match_operand:SINT 2 "const_int_operand" "")) 0)
+		(match_operand:DI 3 "contiguous_bitmask_operand" "")))]
+  "<z10_or_zEC12_cond>
+   && !(UINTVAL (operands[3]) & ((1ULL << (UINTVAL (operands[2]) & 0x3f)) - 1))"
+  "<risbg_n>\t%0,%1,%s3,128+%e3,%2" ; dst, src, start, end, shift
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
+(define_insn "*<risbg_n>_di_and_rot"
"register_operand" "=d") + (and:DI (rotate:DI (match_operand:DI 1 "register_operand" "d") + (match_operand:DI 2 "const_int_operand" "")) + (match_operand:DI 3 "contiguous_bitmask_operand" "")))] + "" + "\t%0,%1,%s3,128+%e3,%2" ; dst, src, start, end, shift [(set_attr "op_type" "RIE") (set_attr "z10prop" "z10_super_E1")]) @@ -3842,74 +3911,126 @@ ; The normal RTL expansion will never generate a zero_extract where ; the location operand isn't word mode. However, we do this in the ; back-end when generating atomic operations. See s390_two_part_insv. -(define_insn "*insv_zEC12" +(define_insn "*insv" [(set (zero_extract:GPR (match_operand:GPR 0 "nonimmediate_operand" "+d") (match_operand 1 "const_int_operand" "I") ; size (match_operand 2 "const_int_operand" "I")) ; pos (match_operand:GPR 3 "nonimmediate_operand" "d"))] - "TARGET_ZEC12 - && (INTVAL (operands[1]) + INTVAL (operands[2])) <= " - "risbgn\t%0,%3,%2,%2+%1-1,-%2-%1" - [(set_attr "op_type" "RIE")]) - -(define_insn "*insv_z10" - [(set (zero_extract:GPR (match_operand:GPR 0 "nonimmediate_operand" "+d") - (match_operand 1 "const_int_operand" "I") ; size - (match_operand 2 "const_int_operand" "I")) ; pos - (match_operand:GPR 3 "nonimmediate_operand" "d")) - (clobber (reg:CC CC_REGNUM))] - "TARGET_Z10 + " && (INTVAL (operands[1]) + INTVAL (operands[2])) <= " - "risbg\t%0,%3,%2,%2+%1-1,-%2-%1" + "\t%0,%3,%2,%2+%1-1,-%2-%1" [(set_attr "op_type" "RIE") (set_attr "z10prop" "z10_super_E1")]) ; and op1 with a mask being 1 for the selected bits and 0 for the rest ; and op3=op0 with a mask being 0 for the selected bits and 1 for the rest -(define_insn "*insv_zEC12_noshift" - [(set (match_operand:GPR 0 "nonimmediate_operand" "=d") - (ior:GPR (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "d") +(define_insn "*insv_noshift" + [(set (match_operand:GPR 0 "nonimmediate_operand" "=d,d") + (ior:GPR (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "d,0") (match_operand:GPR 2 "contiguous_bitmask_operand" "")) - (and:GPR (match_operand:GPR 3 "nonimmediate_operand" "0") + (and:GPR (match_operand:GPR 3 "nonimmediate_operand" "0,d") (match_operand:GPR 4 "const_int_operand" ""))))] - "TARGET_ZEC12 && INTVAL (operands[2]) == ~INTVAL (operands[4])" - "risbgn\t%0,%1,%2,%2,0" - [(set_attr "op_type" "RIE")]) + " && INTVAL (operands[2]) == ~INTVAL (operands[4])" + "@ + \t%0,%1,%2,%2,0 + \t%0,%3,%4,%4,0" + [(set_attr "op_type" "RIE") + (set_attr "z10prop" "z10_super_E1")]) -(define_insn "*insv_z10_noshift" - [(set (match_operand:GPR 0 "nonimmediate_operand" "=d") - (ior:GPR (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "d") - (match_operand:GPR 2 "contiguous_bitmask_operand" "")) - (and:GPR (match_operand:GPR 3 "nonimmediate_operand" "0") - (match_operand:GPR 4 "const_int_operand" "")))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_Z10 && INTVAL (operands[2]) == ~INTVAL (operands[4])" - "risbg\t%0,%1,%2,%2,0" +(define_insn "*insv_z10_noshift_cc" + [(set (reg CC_REGNUM) + (compare + (ior:DI + (and:DI (match_operand:DI 1 "nonimmediate_operand" "d,0") + (match_operand:DI 2 "contiguous_bitmask_operand" "")) + (and:DI (match_operand:DI 3 "nonimmediate_operand" "0,d") + (match_operand:DI 4 "const_int_operand" ""))) + (const_int 0))) + (set (match_operand:DI 0 "nonimmediate_operand" "=d,d") + (ior:DI (and:DI (match_dup 1) (match_dup 2)) + (and:DI (match_dup 3) (match_dup 4))))] + "TARGET_Z10 && s390_match_ccmode (insn, CCSmode) + && INTVAL (operands[2]) == ~INTVAL (operands[4])" + "@ + risbg\t%0,%1,%s2,%e2,0 + risbg\t%0,%3,%s4,%e4,0" + [(set_attr "op_type" "RIE") 
+ (set_attr "z10prop" "z10_super_E1")]) + +(define_insn "*insv_z10_noshift_cconly" + [(set + (reg CC_REGNUM) + (compare + (ior:DI + (and:DI (match_operand:DI 1 "nonimmediate_operand" "d,0") + (match_operand:DI 2 "contiguous_bitmask_operand" "")) + (and:DI (match_operand:DI 3 "nonimmediate_operand" "0,d") + (match_operand:DI 4 "const_int_operand" ""))) + (const_int 0))) + (clobber (match_scratch:DI 0 "=d,d"))] + "TARGET_Z10 && s390_match_ccmode (insn, CCSmode) + && INTVAL (operands[2]) == ~INTVAL (operands[4])" + "@ + risbg\t%0,%1,%s2,%e2,0 + risbg\t%0,%3,%s4,%e4,0" [(set_attr "op_type" "RIE") (set_attr "z10prop" "z10_super_E1")]) ; Implement appending Y on the left of S bits of X ; x = (y << s) | (x & ((1 << s) - 1)) -(define_insn "*insv_zEC12_appendbitsleft" +(define_insn "*insv_appendbitsleft" [(set (match_operand:GPR 0 "nonimmediate_operand" "=d") (ior:GPR (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "0") (match_operand:GPR 2 "immediate_operand" "")) (ashift:GPR (match_operand:GPR 3 "nonimmediate_operand" "d") (match_operand:GPR 4 "nonzero_shift_count_operand" ""))))] - "TARGET_ZEC12 && UINTVAL (operands[2]) == (1UL << UINTVAL (operands[4])) - 1" - "risbgn\t%0,%3,,64-%4-1,%4" + " + && UINTVAL (operands[2]) == (1UL << UINTVAL (operands[4])) - 1" + "\t%0,%3,,64-%4-1,%4" [(set_attr "op_type" "RIE") (set_attr "z10prop" "z10_super_E1")]) -(define_insn "*insv_z10_appendbitsleft" - [(set (match_operand:GPR 0 "nonimmediate_operand" "=d") - (ior:GPR (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "0") - (match_operand:GPR 2 "immediate_operand" "")) - (ashift:GPR (match_operand:GPR 3 "nonimmediate_operand" "d") - (match_operand:GPR 4 "nonzero_shift_count_operand" "")))) - (clobber (reg:CC CC_REGNUM))] - "TARGET_Z10 && !TARGET_ZEC12 && UINTVAL (operands[2]) == (1UL << UINTVAL (operands[4])) - 1" - "risbg\t%0,%3,,64-%4-1,%4" +; a = ((i32)a & -16777216) | (((ui32)b) >> 8) +(define_insn "*__ior_and_lshiftrt" + [(set (match_operand:GPR 0 "register_operand" "=d") + (ior:GPR (and:GPR + (match_operand:GPR 1 "register_operand" "0") + (match_operand:GPR 2 "const_int_operand" "")) + (lshiftrt:GPR + (match_operand:GPR 3 "register_operand" "d") + (match_operand:GPR 4 "nonzero_shift_count_operand" ""))))] + " && UINTVAL (operands[2]) + == (~(0ULL) << (GET_MODE_BITSIZE (mode) - UINTVAL (operands[4])))" + "\t%0,%3,%4,63,64-%4" + [(set_attr "op_type" "RIE") + (set_attr "z10prop" "z10_super_E1")]) + +; (ui32)(((ui64)x) >> 48) | ((i32)y & -65536); +(define_insn "*_sidi_ior_and_lshiftrt" + [(set (match_operand:SI 0 "register_operand" "=d") + (ior:SI (and:SI + (match_operand:SI 1 "register_operand" "0") + (match_operand:SI 2 "const_int_operand" "")) + (subreg:SI + (lshiftrt:DI + (match_operand:DI 3 "register_operand" "d") + (match_operand:DI 4 "nonzero_shift_count_operand" "")) 4)))] + " + && UINTVAL (operands[2]) == ~(~(0ULL) >> UINTVAL (operands[4]))" + "\t%0,%3,%4,63,64-%4" + [(set_attr "op_type" "RIE") + (set_attr "z10prop" "z10_super_E1")]) + +; (ui32)(((ui64)x) >> 12) & -4 +(define_insn "*trunc_sidi_and_subreg_lshrt" + [(set (match_operand:SI 0 "register_operand" "=d") + (and:SI + (subreg:SI (lshiftrt:DI + (match_operand:DI 1 "register_operand" "d") + (match_operand:DI 2 "nonzero_shift_count_operand" "")) 4) + (match_operand:SI 3 "contiguous_bitmask_nowrap_operand" "")))] + "" + "\t%0,%1,%t3,128+%f3,64-%2" [(set_attr "op_type" "RIE") (set_attr "z10prop" "z10_super_E1")]) @@ -7049,32 +7170,30 @@ "s390_narrow_logical_operator (AND, &operands[0], &operands[1]);") ;; These two are what combine 
 ;; generates for (ashift (zero_extract)).
-(define_insn "*extzv_<mode>_srl"
+(define_insn "*extzv_<mode><clobbercc_or_nocc>_srl"
   [(set (match_operand:GPR 0 "register_operand" "=d")
	 (and:GPR (lshiftrt:GPR
		    (match_operand:GPR 1 "register_operand" "d")
		    (match_operand:GPR 2 "nonzero_shift_count_operand" ""))
-		  (match_operand:GPR 3 "contiguous_bitmask_operand" "")))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_Z10
+		  (match_operand:GPR 3 "contiguous_bitmask_operand" "")))]
+  "<z10_or_zEC12_cond>
    /* Note that even for the SImode pattern, the rotate is always DImode.  */
    && s390_extzv_shift_ok (<bitsize>, -INTVAL (operands[2]), INTVAL (operands[3]))"
-  "risbg\t%0,%1,%<bfstart>3,128+%<bfend>3,64-%2"
+  "<risbg_n>\t%0,%1,%<bfstart>3,128+%<bfend>3,64-%2"
   [(set_attr "op_type" "RIE")
    (set_attr "z10prop" "z10_super_E1")])
 
-(define_insn "*extzv_<mode>_sll"
+(define_insn "*extzv_<mode><clobbercc_or_nocc>_sll"
   [(set (match_operand:GPR 0 "register_operand" "=d")
	 (and:GPR (ashift:GPR
		    (match_operand:GPR 1 "register_operand" "d")
		    (match_operand:GPR 2 "nonzero_shift_count_operand" ""))
-		  (match_operand:GPR 3 "contiguous_bitmask_operand" "")))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_Z10
+		  (match_operand:GPR 3 "contiguous_bitmask_operand" "")))]
+  "<z10_or_zEC12_cond>
    && s390_extzv_shift_ok (<bitsize>, INTVAL (operands[2]), INTVAL (operands[3]))"
-  "risbg\t%0,%1,%<bfstart>3,128+%<bfend>3,%2"
+  "<risbg_n>\t%0,%1,%<bfstart>3,128+%<bfend>3,%2"
   [(set_attr "op_type" "RIE")
    (set_attr "z10prop" "z10_super_E1")])
diff --git a/gcc/config/s390/subst.md b/gcc/config/s390/subst.md
index 8a1b814..ad45644 100644
--- a/gcc/config/s390/subst.md
+++ b/gcc/config/s390/subst.md
@@ -120,3 +120,24 @@
      (clobber (match_scratch:DSI 0 "=d,d"))])
 
 (define_subst_attr "cconly" "cconly_subst" "" "_cconly")
+
+
+; Does transformations to switch between patterns using risbg +
+; clobber CC (z10) and risbgn without clobber (zEC12).
+(define_subst "clobbercc_or_nocc_subst"
+  [(set (match_operand 0 "" "") (match_operand 1 "" ""))]
+  ""
+  [(set (match_dup 0) (match_dup 1))
+   (clobber (reg:CC CC_REGNUM))])
+
+; Use this in the insn name to add the target suffix.
+(define_subst_attr "clobbercc_or_nocc" "clobbercc_or_nocc_subst"
+  "_nocc" "_clobbercc")
+
+; Use this in the condition.
+(define_subst_attr "z10_or_zEC12_cond" "clobbercc_or_nocc_subst"
+  "TARGET_ZEC12" "TARGET_Z10 && ! TARGET_ZEC12")
+
+; Use this instead of the risbg instruction.
+(define_subst_attr "risbg_n" "clobbercc_or_nocc_subst"
+  "risbgn" "risbg")
diff --git a/gcc/testsuite/gcc.target/s390/risbg-ll-1.c b/gcc/testsuite/gcc.target/s390/risbg-ll-1.c
new file mode 100644
index 0000000..30350d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/risbg-ll-1.c
@@ -0,0 +1,498 @@
+// Test sequences that can use RISBG with a zeroed first operand.
+// The tests here assume that RISBLG isn't available.
+
+/* Tests ported from the Llvm testsuite. */
+
+/* { dg-do compile { target s390x-*-* } } */
+/* { dg-options "-O3 -march=z10 -mzarch -fno-asynchronous-unwind-tables" } */
+
+#define i64 signed long long
+#define ui64 unsigned long long
+#define i32 signed int
+#define ui32 unsigned int
+#define i8 signed char
+#define ui8 unsigned char
+
+// Test an extraction of bit 0 from a right-shifted value.
+i32 f1 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f1:\n\trisbg\t%r2,%r2,64-1,128\\\+63,53\\\+1" } } */
+  i32 v_shr = ((ui32)v_foo) >> 10;
+  i32 v_and = v_shr & 1;
+  return v_and;
+}
+
+// ...and again with i64.
+i64 f2 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f2:\n\trisbg\t%r2,%r2,64-1,128\\\+63,53\\\+1" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f2:\n\trisbg\t%r3,%r3,64-1,128\\\+63,53\\\+1\n\tlhi\t%r2,0" { target { ! lp64 } } } } */
+  i64 v_shr = ((ui64)v_foo) >> 10;
+  i64 v_and = v_shr & 1;
+  return v_and;
+}
+
+// Test an extraction of other bits from a right-shifted value.
+i32 f3 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f3:\n\trisbg\t%r2,%r2,60,128\\\+61,64-22" } } */
+  i32 v_shr = ((ui32)v_foo) >> 22;
+  i32 v_and = v_shr & 12;
+  return v_and;
+}
+
+// ...and again with i64.
+i64 f4 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f4:\n\trisbg\t%r2,%r2,60,128\\\+61,64-22" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f4:\n\trisbg\t%r3,%r3,60,128\\\+61,64-22\n\tlhi\t%r2,0" { target { ! lp64 } } } } */
+  i64 v_shr = ((ui64)v_foo) >> 22;
+  i64 v_and = v_shr & 12;
+  return v_and;
+}
+
+// Test an extraction of most bits from a right-shifted value.
+// The range should be reduced to exclude the zeroed high bits.
+i32 f5 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f5:\n\trisbg\t%r2,%r2,34,128\\\+60,64-2" } } */
+  i32 v_shr = ((ui32)v_foo) >> 2;
+  i32 v_and = v_shr & -8;
+  return v_and;
+}
+
+// ...and again with i64.
+i64 f6 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f6:\n\trisbg\t%r2,%r2,2,128\\\+60,64-2" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f6:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\trisbg\t%r2,%r3,2,128\\\+60,64-2" { target { ! lp64 } } } } */
+  i64 v_shr = ((ui64)v_foo) >> 2;
+  i64 v_and = v_shr & -8;
+  return v_and;
+}
+
+// Try the next value up (mask ....1111001).  This needs a separate shift
+// and mask.
+i32 f7 (i32 v_foo)
+{
+  /* Should be
+     { dg-final { scan-assembler "f7:\n\tsrl\t%r2,2\n\tnill\t%r2,65529" { xfail { lp64 } } } }
+     but because a zeroextend is merged into the pattern it is actually
+     { dg-final { scan-assembler "f7:\n\tsrl\t%r2,2\n\tlgfi\t%r1,1073741817\n\tngr\t%r2,%r1" { target { lp64 } } } }
+     { dg-final { scan-assembler "f7:\n\tsrl\t%r2,2\n\tnill\t%r2,65529" { target { ! lp64 } } } } */
+  i32 v_shr = ((ui32)v_foo) >> 2;
+  i32 v_and = v_shr & -7;
+  return v_and;
+}
+
+// ...and again with i64.
+i64 f8 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f8:\n\tsrlg\t%r2,%r2,2\n\tnill\t%r2,65529" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f8:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\tsrlg\t%r2,%r3,2\n\tnill\t%r2,65529" { target { ! lp64 } } } } */
+  i64 v_shr = ((ui64)v_foo) >> 2;
+  i64 v_and = v_shr & -7;
+  return v_and;
+}
+
+// Test an extraction of bits from a left-shifted value.  The range should
+// be reduced to exclude the zeroed low bits.
+i32 f9 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f9:\n\trisbg\t%r2,%r2,56,128\\\+61,2" } } */
+  i32 v_shr = v_foo << 2;
+  i32 v_and = v_shr & 255;
+  return v_and;
+}
+
+// ...and again with i64.
+i64 f10 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f10:\n\trisbg\t%r2,%r2,56,128\\\+61,2" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f10:\n\trisbg\t%r3,%r3,56,128\\\+61,2\n\tlhi\t%r2,0" { target { ! lp64 } } } } */
+  i64 v_shr = v_foo << 2;
+  i64 v_and = v_shr & 255;
+  return v_and;
+}
+
+// Try a wrap-around mask (mask ....111100001111).  This needs a separate shift
+// and mask.
+i32 f11 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f11:\n\tsll\t%r2,2\n\tnill\t%r2,65295" } } */
+  i32 v_shr = v_foo << 2;
+  i32 v_and = v_shr & -241;
+  return v_and;
+}
+
+// ...and again with i64.
+i64 f12 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f12:\n\tsllg\t%r2,%r2,2\n\tnill\t%r2,65295" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f12:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\tsllg\t%r2,%r3,2\n\tnill\t%r2,65295" { target { ! lp64 } } } } */
+  i64 v_shr = v_foo << 2;
+  i64 v_and = v_shr & -241;
+  return v_and;
+}
+
+// Test an extraction from a rotated value, no mask wraparound.
+// This is equivalent to the lshr case, because the bits from the
+// shl are not used.
+i32 f13 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f13:\n\trisbg\t%r2,%r2,56,128\\\+60,32\\\+14" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f13:\n\trll\t%r2,%r2,14\n\tnilf\t%r2,248" { target { ! lp64 } } } } */
+  i32 v_parta = v_foo << 14;
+  i32 v_partb = ((ui32)v_foo) >> 18;
+  i32 v_rotl = v_parta | v_partb;
+  i32 v_and = v_rotl & 248;
+  return v_and;
+}
+
+// ...and again with i64.
+i64 f14 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f14:\n\trisbg\t%r2,%r2,56,128\\\+60,14" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f14:\n\trisbg\t%r3,%r2,56,128\\\+60,46\n\tlhi\t%r2,0" { target { ! lp64 } } } } */
+  i64 v_parta = v_foo << 14;
+  i64 v_partb = ((ui64)v_foo) >> 50;
+  i64 v_rotl = v_parta | v_partb;
+  i64 v_and = v_rotl & 248;
+  return v_and;
+}
+
+// Try a case in which only the bits from the shl are used.
+i32 f15 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f15:\n\trisbg\t%r2,%r2,47,128\\\+49,14" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f15:\n\trll\t%r2,%r2,14\n\tnilf\t%r2,114688" { target { ! lp64 } } } } */
+  i32 v_parta = v_foo << 14;
+  i32 v_partb = ((ui32)v_foo) >> 18;
+  i32 v_rotl = v_parta | v_partb;
+  i32 v_and = v_rotl & 114688;
+  return v_and;
+}
+
+// ...and again with i64.
+i64 f16 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f16:\n\trisbg\t%r2,%r2,47,128\\\+49,14" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f16:\n\trisbg\t%r3,%r3,47,128\\\+49,14\n\tlhi\t%r2,0" { target { ! lp64 } } } } */
+  i64 v_parta = v_foo << 14;
+  i64 v_partb = ((ui64)v_foo) >> 50;
+  i64 v_rotl = v_parta | v_partb;
+  i64 v_and = v_rotl & 114688;
+  return v_and;
+}
+
+// Test a 32-bit rotate in which both parts of the OR are needed.
+// This needs a separate shift and mask.
+i32 f17 (i32 v_foo)
+{
+  /* Should be
+     { dg-final { scan-assembler "f17:\n\trll\t%r2,%r2,4\n\tnilf\t%r2,126" { xfail { lp64 } } } }
+     but because a zeroextend is merged into the pattern it is actually
+     { dg-final { scan-assembler "f17:\n\trll\t%r2,%r2,4\n\trisbg\t%r2,%r2,57,128\\\+62,0" { target { lp64 } } } }
+     { dg-final { scan-assembler "f17:\n\trll\t%r2,%r2,4\n\tnilf\t%r2,126" { target { ! lp64 } } } } */
+  i32 v_parta = v_foo << 4;
+  i32 v_partb = ((ui32)v_foo) >> 28;
+  i32 v_rotl = v_parta | v_partb;
+  i32 v_and = v_rotl & 126;
+  return v_and;
+}
+
+// ...and for i64, where RISBG should do the rotate too.
+i64 f18 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f18:\n\trisbg\t%r2,%r2,57,128\\\+62,4" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f18:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\tlhi\t%r2,0\n\trisbg\t%r3,%r3,57,128\\\+62,4" { target { ! lp64 } } } } */
+  i64 v_parta = v_foo << 4;
+  i64 v_partb = ((ui64)v_foo) >> 60;
+  i64 v_rotl = v_parta | v_partb;
+  i64 v_and = v_rotl & 126;
+  return v_and;
+}
+
+// Test an arithmetic shift right in which some of the sign bits are kept.
+// This needs a separate shift and mask.
+i32 f19 (i32 v_foo)
+{
+  /* Should be
+     { dg-final { scan-assembler "f19:\n\tsra\t%r2,28\n\tnilf\t%r2,30" { xfail { lp64 } } } }
+     but because a zeroextend is merged into the pattern it is actually
+     { dg-final { scan-assembler "f19:\n\tsra\t%r2,28\n\trisbg\t%r2,%r2,59,128\\\+62,0" { target { lp64 } } } }
+     { dg-final { scan-assembler "f19:\n\tsra\t%r2,28\n\tnilf\t%r2,30" { target { ! lp64 } } } } */
+  i32 v_shr = v_foo >> 28;
+  i32 v_and = v_shr & 30;
+  return v_and;
+}
+
+// ...and again with i64.  In this case RISBG is the best way of doing the AND.
+i64 f20 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f20:\n\tsrag\t%r2,%r2,60\n\trisbg\t%r2,%r2,59,128\\\+62,0" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f20:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\tlhi\t%r2,0\n\tsrag\t%r3,%r3,60\n\tnilf\t%r3,30" { target { ! lp64 } } } } */
+  i64 v_shr = v_foo >> 60;
+  i64 v_and = v_shr & 30;
+  return v_and;
+}
+
+// Now try an arithmetic right shift in which the sign bits aren't needed.
+// Note: Unlike Llvm, Gcc replaces the ashrt with a lshrt in any case, using
+// a risbg pattern without ashrt.
+i32 f21 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f21:\n\trisbg\t%r2,%r2,60,128\\\+62,64-28" } } */
+  i32 v_shr = v_foo >> 28;
+  i32 v_and = v_shr & 14;
+  return v_and;
+}
+
+// ...and again with i64.
+i64 f22 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f22:\n\trisbg\t%r2,%r2,60,128\\\+62,64-60" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f22:\n\trisbg\t%r3,%r2,60,128\\\+62,64-28\n\tlhi\t%r2,0" { target { ! lp64 } } } } */
+  i64 v_shr = v_foo >> 60;
+  i64 v_and = v_shr & 14;
+  return v_and;
+}
+
+// Check that we use RISBG for shifted values even if the AND is a
+// natural zero extension.
+i64 f23 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f23:\n\trisbg\t%r2,%r2,64-8,128\\\+63,54\\\+8" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f23:\n\trisbg\t%r3,%r3,64-8,128\\\+63,54\\\+8\n\tlhi\t%r2,0" { target { ! lp64 } } } } */
+  i64 v_shr = ((ui64)v_foo) >> 2;
+  i64 v_and = v_shr & 255;
+  return v_and;
+}
+
+// Test a case where the AND comes before a rotate.  This needs a separate
+// mask and rotate.
+i32 f24 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f24:\n\tnilf\t%r2,254\n\trll\t%r2,%r2,29" } } */
+  i32 v_and = v_foo & 254;
+  i32 v_parta = ((ui32)v_and) >> 3;
+  i32 v_partb = v_and << 29;
+  i32 v_rotl = v_parta | v_partb;
+  return v_rotl;
+}
+
+// ...and again with i64, where a single RISBG is enough.
+i64 f25 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f25:\n\trisbg\t%r2,%r2,57,128\\\+59,3" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f25:\n\trisbg\t%r3,%r3,57,128\\\+59,3\n\tlhi\t%r2,0" { target { ! lp64 } } } } */
+  i64 v_and = v_foo & 14;
+  i64 v_parta = v_and << 3;
+  i64 v_partb = ((ui64)v_and) >> 61;
+  i64 v_rotl = v_parta | v_partb;
+  return v_rotl;
+}
+
+// Test a wrap-around case in which the AND comes before a rotate.
+// This again needs a separate mask and rotate.
+i32 f26 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f26:\n\tnill\t%r2,65487\n\trll\t%r2,%r2,5" } } */
+  i32 v_and = v_foo & -49;
+  i32 v_parta = v_and << 5;
+  i32 v_partb = ((ui32)v_and) >> 27;
+  i32 v_rotl = v_parta | v_partb;
+  return v_rotl;
+}
+
+// ...and again with i64, where a single RISBG is OK.
+i64 f27 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f27:\n\trisbg\t%r2,%r2,55,128\\\+52,5" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f27:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\trisbg\t%r2,%r3,55,128\\\+52,5" { target { ! lp64 } } } } */
+  i64 v_and = v_foo & -49;
+  i64 v_parta = v_and << 5;
+  i64 v_partb = ((ui64)v_and) >> 59;
+  i64 v_rotl = v_parta | v_partb;
+  return v_rotl;
+}
+
+// Test a case where the AND comes before a shift left.
+i32 f28 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f28:\n\trisbg\t%r2,%r2,32,128\\\+45,17" } } */
+  i32 v_and = v_foo & 32766;
+  i32 v_shl = v_and << 17;
+  return v_shl;
+}
+
+// ...and again with i64.
+i64 f29 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f29:\n\trisbg\t%r2,%r2,0,128\\\+13,49" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f29:\n\trisbg\t%r\[23\],%r3,0,128\\\+13,49\n\tlr\t%r\[23\],%r\[32\]\n\tsrlg\t%r2,%r2" { target { ! lp64 } } } } */
+  i64 v_and = v_foo & 32766;
+  i64 v_shl = v_and << 49;
+  return v_shl;
+}
+
+// Test the next shift up from f28, in which the mask should get shortened.
+i32 f30 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f30:\n\trisbg\t%r2,%r2,32,128\\\+44,18" } } */
+  i32 v_and = v_foo & 32766;
+  i32 v_shl = v_and << 18;
+  return v_shl;
+}
+
+// ...and again with i64.
+i64 f31 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f31:\n\trisbg\t%r2,%r2,0,128\\\+12,50" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f31:\n\trisbg\t%r\[23\],%r3,0,128\\\+12,50\n\tlr\t%r\[23\],%r\[32\]\n\tsrlg\t%r2,%r2" { target { ! lp64 } } } } */
+  i64 v_and = v_foo & 32766;
+  i64 v_shl = v_and << 50;
+  return v_shl;
+}
+
+// Test a wrap-around case in which the shift left comes after the AND.
+// We can't use RISBG for the shift in that case.
+i32 f32 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f32:\n\tsll\t%r2,10\n\tnill\t%r2,58368" } } */
+  i32 v_and = v_foo & -7;
+  i32 v_shl = v_and << 10;
+  return v_shl;
+}
+
+// ...and again with i64.
+i64 f33 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f33:\n\tsllg\t%r2,%r2,10\n\tnill\t%r2,58368" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f33:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\tsllg\t%r2,%r3,10\n\tnill\t%r2,58368" { target { ! lp64 } } } } */
+  i64 v_and = v_foo & -7;
+  i64 v_shl = v_and << 10;
+  return v_shl;
+}
+
+// Test a case where the AND comes before a shift right.
+i32 f34 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f34:\n\trisbg\t%r2,%r2,64-7,128\\\+63,48\\\+7" } } */
+  i32 v_and = v_foo & 65535;
+  i32 v_shl = ((ui32)v_and) >> 9;
+  return v_shl;
+}
+
+// ...and again with i64.
+i64 f35 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f35:\n\trisbg\t%r2,%r2,64-7,128\\\+63,48\\\+7" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f35:\n\trisbg\t%r3,%r3,64-7,128\\\+63,48\\\+7\n\tlhi\t%r2,0" { target { ! lp64 } } } } */
+  i64 v_and = v_foo & 65535;
+  i64 v_shl = ((ui64)v_and) >> 9;
+  return v_shl;
+}
+
+// Test a wrap-around case where the AND comes before a shift right.
+// We can't use RISBG for the shift in that case.
+i32 f36 (i32 v_foo)
+{
+  /* { dg-final { scan-assembler "f36:\n\tsrl\t%r2,1\n\tlgfi\t%r1,2147483635\n\tngr\t%r2,%r1" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f36:\n\tsrl\t%r2,1\n\tnilf\t%r2,2147483635" { target { ! lp64 } } } } */
+  i32 v_and = v_foo & -25;
+  i32 v_shl = ((ui32)v_and) >> 1;
+  return v_shl;
+}
+
+// ...and again with i64.
+i64 f37 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f37:\n\(\t.*\n\)*\tsrlg\t%r2,%r2,1\n\tng\t%r2," { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f37:\n\(\t.*\n\)*\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\tsrlg\t%r2,%r3,1\n\tng\t%r2," { target { ! lp64 } } } } */
+  i64 v_and = v_foo & -25;
+  i64 v_shl = ((ui64)v_and) >> 1;
+  return v_shl;
+}
+
+// Test a combination involving a large ASHR and a shift left.  We can't
+// use RISBG there.
+i64 f38 (i64 v_foo)
+{
+  /* { dg-final { scan-assembler "f38:\n\tsrag\t%r2,%r2,32\n\tsllg\t%r2,%r2,5" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f38:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\tsrag\t%r2,%r3,32\n\tsllg\t%r2,%r2,5" { target { ! lp64 } } } } */
+  i64 v_ashr = v_foo >> 32;
+  i64 v_shl = v_ashr << 5;
+  return v_shl;
+}
+
+// Try a similar thing in which no shifted sign bits are kept.
+i64 f39 (i64 v_foo, i64 *v_dest)
+{
+  /* { dg-final { scan-assembler "f39:\n\tsrag\t%r2,%r2,35\n\(\t.*\n\)*\trisbg\t%r2,%r2,33,128\\\+61,2" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f39:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\tlhi\t%r2,0\n\tsrag\t%r3,%r3,35\n\(\t.*\n\)*\trisbg\t%r3,%r3,33,128\\\+61,2" { target { ! lp64 } } } } */
+  i64 v_ashr = v_foo >> 35;
+  *v_dest = v_ashr;
+  i64 v_shl = v_ashr << 2;
+  i64 v_and = v_shl & 2147483647;
+  return v_and;
+}
+
+// ...and again with the next highest shift value, where one sign bit is kept.
+i64 f40 (i64 v_foo, i64 *v_dest)
+{
+  /* { dg-final { scan-assembler "f40:\n\tsrag\t%r2,%r2,36\n\(\t.*\n\)*\trisbg\t%r2,%r2,33,128\\\+61,2" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f40:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\tlhi\t%r2,0\n\tsrag\t%r3,%r3,36\n\(\t.*\n\)*\trisbg\t%r3,%r3,33,128\\\+61,2" { target { ! lp64 } } } } */
+  i64 v_ashr = v_foo >> 36;
+  *v_dest = v_ashr;
+  i64 v_shl = v_ashr << 2;
+  i64 v_and = v_shl & 2147483647;
+  return v_and;
+}
+
+// Check a case where the result is zero-extended.
+i64 f41 (i32 v_a)
+{
+  /* { dg-final { scan-assembler "f41:\n\trisbg\t%r2,%r2,64-28,128\\\+63,34\\\+28" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f41:\n\trisbg\t%r3,%r2,64-28,128\\\+63,34\\\+28\n\tlhi\t%r2,0" { target { ! lp64 } } } } */
+  i32 v_shl = v_a << 2;
+  i32 v_shr = ((ui32)v_shl) >> 4;
+  i64 v_ext = (ui64)v_shr;
+  return v_ext;
+}
+
+// In this case the sign extension is converted to a pair of 32-bit shifts,
+// which is then extended to 64 bits.  We previously used the wrong bit size
+// when testing whether the shifted-in bits of the shift right were significant.
+typedef struct { ui64 pad : 63; ui8 a : 1; } t42;
+i64 f42 (t42 v_x)
+{
+  /* { dg-final { scan-assembler "f42:\n\tsllg\t%r2,%r2,63\n\tsrag\t%r2,%r2,63\n\tllgcr\t%r2,%r2" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f42:\n\tsllg\t%r3,%r3,63\n\tlhi\t%r2,0\n\tsrag\t%r3,%r3,63\n\tllcr\t%r3,%r3" { target { ! lp64 } } } } */
+  ui8 a = v_x.a << 7;
+  i8 ext = ((i8)a) >> 7;
+  i64 ext2 = (ui64)(ui8)ext;
+  return ext2;
+}
+
+// Check that we get the case where a 64-bit shift is used by a 32-bit and.
+i32 f43 (i64 v_x)
+{
+  /* { dg-final { scan-assembler "f43:\n\trisbg\t%r2,%r2,32,128\\\+61,64-12" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f43:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\trisbg\t%r2,%r3,32,128\\\+61,64-12" { target { ! lp64 } } } } */
+  i64 v_shr3 = ((ui64)v_x) >> 12;
+  i32 v_shr3_tr = (ui32)v_shr3;
+  i32 v_conv = v_shr3_tr & -4;
+  return v_conv;
+}
+
+// Check that we don't get the case where the 32-bit and mask is not contiguous
+i32 f44 (i64 v_x)
+{
+  /* { dg-final { scan-assembler "f44:\n\tsrlg\t%r2,%r2,12" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f44:\n\tsrlg\t%r2,%r3,12\n\tnilf\t%r2,10" { target { ! lp64 } } } } */
+  i64 v_shr4 = ((ui64)v_x) >> 12;
+  i32 v_conv = (ui32)v_shr4;
+  i32 v_and = v_conv & 10;
+  return v_and;
+}
diff --git a/gcc/testsuite/gcc.target/s390/risbg-ll-2.c b/gcc/testsuite/gcc.target/s390/risbg-ll-2.c
new file mode 100644
index 0000000..6588dc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/risbg-ll-2.c
@@ -0,0 +1,123 @@
+// Test sequences that can use RISBG with a normal first operand.
+
+/* Tests ported from the Llvm testsuite. */
+
+/* { dg-do compile { target s390x-*-* } } */
+/* { dg-options "-O3 -march=z10 -mzarch -fno-asynchronous-unwind-tables" } */
+
+#define i64 signed long long
+#define ui64 unsigned long long
+#define i32 signed int
+#define ui32 unsigned int
+
+// Test a case with two ANDs.
+i32 f1 (i32 v_a, i32 v_b)
+{
+  /* { dg-final { scan-assembler "f1:\n\trisbg\t%r2,%r3,60,62,0" } } */
+  i32 v_anda = v_a & -15;
+  i32 v_andb = v_b & 14;
+  i32 v_or = v_anda | v_andb;
+  return v_or;
+}
+
+// ...and again with i64.
+i64 f2 (i64 v_a, i64 v_b)
+{
+  /* { dg-final { scan-assembler "f2:\n\trisbg\t%r2,%r3,60,62,0" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f2:\n\trisbg\t%r3,%r2,0,0\\\+32-1,64-0-32\n\(\t.*\n\)*\trisbg\t%r\[23\],%r5,60,62,0" { target { ! lp64 } } } } */
+  i64 v_anda = v_a & -15;
+  i64 v_andb = v_b & 14;
+  i64 v_or = v_anda | v_andb;
+  return v_or;
+}
+
+// Test a case with two ANDs and a shift.
+i32 f3 (i32 v_a, i32 v_b)
+{
+  /* { dg-final { scan-assembler "f3:\n\trisbg\t%r2,%r3,64-4,63,4\\\+52" } } */
+  i32 v_anda = v_a & -16;
+  i32 v_shr = ((ui32)v_b) >> 8;
+  i32 v_andb = v_shr & 15;
+  i32 v_or = v_anda | v_andb;
+  return v_or;
+}
+
+// ...and again with i64.
+i64 f4 (i64 v_a, i64 v_b)
+{
+  /* { dg-final { scan-assembler "f4:\n\trisbg\t%r2,%r3,60,60\\\+4-1,128-60-4-8" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f4:\n\(\t.*\n\)*\trisbg\t%r5,%r5,64-4,128\\\+63,52\\\+4" { target { ! lp64 } } } } */
+  i64 v_anda = v_a & -16;
+  i64 v_shr = ((ui64)v_b) >> 8;
+  i64 v_andb = v_shr & 15;
+  i64 v_or = v_anda | v_andb;
+  return v_or;
+}
+
+// Test a case with a single AND and a left shift.
+i32 f5 (i32 v_a, i32 v_b)
+{
+  /* { dg-final { scan-assembler "f5:\n\trisbg\t%r2,%r3,32,64-10-1,10" } } */
+  i32 v_anda = v_a & 1023;
+  i32 v_shlb = v_b << 10;
+  i32 v_or = v_anda | v_shlb;
+  return v_or;
+}
+
+// ...and again with i64.
+i64 f6 (i64 v_a, i64 v_b)
+{
+  /* { dg-final { scan-assembler "f6:\n\trisbg\t%r2,%r3,0,64-10-1,10" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f6:\n\trisbg\t%r5,%r4,0,0\\\+32-1,64-0-32\n\(\t.*\n\)*\trisbg\t%r\[23\],%r5,0,64-10-1,10" { target { ! lp64 } } } } */
+  i64 v_anda = v_a & 1023;
+  i64 v_shlb = v_b << 10;
+  i64 v_or = v_anda | v_shlb;
+  return v_or;
+}
+
+// Test a case with a single AND and a right shift.
+i32 f7 (i32 v_a, i32 v_b)
+{
+  /* { dg-final { scan-assembler "f7:\n\trisbg\t%r2,%r3,32\\\+8,63,64-8" } } */
+  i32 v_anda = v_a & -16777216;
+  i32 v_shrb = ((ui32)v_b) >> 8;
+  i32 v_or = v_anda | v_shrb;
+  return v_or;
+}
+
+// ...and again with i64.
+i64 f8 (i64 v_a, i64 v_b)
+{
+  /* { dg-final { scan-assembler "f8:\n\trisbg\t%r2,%r3,8,63,64-8" { target { lp64 } } } } */
+  /* With -m31 risbg is not really useful here, so do not test for it.  */
+  i64 v_anda = v_a & -72057594037927936;
+  i64 v_shrb = ((ui64)v_b) >> 8;
+  i64 v_or = v_anda | v_shrb;
+  return v_or;
+}
+
+// Check that we can get the case where a 64-bit shift feeds a 32-bit or of
+// ands with complement masks.
+i32 f9 (i64 v_x, i32 v_y)
+{
+  /* { dg-final { scan-assembler "f9:\n\trisbg\t%r3,%r2,48,63,64-48" { target { lp64 } }} } */
+  /* { dg-final { scan-assembler "f9:\n\trisbg\t%r4,%r2,32\\+16,63,64-16" { target { ! lp64 } }} } */
+  i64 v_shr6 = ((ui64)v_x) >> 48;
+  i32 v_conv = (ui32)v_shr6;
+  i32 v_and1 = v_y & -65536;
+  i32 v_or = v_conv | v_and1;
+  return v_or;
+}
+
+// Check that we don't get the case where a 64-bit shift feeds a 32-bit or of
+// ands with incompatible masks.
+i32 f10 (i64 v_x, i32 v_y)
+{
+  /* { dg-final { scan-assembler "f10:\n\tsrlg\t%r2,%r2,48\n\trosbg\t%r2,%r3,32,39,0" { target { lp64 } } } } */
+  /* { dg-final { scan-assembler "f10:\n\tnilf\t%r4,4278190080\n\trosbg\t%r4,%r2,32\\\+16,63,64-16" { target { ! lp64 } } } } */
+  i64 v_shr6 = ((ui64)v_x) >> 48;
+  i32 v_conv = (ui32)v_shr6;
+  i32 v_and1 = v_y & -16777216;
+  i32 v_or = v_conv | v_and1;
+  return v_or;
+}
diff --git a/gcc/testsuite/gcc.target/s390/risbg-ll-3.c b/gcc/testsuite/gcc.target/s390/risbg-ll-3.c
new file mode 100644
index 0000000..838f1ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/risbg-ll-3.c
@@ -0,0 +1,44 @@
+// Test use of RISBG vs RISBGN on zEC12.
+
+/* Tests ported from the Llvm testsuite. */
+
+/* { dg-do compile { target s390x-*-* } } */
+/* { dg-options "-O3 -march=zEC12 -mzarch -fno-asynchronous-unwind-tables" } */
+
+#define i64 signed long long
+#define ui64 unsigned long long
+
+// On zEC12, we generally prefer RISBGN.
+i64 f1 (i64 v_a, i64 v_b)
+{
+/* { dg-final { scan-assembler "f1:\n\trisbgn\t%r2,%r3,60,60\\\+3-1,128-60-3-1" { target { lp64 } } } } */
+/* { dg-final { scan-assembler "f1:\n\trisbgn\t%r3,%r2,0,0\\\+32-1,64-0-32\n\trisbgn\t%r3,%r5,60,62,0\n" { target { ! lp64 } } } } */
+  i64 v_anda = v_a & -15;
+  i64 v_andb = v_b & 14;
+  i64 v_or = v_anda | v_andb;
+  return v_or;
+}
+
+// But we may fall back to RISBG if we can use the condition code.
+extern i64 f2_foo();
+i64 f2 (i64 v_a, i64 v_b)
+{
+/* { dg-final { scan-assembler "f2:\n\trisbg\t%r2,%r3,60,62,0\n\tje\t" { target { lp64 } } } } */
+/* { dg-final { scan-assembler "f2:\n\trisbgn\t%r3,%r2,0,0\\\+32-1,64-0-32\n\trisbg\t%r3,%r5,60,62,0" { target { ! lp64 } } } } */
+  i64 v_anda = v_a & -15;
+  i64 v_andb = v_b & 14;
+  i64 v_or = v_anda | v_andb;
+  if (! v_or)
+    return f2_foo();
+  else
+    return v_or;
+}
+
+void f2_bar ();
+void f2_cconly (i64 v_a, i64 v_b)
+{
+/* { dg-final { scan-assembler "f2_cconly:\n\trisbg\t%r3,%r2,63,59,0\n\tjne\t" { target { lp64 } } } } */
+/* { dg-final { scan-assembler "f2_cconly:\n\trisbgn\t%r3,%r2,0,0\\\+32-1,64-0-32\n\trisbg\t%r3,%r5,60,62,0\n\tjne\t" { target { ! lp64 } } } } */
+  if ((v_a & -15) | (v_b & 14))
+    f2_bar();
+}
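A note on the define_subst mechanics used above, for readers who have
not seen it before: every pattern that mentions one of the three subst
attributes (clobbercc_or_nocc, z10_or_zEC12_cond, risbg_n) is emitted
in two variants by gen*.  Hand-expanding "*extzv<mode><clobbercc_or_nocc>"
as an example — this expansion is my own sketch of the generated
variants, based on the subst.md hunk above, not text from the patch:

; zEC12 variant: risbgn, no CC clobber.
(define_insn "*extzv<mode>_nocc"
  [(set (match_operand:GPR 0 "register_operand" "=d")
	(zero_extract:GPR
	  (match_operand:GPR 1 "register_operand" "d")
	  (match_operand 2 "const_int_operand" "")
	  (match_operand 3 "const_int_operand" "")))]
  "TARGET_ZEC12"
  "risbgn\t%0,%1,64-%2,128+63,%3+%2"
  [(set_attr "op_type" "RIE")
   (set_attr "z10prop" "z10_super_E1")])

; z10 variant: risbg, with the CC clobber appended by the subst.
(define_insn "*extzv<mode>_clobbercc"
  [(set (match_operand:GPR 0 "register_operand" "=d")
	(zero_extract:GPR
	  (match_operand:GPR 1 "register_operand" "d")
	  (match_operand 2 "const_int_operand" "")
	  (match_operand 3 "const_int_operand" "")))
   (clobber (reg:CC CC_REGNUM))]
  "TARGET_Z10 && ! TARGET_ZEC12"
  "risbg\t%0,%1,64-%2,128+63,%3+%2"
  [(set_attr "op_type" "RIE")
   (set_attr "z10prop" "z10_super_E1")])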