From patchwork Tue Sep 15 06:40:38 2020
Subject: [PATCH v2] rs6000: Remove useless insns fed into lvx/stvx [PR97019]
From: "Kewen.Lin"
To: Segher Boessenkool
Cc: Bill Schmidt, GCC Patches, David Edelsohn
Date: Tue, 15 Sep 2020 14:40:38 +0800
Message-ID: <79f5f0a8-fe82-19b4-f5d6-a78359e252f7@linux.ibm.com>
In-Reply-To: <20200914171949.GY28786@gate.crashing.org>

Hi Segher,

Thanks for the review!

>> 	* config/rs6000/rs6000-p8swap.c (insn_rtx_pair_t): New type.
>
> Please don't do that.  The "first" and "second" are completely
> meaningless.
> Also, keeping it separate arrays can very well result in
> better machine code, and certainly makes easier to read source code.

OK, I'll use separate arrays instead.  Here the first element was the
AND rtx_insn and the second its fully-expanded rtx; I thought it was
better to bundle them together, and make_pair was an easy way to do
that.

>
>> +static bool
>> +find_alignment_op (rtx_insn *insn, rtx base_reg,
>> +		   vec<rtx_insn *> *and_insn_vec)
>
> Don't name vecs "_vec" (just keep it "and_insn" here, or sometimes
> and_insns is clearer).

Done, also for those omitted below.  Thanks!

>
>> -  rtx and_operation = 0;
>> +  rtx and_operation = NULL_RTX;
>
> Don't change code randomly (to something arguably worse, even).

Done.  I was overthinking it; I thought NULL_RTX might be preferred
since it could later be redefined as nullptr in the current C++11
context.

Bootstrapped/regtested on powerpc64le-linux-gnu P8 again.

Does the attached version look better?

BR,
Kewen
-----
gcc/ChangeLog:

	* config/rs6000/rs6000-p8swap.c (find_alignment_op): Adjust to
	support multiple definitions which are all AND operations with
	the mask -16B.
	(recombine_lvx_pattern): Adjust to handle multiple AND
	operations from find_alignment_op.
	(recombine_stvx_pattern): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr97019.c: New test.

diff --git a/gcc/config/rs6000/rs6000-p8swap.c b/gcc/config/rs6000/rs6000-p8swap.c
index 3d5dc7d8aae..be863aa479e 100644
--- a/gcc/config/rs6000/rs6000-p8swap.c
+++ b/gcc/config/rs6000/rs6000-p8swap.c
@@ -2095,11 +2095,15 @@ alignment_mask (rtx_insn *insn)
   return alignment_with_canonical_addr (SET_SRC (body));
 }
 
-/* Given INSN that's a load or store based at BASE_REG, look for a
-   feeding computation that aligns its address on a 16-byte boundary.
-   Return the rtx and its containing AND_INSN.  */
-static rtx
-find_alignment_op (rtx_insn *insn, rtx base_reg, rtx_insn **and_insn)
+/* Given INSN that's a load or store based at BASE_REG, check if
+   all of its feeding computations align its address on a 16-byte
+   boundary.  If so, return true and add all definition insns into
+   AND_INSNS and their corresponding fully-expanded rtxes for the
+   masking operations into AND_OPS.  */
+
+static bool
+find_alignment_op (rtx_insn *insn, rtx base_reg, vec<rtx_insn *> *and_insns,
+		   vec<rtx> *and_ops)
 {
   df_ref base_use;
   struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
@@ -2111,19 +2115,28 @@ find_alignment_op (rtx_insn *insn, rtx base_reg, rtx_insn **and_insn)
 	continue;
 
       struct df_link *base_def_link = DF_REF_CHAIN (base_use);
-      if (!base_def_link || base_def_link->next)
-	break;
+      if (!base_def_link)
+	return false;
 
-      /* With stack-protector code enabled, and possibly in other
-	 circumstances, there may not be an associated insn for
-	 the def.  */
-      if (DF_REF_IS_ARTIFICIAL (base_def_link->ref))
-	break;
+      while (base_def_link)
+	{
+	  /* With stack-protector code enabled, and possibly in other
+	     circumstances, there may not be an associated insn for
+	     the def.  */
+	  if (DF_REF_IS_ARTIFICIAL (base_def_link->ref))
+	    return false;
 
-      *and_insn = DF_REF_INSN (base_def_link->ref);
-      and_operation = alignment_mask (*and_insn);
-      if (and_operation != 0)
-	break;
+	  rtx_insn *and_insn = DF_REF_INSN (base_def_link->ref);
+	  and_operation = alignment_mask (and_insn);
+
+	  /* Stop if we find any one which doesn't align.  */
+	  if (!and_operation)
+	    return false;
+
+	  and_insns->safe_push (and_insn);
+	  and_ops->safe_push (and_operation);
+	  base_def_link = base_def_link->next;
+	}
     }
 
   return and_operation;
@@ -2143,11 +2156,14 @@ recombine_lvx_pattern (rtx_insn *insn, del_info *to_delete)
   rtx mem = XEXP (SET_SRC (body), 0);
   rtx base_reg = XEXP (mem, 0);
 
-  rtx_insn *and_insn;
-  rtx and_operation = find_alignment_op (insn, base_reg, &and_insn);
+  auto_vec<rtx_insn *> and_insns;
+  auto_vec<rtx> and_ops;
+  bool is_any_def_and
+    = find_alignment_op (insn, base_reg, &and_insns, &and_ops);
 
-  if (and_operation != 0)
+  if (is_any_def_and)
     {
+      gcc_assert (and_insns.length () == and_ops.length ());
       df_ref def;
       struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
       FOR_EACH_INSN_INFO_DEF (def, insn_info)
@@ -2168,25 +2184,35 @@ recombine_lvx_pattern (rtx_insn *insn, del_info *to_delete)
 	  to_delete[INSN_UID (swap_insn)].replace = true;
 	  to_delete[INSN_UID (swap_insn)].replace_insn = swap_insn;
 
-	  /* However, first we must be sure that we make the
-	     base register from the AND operation available
-	     in case the register has been overwritten.  Copy
-	     the base register to a new pseudo and use that
-	     as the base register of the AND operation in
-	     the new LVX instruction.  */
-	  rtx and_base = XEXP (and_operation, 0);
-	  rtx new_reg = gen_reg_rtx (GET_MODE (and_base));
-	  rtx copy = gen_rtx_SET (new_reg, and_base);
-	  rtx_insn *new_insn = emit_insn_after (copy, and_insn);
-	  set_block_for_insn (new_insn, BLOCK_FOR_INSN (and_insn));
-	  df_insn_rescan (new_insn);
-
-	  XEXP (mem, 0) = gen_rtx_AND (GET_MODE (and_base), new_reg,
-				       XEXP (and_operation, 1));
+	  rtx new_reg = 0;
+	  rtx and_mask = 0;
+	  for (unsigned i = 0; i < and_insns.length (); ++i)
+	    {
+	      /* However, first we must be sure that we make the
+		 base register from the AND operation available
+		 in case the register has been overwritten.  Copy
+		 the base register to a new pseudo and use that
+		 as the base register of the AND operation in
+		 the new LVX instruction.  */
+	      rtx_insn *and_insn = and_insns[i];
+	      rtx and_op = and_ops[i];
+	      rtx and_base = XEXP (and_op, 0);
+	      if (!new_reg)
+		{
+		  new_reg = gen_reg_rtx (GET_MODE (and_base));
+		  and_mask = XEXP (and_op, 1);
+		}
+	      rtx copy = gen_rtx_SET (new_reg, and_base);
+	      rtx_insn *new_insn = emit_insn_after (copy, and_insn);
+	      set_block_for_insn (new_insn, BLOCK_FOR_INSN (and_insn));
+	      df_insn_rescan (new_insn);
+	    }
+
+	  XEXP (mem, 0) = gen_rtx_AND (GET_MODE (new_reg), new_reg, and_mask);
 	  SET_SRC (body) = mem;
 	  INSN_CODE (insn) = -1; /* Force re-recognition.  */
 	  df_insn_rescan (insn);
-
+
 	  if (dump_file)
 	    fprintf (dump_file, "lvx opportunity found at %d\n",
 		     INSN_UID (insn));
@@ -2205,11 +2231,14 @@ recombine_stvx_pattern (rtx_insn *insn, del_info *to_delete)
   rtx mem = SET_DEST (body);
   rtx base_reg = XEXP (mem, 0);
 
-  rtx_insn *and_insn;
-  rtx and_operation = find_alignment_op (insn, base_reg, &and_insn);
+  auto_vec<rtx_insn *> and_insns;
+  auto_vec<rtx> and_ops;
+  bool is_any_def_and
+    = find_alignment_op (insn, base_reg, &and_insns, &and_ops);
 
-  if (and_operation != 0)
+  if (is_any_def_and)
     {
+      gcc_assert (and_insns.length () == and_ops.length ());
       rtx src_reg = XEXP (SET_SRC (body), 0);
       df_ref src_use;
       struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
@@ -2234,25 +2263,35 @@ recombine_stvx_pattern (rtx_insn *insn, del_info *to_delete)
 	  to_delete[INSN_UID (swap_insn)].replace = true;
 	  to_delete[INSN_UID (swap_insn)].replace_insn = swap_insn;
 
-	  /* However, first we must be sure that we make the
-	     base register from the AND operation available
-	     in case the register has been overwritten.  Copy
-	     the base register to a new pseudo and use that
-	     as the base register of the AND operation in
-	     the new STVX instruction.  */
-	  rtx and_base = XEXP (and_operation, 0);
-	  rtx new_reg = gen_reg_rtx (GET_MODE (and_base));
-	  rtx copy = gen_rtx_SET (new_reg, and_base);
-	  rtx_insn *new_insn = emit_insn_after (copy, and_insn);
-	  set_block_for_insn (new_insn, BLOCK_FOR_INSN (and_insn));
-	  df_insn_rescan (new_insn);
-
-	  XEXP (mem, 0) = gen_rtx_AND (GET_MODE (and_base), new_reg,
-				       XEXP (and_operation, 1));
+	  rtx new_reg = 0;
+	  rtx and_mask = 0;
+	  for (unsigned i = 0; i < and_insns.length (); ++i)
+	    {
+	      /* However, first we must be sure that we make the
+		 base register from the AND operation available
+		 in case the register has been overwritten.  Copy
+		 the base register to a new pseudo and use that
+		 as the base register of the AND operation in
+		 the new STVX instruction.  */
+	      rtx_insn *and_insn = and_insns[i];
+	      rtx and_op = and_ops[i];
+	      rtx and_base = XEXP (and_op, 0);
+	      if (!new_reg)
+		{
+		  new_reg = gen_reg_rtx (GET_MODE (and_base));
+		  and_mask = XEXP (and_op, 1);
+		}
+	      rtx copy = gen_rtx_SET (new_reg, and_base);
+	      rtx_insn *new_insn = emit_insn_after (copy, and_insn);
+	      set_block_for_insn (new_insn, BLOCK_FOR_INSN (and_insn));
+	      df_insn_rescan (new_insn);
+	    }
+
+	  XEXP (mem, 0) = gen_rtx_AND (GET_MODE (new_reg), new_reg, and_mask);
 	  SET_SRC (body) = src_reg;
 	  INSN_CODE (insn) = -1; /* Force re-recognition.  */
 	  df_insn_rescan (insn);
-
+
 	  if (dump_file)
 	    fprintf (dump_file, "stvx opportunity found at %d\n",
 		     INSN_UID (insn));
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97019.c b/gcc/testsuite/gcc.target/powerpc/pr97019.c
new file mode 100644
index 00000000000..cb4cba4a284
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97019.c
@@ -0,0 +1,82 @@
+/* This issue can only exist on little-endian P8 targets, since
+   the built-in functions vec_ld/vec_st will use lxvd2x/stxvd2x
+   (P8 big-endian) or lxv/stxv (P9 and later).  */
+/* { dg-do compile { target { powerpc_p8vector_ok && le } } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+
+/* Test there are no useless instructions "rldicr x,y,0,59"
+   to align the addresses for lvx/stvx.  */
+
+extern int a, b, c;
+extern vector unsigned long long ev5, ev6, ev7, ev8;
+extern int dummy (vector unsigned long long);
+
+int test_vec_ld(unsigned char *pe) {
+
+  vector unsigned long long v1, v2, v3, v4, v9;
+  vector unsigned long long v5 = ev5;
+  vector unsigned long long v6 = ev6;
+  vector unsigned long long v7 = ev7;
+  vector unsigned long long v8 = ev8;
+
+  unsigned char *e = pe;
+
+  do {
+    if (a) {
+      v1 = __builtin_vec_ld(16, (unsigned long long *)e);
+      v2 = __builtin_vec_ld(32, (unsigned long long *)e);
+      v3 = __builtin_vec_ld(48, (unsigned long long *)e);
+      e = e + 8;
+      for (int i = 0; i < a; i++) {
+        v4 = v5;
+        v5 = __builtin_crypto_vpmsumd(v1, v6);
+        v6 = __builtin_crypto_vpmsumd(v2, v7);
+        v7 = __builtin_crypto_vpmsumd(v3, v8);
+        e = e + 8;
+      }
+    }
+    v5 = __builtin_vec_ld(16, (unsigned long long *)e);
+    v6 = __builtin_vec_ld(32, (unsigned long long *)e);
+    v7 = __builtin_vec_ld(48, (unsigned long long *)e);
+    if (c)
+      b = 1;
+  } while (b);
+
+  return dummy(v4);
+}
+
+int test_vec_st(unsigned char *pe) {
+
+  vector unsigned long long v1, v2, v3, v4;
+  vector unsigned long long v5 = ev5;
+  vector unsigned long long v6 = ev6;
+  vector unsigned long long v7 = ev7;
+  vector unsigned long long v8 = ev8;
+
+  unsigned char *e = pe;
+
+  do {
+    if (a) {
+      __builtin_vec_st(v1, 16, (unsigned long long *)e);
+      __builtin_vec_st(v2, 32, (unsigned long long *)e);
+      __builtin_vec_st(v3, 48, (unsigned long long *)e);
+      e = e + 8;
+      for (int i = 0; i < a; i++) {
+        v4 = v5;
+        v5 = __builtin_crypto_vpmsumd(v1, v6);
+        v6 = __builtin_crypto_vpmsumd(v2, v7);
+        v7 = __builtin_crypto_vpmsumd(v3, v8);
+        e = e + 8;
+      }
+    }
+    __builtin_vec_st(v5, 16, (unsigned long long *)e);
+    __builtin_vec_st(v6, 32, (unsigned long long *)e);
+    __builtin_vec_st(v7, 48, (unsigned long long *)e);
+    if (c)
+      b = 1;
+  } while (b);
+
+  return dummy(v4);
+}
+
+/* { dg-final { scan-assembler-not "rldicr\[ \t\]+\[0-9\]+,\[0-9\]+,0,59" } } */