From patchwork Tue May 26 05:49:14 2020
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1297702
From: "Kewen.Lin"
To: GCC Patches
Cc: Richard Guenther, Bill Schmidt, dje.gcc@gmail.com, Segher Boessenkool
Subject: [PATCH 0/7] Support vector load/store with length
Date: Tue, 26 May 2020 13:49:14 +0800
Message-ID: <30906c0d-3b9f-e1e6-156f-c01fcf229cb9@linux.ibm.com>

Hi all,

This patch set adds support for vector load/store with length. Power ISA 3.0
brings the instructions lxvl/stxvl, which perform vector load/store with
length. They are worth exploiting in cases where there are not enough elements
to fill a whole vector, such as epilogues.
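
To make the epilogue case concrete, here is a minimal sketch in portable C. It
does not use lxvl/stxvl themselves: the vec128 type and the helpers
load_with_len, store_with_len, add_bytes and add_arrays are hypothetical
stand-ins, and zero-filling the bytes beyond the length is an assumption of
this emulation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_BYTES 16  /* one 128-bit vector register */

/* Hypothetical stand-in for a vector register. */
typedef struct { unsigned char b[VEC_BYTES]; } vec128;

/* Emulates a load with length: read only the first LEN bytes from SRC.
   Assumptions of this sketch: the remaining bytes are zero-filled and a
   LEN above VEC_BYTES is clamped. */
static vec128 load_with_len(const void *src, size_t len)
{
    vec128 v;
    if (len > VEC_BYTES)
        len = VEC_BYTES;
    memset(v.b, 0, VEC_BYTES);
    memcpy(v.b, src, len);
    return v;
}

/* Emulates a store with length: write only the first LEN bytes to DST. */
static void store_with_len(void *dst, vec128 v, size_t len)
{
    if (len > VEC_BYTES)
        len = VEC_BYTES;
    memcpy(dst, v.b, len);
}

/* Byte-wise add, standing in for a full-width vector add. */
static vec128 add_bytes(vec128 x, vec128 y)
{
    vec128 r;
    for (size_t i = 0; i < VEC_BYTES; i++)
        r.b[i] = (unsigned char)(x.b[i] + y.b[i]);
    return r;
}

/* c[i] = a[i] + b[i] for n bytes: full vectors in the main loop, then one
   length-controlled iteration in place of a scalar epilogue. */
static void add_arrays(unsigned char *c, const unsigned char *a,
                       const unsigned char *b, size_t n)
{
    size_t i = 0;
    for (; i + VEC_BYTES <= n; i += VEC_BYTES)
        store_with_len(c + i,
                       add_bytes(load_with_len(a + i, VEC_BYTES),
                                 load_with_len(b + i, VEC_BYTES)),
                       VEC_BYTES);

    size_t rem = n - i;  /* leftover bytes, fewer than one vector */
    assert(rem < VEC_BYTES);
    if (rem > 0)
        store_with_len(c + i,
                       add_bytes(load_with_len(a + i, rem),
                                 load_with_len(b + i, rem)),
                       rem);
}
```

For example, add_arrays on 21 bytes runs one full-vector iteration and then a
single 5-byte iteration, and never touches memory past the 21st byte.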

This support mainly follows the handling of fully-predicated loops, but it
also covers the epilogue usage. It currently supports two modes, controlled by
the parameter vect-with-length-scope: it can use length for any loop as a
whole, or only for cases with a small iteration count less than VF, such as
epilogues. I don't have an environment ready to benchmark it yet, but given
the current inefficient length generation, I don't think it's a good idea to
adopt vector with length for every loop. For a main loop that would be
vectorized anyway, it increases register pressure and introduces extra
computation for the length, and the icache benefit does not seem to outweigh
that. Still, I think it is worth keeping this parameter for functionality
testing, further benchmarking and potential future support in other ports. As
we don't have any benchmark data, this support isn't enabled by default for
any particular CPU; all testing uses explicit parameter settings.

Bootstrapped on powerpc64le-linux-gnu P9 with all vect-with-length-scope
settings (0/1/2). Regression testing passed with vect-with-length-scope 0;
for the other two settings, several vector-related test cases need to be
updated, but no remarkable failures were found. BTW, P9 is the one that
supports the functionality, but it is not ready for performance evaluation.

There are still many things to be supported or improved, including but not
limited to:
  - reduction/live-out support
  - Cost model adding/tweaking
  - IFN gimple folding
  - Some unnecessary ops improvements, e.g. vector_size check
  - Some possible refactoring

I'll support/post the patches gradually.

Any comments are highly appreciated.
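
As a rough sketch of the two modes and of the extra length computation in a
fully length-controlled loop: the code below assumes that
vect-with-length-scope 0 means off, 1 means only loops with fewer than VF
iterations, and 2 means any loop. That mapping is an assumption of the sketch,
not something stated above, and the enum, use_length_p and copy_full_with_len
are hypothetical names rather than anything from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VF 16  /* elements (bytes here) per vector iteration in this sketch */

/* Hypothetical encoding of the scope setting; the value-to-mode mapping
   is assumed. */
enum len_scope {
    LEN_SCOPE_OFF = 0,
    LEN_SCOPE_SMALL_NITERS = 1,
    LEN_SCOPE_ANY_LOOP = 2
};

/* Whether a loop with NITERS scalar iterations would use length-based
   accesses under the given scope. */
static int use_length_p(enum len_scope scope, size_t niters)
{
    if (scope == LEN_SCOPE_ANY_LOOP)
        return 1;
    if (scope == LEN_SCOPE_SMALL_NITERS)
        return niters < VF;
    return 0;
}

/* Fully length-controlled copy: every iteration computes its own length,
   which is the extra per-iteration work mentioned above. Returns the
   number of vector iterations executed. */
static size_t copy_full_with_len(unsigned char *dst,
                                 const unsigned char *src, size_t n)
{
    size_t iters = 0;
    for (size_t i = 0; i < n; i += VF) {
        size_t len = n - i < VF ? n - i : VF;  /* length for this iteration */
        assert(len > 0 && len <= VF);
        memcpy(dst + i, src + i, len);  /* stands in for load + store with length */
        iters++;
    }
    return iters;
}
```

With n = 40 the loop runs three times with lengths 16, 16 and 8, so no
separate epilogue is needed, at the cost of computing len on every iteration.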

BR,
Kewen

-----

Patch set outline:

  [PATCH 1/7] ifn/optabs: Support vector load/store with length
  [PATCH 2/7] rs6000: lenload/lenstore optab support
  [PATCH 3/7] vect: Factor out codes for niters smaller than vf check
  [PATCH 4/7] hook/rs6000: Add vectorize length mode for vector with length
  [PATCH 5/7] vect: Support vector load/store with length in vectorizer
  [PATCH 6/7] ivopts: Add handlings for vector with length IFNs
  [PATCH 7/7] rs6000/testsuite: Vector with length test cases

 gcc/config/rs6000/rs6000.c                                  |   3 +
 gcc/config/rs6000/vsx.md                                    |  30 ++++++++++
 gcc/doc/invoke.texi                                         |   7 +++
 gcc/doc/md.texi                                             |  16 ++++++
 gcc/doc/tm.texi                                             |   6 ++
 gcc/doc/tm.texi.in                                          |   2 +
 gcc/internal-fn.c                                           |  13 ++++-
 gcc/internal-fn.def                                         |   6 ++
 gcc/optabs.def                                              |   2 +
 gcc/params.opt                                              |   4 ++
 gcc/target.def                                              |   7 +++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-1.h          |  18 ++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-2.h          |  17 ++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-3.h          |  31 +++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-4.h          |  24 ++++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-5.h          |  29 ++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-6.h          |  32 +++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c     |  15 +++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-2.c     |  15 +++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-3.c     |  18 ++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-4.c     |  15 +++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-5.c     |  15 +++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-6.c     |  16 ++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-1.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-2.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-3.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-4.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-5.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-6.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c     |  16 ++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c     |  16 ++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-3.c     |  17 ++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-4.c     |  16 ++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-5.c     |  16 ++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c     |  16 ++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-1.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-2.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-3.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-4.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-5.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-6.c |  10 ++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-1.h      |  34 ++++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-2.h      |  36 ++++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-3.h      |  34 ++++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-4.h      |  62 +++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-5.h      |  45 +++++++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-6.h      |  52 +++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/p9-vec-length.h            |  14 +++++
 gcc/tree-ssa-loop-ivopts.c                                  |   4 ++
 gcc/tree-vect-loop-manip.c                                  | 268 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 gcc/tree-vect-loop.c                                        | 272 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 gcc/tree-vect-stmts.c                                       | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/tree-vectorizer.h                                       |  32 +++++++++++
 53 files changed, 1545 insertions(+), 18 deletions(-)