From patchwork Wed Nov 2 02:05:03 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 690225
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn, will_schmidt@vnet.ibm.com
From: Bill Schmidt
Subject: [PATCH, rs6000] Fold vector addition built-ins in GIMPLE
Date: Tue, 1 Nov 2016 21:05:03 -0500

Hi,

As Jakub suggested in response to my *ahem* ornate patch for overloaded
function built-ins, a much better approach is to use the existing
machinery for overloading and then immediately fold the specific
functions during gimplification. There is a target hook available for
this purpose that we have not previously used. This patch demonstrates
this functionality by implementing the target hook and folding vector
addition built-ins within it. Future patches will fold other such
operations, improving the optimization available for many vector
intrinsics.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions. Is this ok for trunk?

Thanks,
Bill


[gcc]

2016-11-01  Bill Schmidt

	* config/rs6000/rs6000.c (gimple-ssa.h): New #include.
	(TARGET_GIMPLE_FOLD_BUILTIN): Define as
	rs6000_gimple_fold_builtin.
	(rs6000_gimple_fold_builtin): New function.  Add handling for
	early expansion of vector addition builtins.

[gcc/testsuite]

2016-11-01  Bill Schmidt

	* gcc.target/powerpc/fold-vec-add-1.c: New.
	* gcc.target/powerpc/fold-vec-add-2.c: New.
	* gcc.target/powerpc/fold-vec-add-3.c: New.
	* gcc.target/powerpc/fold-vec-add-4.c: New.
	* gcc.target/powerpc/fold-vec-add-5.c: New.
	* gcc.target/powerpc/fold-vec-add-6.c: New.
	* gcc.target/powerpc/fold-vec-add-7.c: New.

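For readers new to the hook, the shape of the fold can be sketched outside of GCC. The following is only a toy model under stated assumptions: the `toy_*` types, enumerators, and function are hypothetical stand-ins invented for illustration and are not GCC's GIMPLE API or the rs6000 built-in enum. It mirrors the idea of rewriting a call to a specific addition built-in into a generic addition statement, while leaving the 128-bit add and unrelated built-ins untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for target built-in codes (not GCC's enum).  */
enum toy_builtin
{
  TOY_VADDUBM, TOY_VADDUHM, TOY_VADDUWM, TOY_VADDUDM,
  TOY_VADDFP, TOY_XVADDDP,
  TOY_VADDUQM,          /* 128-bit add: deliberately left as a call.  */
  TOY_VSUBUBM           /* Some other built-in with no fold yet.  */
};

/* A statement is either a built-in call or a generic addition.  */
enum toy_stmt_kind { TOY_CALL, TOY_ASSIGN_PLUS };

struct toy_stmt
{
  enum toy_stmt_kind kind;
  enum toy_builtin fn;  /* Meaningful only when kind == TOY_CALL.  */
  int lhs, arg0, arg1;  /* Operand ids, kept unchanged by the fold.  */
};

/* Rewrite STMT in place if it is a call to a foldable addition
   built-in.  Return true if the statement was changed.  */
bool
toy_gimple_fold_builtin (struct toy_stmt *stmt)
{
  assert (stmt != NULL);

  if (stmt->kind != TOY_CALL)
    return false;

  switch (stmt->fn)
    {
    case TOY_VADDUBM:
    case TOY_VADDUHM:
    case TOY_VADDUWM:
    case TOY_VADDUDM:
    case TOY_VADDFP:
    case TOY_XVADDDP:
      /* Same operands, but now a generic operation that
         target-independent passes can reason about.  */
      stmt->kind = TOY_ASSIGN_PLUS;
      return true;
    default:
      return false;
    }
}
```

Calling `toy_gimple_fold_builtin` on a `TOY_CALL` statement for `TOY_VADDUWM` turns it into `TOY_ASSIGN_PLUS` with the same operands; a `TOY_VADDUQM` call is returned unchanged.
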
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 241624)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -56,6 +56,7 @@
 #include "sched-int.h"
 #include "gimplify.h"
 #include "gimple-iterator.h"
+#include "gimple-ssa.h"
 #include "gimple-walk.h"
 #include "intl.h"
 #include "params.h"
@@ -1632,6 +1633,8 @@ static const struct attribute_spec rs6000_attribut
 
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN rs6000_fold_builtin
+#undef TARGET_GIMPLE_FOLD_BUILTIN
+#define TARGET_GIMPLE_FOLD_BUILTIN rs6000_gimple_fold_builtin
 
 #undef TARGET_EXPAND_BUILTIN
 #define TARGET_EXPAND_BUILTIN rs6000_expand_builtin
@@ -16337,6 +16340,46 @@ rs6000_fold_builtin (tree fndecl, int n_args ATTRI
 #endif
 }
 
+/* Fold a machine-dependent built-in in GIMPLE.  (For folding into
+   a constant, use rs6000_fold_builtin.)  */
+
+bool
+rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree fndecl = gimple_call_fndecl (stmt);
+  gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
+  enum rs6000_builtins fn_code
+    = (enum rs6000_builtins) DECL_FUNCTION_CODE (fndecl);
+  tree arg0, arg1, lhs;
+
+  switch (fn_code)
+    {
+    /* Flavors of vec_add.  We deliberately don't expand
+       P8V_BUILTIN_VADDUQM as it gets lowered from V1TImode to
+       TImode, resulting in much poorer code generation.  */
+    case ALTIVEC_BUILTIN_VADDUBM:
+    case ALTIVEC_BUILTIN_VADDUHM:
+    case ALTIVEC_BUILTIN_VADDUWM:
+    case P8V_BUILTIN_VADDUDM:
+    case ALTIVEC_BUILTIN_VADDFP:
+    case VSX_BUILTIN_XVADDDP:
+      {
+        arg0 = gimple_call_arg (stmt, 0);
+        arg1 = gimple_call_arg (stmt, 1);
+        lhs = gimple_call_lhs (stmt);
+        gimple *g = gimple_build_assign (lhs, PLUS_EXPR, arg0, arg1);
+        gimple_set_location (g, gimple_location (stmt));
+        gsi_replace (gsi, g, true);
+        return true;
+      }
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
    (and in mode MODE if that's convenient).
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-1.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with char
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed char
+test1 (vector bool char x, vector signed char y)
+{
+  return vec_add (x, y);
+}
+
+vector signed char
+test2 (vector signed char x, vector bool char y)
+{
+  return vec_add (x, y);
+}
+
+vector signed char
+test3 (vector signed char x, vector signed char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test4 (vector bool char x, vector unsigned char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test5 (vector unsigned char x, vector bool char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test6 (vector unsigned char x, vector unsigned char y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddubm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-2.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with short
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed short
+test1 (vector bool short x, vector signed short y)
+{
+  return vec_add (x, y);
+}
+
+vector signed short
+test2 (vector signed short x, vector bool short y)
+{
+  return vec_add (x, y);
+}
+
+vector signed short
+test3 (vector signed short x, vector signed short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test4 (vector bool short x, vector unsigned short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test5 (vector unsigned short x, vector bool short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test6 (vector unsigned short x, vector unsigned short y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduhm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-3.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with int
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed int
+test1 (vector bool int x, vector signed int y)
+{
+  return vec_add (x, y);
+}
+
+vector signed int
+test2 (vector signed int x, vector bool int y)
+{
+  return vec_add (x, y);
+}
+
+vector signed int
+test3 (vector signed int x, vector signed int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test4 (vector bool int x, vector unsigned int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test5 (vector unsigned int x, vector bool int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test6 (vector unsigned int x, vector unsigned int y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduwm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-4.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with long long
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed long long
+test1 (vector bool long long x, vector signed long long y)
+{
+  return vec_add (x, y);
+}
+
+vector signed long long
+test2 (vector signed long long x, vector bool long long y)
+{
+  return vec_add (x, y);
+}
+
+vector signed long long
+test3 (vector signed long long x, vector signed long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test4 (vector bool long long x, vector unsigned long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test5 (vector unsigned long long x, vector bool long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test6 (vector unsigned long long x, vector unsigned long long y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddudm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-5.c	(working copy)
@@ -0,0 +1,16 @@
+/* Verify that overloaded built-ins for vec_add with float
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11 -mno-vsx" } */
+
+#include <altivec.h>
+
+vector float
+test1 (vector float x, vector float y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddfp" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-6.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-6.c	(working copy)
@@ -0,0 +1,23 @@
+/* Verify that overloaded built-ins for vec_add with float and
+   double inputs for VSX produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector float
+test1 (vector float x, vector float y)
+{
+  return vec_add (x, y);
+}
+
+vector double
+test2 (vector double x, vector double y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "xvaddsp" 1 } } */
+/* { dg-final { scan-assembler-times "xvadddp" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/fold-vec-add-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/fold-vec-add-7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-add-7.c	(working copy)
@@ -0,0 +1,22 @@
+/* Verify that overloaded built-ins for vec_add with __int128
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-additional-options "-std=gnu11 -Wno-pedantic" } */
+
+#include "altivec.h"
+
+vector signed __int128
+test1 (vector signed __int128 x, vector signed __int128 y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned __int128
+test2 (vector unsigned __int128 x, vector unsigned __int128 y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduqm" 2 } } */
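
A note on what the folded form expresses: for the integer flavors, the patch replaces the built-in call with a plain PLUS_EXPR on the vector type, which is lane-wise modulo addition. The sketch below tries to show that semantics in portable form. It assumes a compiler that supports the GNU C generic vector extension (`vector_size`), and it uses that extension in place of altivec.h so it can be built off PowerPC; the names `v4u32`, `add_generic`, and `lanes_match` are made up for illustration and appear nowhere in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Four 32-bit unsigned lanes, via the GNU C vector extension.  */
typedef uint32_t v4u32 __attribute__ ((vector_size (16)));

/* Generic vector addition, roughly what a PLUS_EXPR on a vector
   type denotes at the source level.  */
v4u32
add_generic (v4u32 x, v4u32 y)
{
  return x + y;
}

/* Return 1 if the generic vector sum of A and B equals the
   lane-by-lane scalar sum with 32-bit wraparound, else 0.  */
int
lanes_match (const uint32_t a[4], const uint32_t b[4])
{
  v4u32 x, y, sum;
  uint32_t out[4];

  assert (a != NULL && b != NULL);

  /* memcpy avoids any aliasing or alignment assumptions.  */
  memcpy (&x, a, sizeof x);
  memcpy (&y, b, sizeof y);
  sum = add_generic (x, y);
  memcpy (out, &sum, sizeof out);

  for (int i = 0; i < 4; i++)
    if (out[i] != (uint32_t) (a[i] + b[i]))
      return 0;
  return 1;
}
```

Unsigned lanes are used so that wraparound is well defined in the scalar reference loop. The tests in the patch check the other direction: that the folded statement still selects the expected add instruction for each element type.
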