From patchwork Tue Aug 29 21:47:18 2017
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 807325
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn
From: Bill Schmidt
Subject: [PATCH v2, rs6000] Fix PR81833
Date: Tue, 29 Aug 2017 16:47:18 -0500
Message-Id: <75124dce-5afd-12bf-26a9-3370904c869f@linux.vnet.ibm.com>

Hi Segher,

Thanks for approving the previous patch with changes.  I've made those and
also modified the test case to require VSX hardware for execution.
I duplicated the test so we get coverage on P7 BE 32/64 and P8 BE/LE.  I'd
appreciate it if you could look over the DejaGnu instructions once more on
these.

Thanks!
Bill


[gcc]

2017-08-29  Bill Schmidt

	PR target/81833
	* config/rs6000/altivec.md (altivec_vsum2sws): Convert from a
	define_insn to a define_expand.
	(altivec_vsum2sws_direct): New define_insn.
	(altivec_vsumsws): Convert from a define_insn to a define_expand.

[gcc/testsuite]

2017-08-29  Bill Schmidt

	PR target/81833
	* gcc.target/powerpc/pr81833-1.c: New file.
	* gcc.target/powerpc/pr81833-2.c: New file.


Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 251369)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -1804,51 +1804,61 @@
   "vsum4ss %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-;; FIXME: For the following two patterns, the scratch should only be
-;; allocated for !VECTOR_ELT_ORDER_BIG, and the instructions should
-;; be emitted separately.
-(define_insn "altivec_vsum2sws"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
-                      (match_operand:V4SI 2 "register_operand" "v")]
-                     UNSPEC_VSUM2SWS))
-   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))
-   (clobber (match_scratch:V4SI 3 "=v"))]
+(define_expand "altivec_vsum2sws"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:V4SI 1 "register_operand"))
+   (use (match_operand:V4SI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
   if (VECTOR_ELT_ORDER_BIG)
-    return "vsum2sws %0,%1,%2";
+    emit_insn (gen_altivec_vsum2sws_direct (operands[0], operands[1],
+                                            operands[2]));
   else
-    return "vsldoi %3,%2,%2,12\n\tvsum2sws %3,%1,%3\n\tvsldoi %0,%3,%3,4";
-}
-  [(set_attr "type" "veccomplex")
-   (set (attr "length")
-     (if_then_else
-       (match_test "VECTOR_ELT_ORDER_BIG")
-       (const_string "4")
-       (const_string "12")))])
+    {
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_altivec_vsldoi_v4si (tmp1, operands[2],
+                                          operands[2], GEN_INT (12)));
+      emit_insn (gen_altivec_vsum2sws_direct (tmp2, operands[1], tmp1));
+      emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2,
+                                          GEN_INT (4)));
+    }
+  DONE;
+})
 
-(define_insn "altivec_vsumsws"
+; FIXME: This can probably be expressed without an UNSPEC.
+(define_insn "altivec_vsum2sws_direct"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
-                      (match_operand:V4SI 2 "register_operand" "v")]
-                     UNSPEC_VSUMSWS))
-   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))
-   (clobber (match_scratch:V4SI 3 "=v"))]
+                      (match_operand:V4SI 2 "register_operand" "v")]
+                     UNSPEC_VSUM2SWS))
+   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
   "TARGET_ALTIVEC"
+  "vsum2sws %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+(define_expand "altivec_vsumsws"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:V4SI 1 "register_operand"))
+   (use (match_operand:V4SI 2 "register_operand"))]
+  "TARGET_ALTIVEC"
 {
   if (VECTOR_ELT_ORDER_BIG)
-    return "vsumsws %0,%1,%2";
+    emit_insn (gen_altivec_vsumsws_direct (operands[0], operands[1],
+                                           operands[2]));
   else
-    return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvsldoi %0,%3,%3,12";
-}
-  [(set_attr "type" "veccomplex")
-   (set (attr "length")
-     (if_then_else
-       (match_test "(VECTOR_ELT_ORDER_BIG)")
-       (const_string "4")
-       (const_string "12")))])
+    {
+      rtx tmp1 = gen_reg_rtx (V4SImode);
+      rtx tmp2 = gen_reg_rtx (V4SImode);
+      emit_insn (gen_altivec_vspltw_direct (tmp1, operands[2], const0_rtx));
+      emit_insn (gen_altivec_vsumsws_direct (tmp2, operands[1], tmp1));
+      emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2,
+                                          GEN_INT (12)));
+    }
+  DONE;
+})
 
+; FIXME: This can probably be expressed without an UNSPEC.
 (define_insn "altivec_vsumsws_direct"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
Index: gcc/testsuite/gcc.target/powerpc/pr81833-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr81833-1.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr81833-1.c	(working copy)
@@ -0,0 +1,59 @@
+/* PR81833: This used to fail due to improper implementation of vec_msum.  */
+/* Test case relies on -mcpu=power7 or later.  Currently we don't have
+   machinery to express that, so we have two separate tests for -mcpu=power7
+   and -mcpu=power8 to catch 32-bit BE on P7 and 64-bit BE/LE on P8.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+#include <altivec.h>
+
+#define vec_u8  vector unsigned char
+#define vec_s8  vector signed char
+#define vec_u16 vector unsigned short
+#define vec_s16 vector signed short
+#define vec_u32 vector unsigned int
+#define vec_s32 vector signed int
+#define vec_f   vector float
+
+#define LOAD_ZERO const vec_u8 zerov = vec_splat_u8 (0)
+
+#define zero_u8v  (vec_u8)  zerov
+#define zero_s8v  (vec_s8)  zerov
+#define zero_u16v (vec_u16) zerov
+#define zero_s16v (vec_s16) zerov
+#define zero_u32v (vec_u32) zerov
+#define zero_s32v (vec_s32) zerov
+
+signed int __attribute__((noinline))
+scalarproduct_int16_vsx (const signed short *v1, const signed short *v2,
+                         int order)
+{
+  int i;
+  LOAD_ZERO;
+  register vec_s16 vec1;
+  register vec_s32 res = vec_splat_s32 (0), t;
+  signed int ires;
+
+  for (i = 0; i < order; i += 8) {
+    vec1 = vec_vsx_ld (0, v1);
+    t    = vec_msum (vec1, vec_vsx_ld (0, v2), zero_s32v);
+    res  = vec_sums (t, res);
+    v1  += 8;
+    v2  += 8;
+  }
+  res = vec_splat (res, 3);
+  vec_ste (res, 0, &ires);
+
+  return ires;
+}
+
+int main(void)
+{
+  const signed short test_vec[] = { 1, 1, 1, 1, 1, 1, 1, 1 };
+  if (scalarproduct_int16_vsx (test_vec, test_vec, 8) != 8)
+    __builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr81833-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr81833-2.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr81833-2.c	(working copy)
@@ -0,0 +1,59 @@
+/* PR81833: This used to fail due to improper implementation of vec_msum.  */
+/* Test case relies on -mcpu=power7 or later.  Currently we don't have
+   machinery to express that, so we have two separate tests for -mcpu=power7
+   and -mcpu=power8 to catch 32-bit BE on P7 and 64-bit BE/LE on P8.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -O2" } */
+
+#include <altivec.h>
+
+#define vec_u8  vector unsigned char
+#define vec_s8  vector signed char
+#define vec_u16 vector unsigned short
+#define vec_s16 vector signed short
+#define vec_u32 vector unsigned int
+#define vec_s32 vector signed int
+#define vec_f   vector float
+
+#define LOAD_ZERO const vec_u8 zerov = vec_splat_u8 (0)
+
+#define zero_u8v  (vec_u8)  zerov
+#define zero_s8v  (vec_s8)  zerov
+#define zero_u16v (vec_u16) zerov
+#define zero_s16v (vec_s16) zerov
+#define zero_u32v (vec_u32) zerov
+#define zero_s32v (vec_s32) zerov
+
+signed int __attribute__((noinline))
+scalarproduct_int16_vsx (const signed short *v1, const signed short *v2,
+                         int order)
+{
+  int i;
+  LOAD_ZERO;
+  register vec_s16 vec1;
+  register vec_s32 res = vec_splat_s32 (0), t;
+  signed int ires;
+
+  for (i = 0; i < order; i += 8) {
+    vec1 = vec_vsx_ld (0, v1);
+    t    = vec_msum (vec1, vec_vsx_ld (0, v2), zero_s32v);
+    res  = vec_sums (t, res);
+    v1  += 8;
+    v2  += 8;
+  }
+  res = vec_splat (res, 3);
+  vec_ste (res, 0, &ires);
+
+  return ires;
+}
+
+int main(void)
+{
+  const signed short test_vec[] = { 1, 1, 1, 1, 1, 1, 1, 1 };
+  if (scalarproduct_int16_vsx (test_vec, test_vec, 8) != 8)
+    __builtin_abort ();
+  return 0;
+}