From patchwork Sat Aug 30 02:46:23 2014
X-Patchwork-Submitter: "Maciej W. Rozycki"
X-Patchwork-Id: 384427
Date: Sat, 30 Aug 2014 03:46:23 +0100
From: "Maciej W. Rozycki"
To: David Edelsohn
Subject: [PATCH] GCC/test: Disable loop-19.c for classic FPU Power

Hi,

 The loop-19.c test case has regressed from 4.8 to 4.9 and trunk on
classic FPU Power targets; these failures are now seen:

FAIL: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized "MEM.(base: &|symbol: )a," 2
FAIL: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized "MEM.(base: &|symbol: )c," 2

 However, upon inspection of the generated code it is obvious that its
quality has improved: the autoincrement rather than the indexed addressing
mode is now used in the loop produced, reducing the number of instructions
in the loop from 4 to 3 and also removing another instruction from outside
the loop, i.e.
(new code):

	.globl tuned_STREAM_Copy
	.type tuned_STREAM_Copy, @function
tuned_STREAM_Copy:
	lis 8,0x1e
	lis 10,a-8@ha
	ori 8,8,33920
	lis 9,c-8@ha
	mtctr 8
	la 10,a-8@l(10)
	la 9,c-8@l(9)
.L2:
	lfdu 0,8(10)
	stfdu 0,8(9)
	bdnz .L2
	blr
	.size tuned_STREAM_Copy, .-tuned_STREAM_Copy

vs (old code):

	.globl tuned_STREAM_Copy
	.type tuned_STREAM_Copy, @function
tuned_STREAM_Copy:
	lis 7,0x1e
	ori 7,7,33920
	mtctr 7
	lis 8,c@ha
	lis 10,a@ha
	li 9,0
	la 8,c@l(8)
	la 10,a@l(10)
.L3:
	lfdx 0,10,9
	stfdx 0,8,9
	addi 9,9,8
	bdnz .L3
	blr
	.size tuned_STREAM_Copy,.-tuned_STREAM_Copy

 The only Power targets that still pass this test are e500v2 ones such as
`-mcpu=8548 -mfloat-gprs=double -mspe=yes -mabi=spe' that use the SPE unit
for FP operations, because the indexed mode is still used (there's no
autoincrement addressing mode available for the memory access instructions
concerned):

	.globl tuned_STREAM_Copy
	.type tuned_STREAM_Copy, @function
tuned_STREAM_Copy:
	lis 10,0x1e
	lis 7,c@ha
	lis 8,a@ha
	ori 10,10,0x8480
	li 9,0
	la 7,c@l(7)
	la 8,a@l(8)
	mtctr 10
.L2:
	evlddx 10,8,9
	evstddx 10,7,9
	addi 9,9,8
	bdnz .L2
	blr
	.size tuned_STREAM_Copy,.-tuned_STREAM_Copy

[I have removed "-fno-common" from the current test flags for the purpose
of this consideration to compare apples to apples; 4.8 didn't have it.
The presence or absence of this flag does not appear to make a difference
for this test case for Power targets.]

 The obvious reason for the failure is the offset of -8 now seen in the new
classic FP code for preinitialising the pointers before entering the loop.
The initial offset is needed so that it is cancelled by the offset of 8
used in the loop itself to autoincrement these pointers. So the new code
is not only better, but it actually has to use these offsets as well, or
autoincrementation would not work.
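 As a rough C model of the two loop shapes (a sketch only: the function
names copy_indexed and copy_update_form are made up for illustration and
are not part of the test case or of GCC, and byte offsets stand in for the
registers so that no out-of-bounds pointer is ever formed), the -8 bias and
the +8 update cancel like this:

```c
#include <stddef.h>
#include <string.h>

/* Indexed form, as in the old code: two fixed bases plus one byte index
   that needs its own increment on every iteration (lfdx/stfdx + addi).  */
void copy_indexed(double *dst, const double *src, size_t n)
{
    const char *s = (const char *)src;
    char *d = (char *)dst;
    const size_t step = sizeof(double);

    for (size_t idx = 0; idx != n * step; idx += step) {
        double t;
        memcpy(&t, s + idx, sizeof t);      /* load from base + index */
        memcpy(d + idx, &t, sizeof t);      /* store to base + index */
    }
}

/* Update form, as in the new code: each access first adds 8 to its own
   offset and then uses the updated value (lfdu/stfdu), so the offsets
   have to start 8 bytes "before" the arrays.  */
void copy_update_form(double *dst, const double *src, size_t n)
{
    const char *s = (const char *)src;
    char *d = (char *)dst;
    const ptrdiff_t step = (ptrdiff_t)sizeof(double);
    ptrdiff_t soff = -step;                 /* models a-8 */
    ptrdiff_t doff = -step;                 /* models c-8 */

    for (size_t count = n; count != 0; count--) {
        double t;
        soff += step;                       /* update happens before use */
        memcpy(&t, s + soff, sizeof t);
        doff += step;
        memcpy(d + doff, &t, sizeof t);
    }
}
```

 Both functions copy the same n doubles; the second one simply has no
separate index increment left in the loop body, which is where the saved
instruction comes from.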
 Therefore I think at this point the test case is invalid for classic FP
Power, so I propose that we exclude it from testing here, leaving only SPE
FP Power, for whatever value the test case may have for it, and especially
the x86 variants, where there is an actual code size penalty for using an
immediate offset (displacement) in addition to a base register.

 For the record here are the optimization dumps examined by the test case,
for the old generated code that passes:

;; Function tuned_STREAM_Copy (tuned_STREAM_Copy, funcdef_no=0, decl_uid=1382, cgraph_uid=0)

tuned_STREAM_Copy ()
{
  sizetype ivtmp.10;
  double _4;

  :

  :
  # ivtmp.10_8 = PHI
  _4 = MEM[symbol: a, index: ivtmp.10_8, offset: 0B];
  MEM[symbol: c, index: ivtmp.10_8, offset: 0B] = _4;
  ivtmp.10_2 = ivtmp.10_8 + 8;
  if (ivtmp.10_2 != 16000000)
    goto ;
  else
    goto ;

  :
  goto ;

  :
  return;

}

and for the new code that fails:

;; Function tuned_STREAM_Copy (tuned_STREAM_Copy, funcdef_no=0, decl_uid=2191, symbol_order=2)

Removing basic block 5
tuned_STREAM_Copy ()
{
  unsigned int ivtmp.13;
  unsigned int ivtmp.9;
  double _4;
  void * _15;
  void * _16;
  unsigned int _17;

  :
  ivtmp.9_11 = (unsigned int) &MEM[(void *)&a + 4294967288B];
  ivtmp.13_14 = (unsigned int) &MEM[(void *)&c + 4294967288B];
  _17 = (unsigned int) &MEM[(void *)&a + 15999992B];

  :
  # ivtmp.9_8 = PHI
  # ivtmp.13_12 = PHI
  ivtmp.9_2 = ivtmp.9_8 + 8;
  _15 = (void *) ivtmp.9_2;
  _4 = MEM[base: _15, offset: 0B];
  ivtmp.13_13 = ivtmp.13_12 + 8;
  _16 = (void *) ivtmp.13_13;
  MEM[base: _16, offset: 0B] = _4;
  if (ivtmp.9_2 != _17)
    goto ;
  else
    goto ;

  :
  return;

}

 Tested with the following powerpc-gnu-linux multilibs with the respective
results noted on the right:

-mcpu=603e                                            UNSUPPORTED
-mcpu=603e -msoft-float                               UNSUPPORTED
-mcpu=8540 -mfloat-gprs=single -mspe=yes -mabi=spe    UNSUPPORTED
-mcpu=8548 -mfloat-gprs=double -mspe=yes -mabi=spe    PASS
-mcpu=7400 -maltivec -mabi=altivec                    UNSUPPORTED
-mcpu=e6500 -maltivec -mabi=altivec                   UNSUPPORTED
-mcpu=e5500 -m64                                      UNSUPPORTED
-mcpu=e6500 -m64 -maltivec -mabi=altivec              UNSUPPORTED

Original results:

-mcpu=603e                                            FAIL
-mcpu=603e -msoft-float                               UNSUPPORTED
-mcpu=8540 -mfloat-gprs=single -mspe=yes -mabi=spe    UNSUPPORTED
-mcpu=8548 -mfloat-gprs=double -mspe=yes -mabi=spe    PASS
-mcpu=7400 -maltivec -mabi=altivec                    FAIL
-mcpu=e6500 -maltivec -mabi=altivec                   FAIL
-mcpu=e5500 -m64                                      FAIL
-mcpu=e6500 -m64 -maltivec -mabi=altivec              FAIL

 OK to apply (for trunk and 4.9)?

2014-08-30  Maciej W. Rozycki

	* gcc.dg/tree-ssa/loop-19.c: Exclude classic FPU Power targets.

  Maciej

gcc-test-power-loop-19.diff
Index: gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
===================================================================
--- gcc-fsf-trunk-quilt.orig/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c	2014-08-29 16:45:27.748122597 +0100
+++ gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c	2014-08-30 02:53:03.658955978 +0100
@@ -4,7 +4,7 @@
    The testcase comes from PR 29256 (and originally, the stream benchmark).  */
 
-/* { dg-do compile { target { i?86-*-* || { x86_64-*-* || powerpc_hard_double } } } } */
+/* { dg-do compile { target { i?86-*-* || { x86_64-*-* || { powerpc_hard_double && { ! powerpc_fprs } } } } } } */
 /* { dg-require-effective-target nonpic } */
 /* { dg-options "-O3 -fno-tree-loop-distribute-patterns -fno-prefetch-loop-arrays -fdump-tree-optimized -fno-common" } */