From patchwork Wed Jan 14 19:20:21 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Robert Suchanek X-Patchwork-Id: 429068 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 53D81140161 for ; Thu, 15 Jan 2015 06:20:39 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:content-type :content-transfer-encoding:mime-version; q=dns; s=default; b=RlE XwXP288/gENhN9I7QF5d7qypMizZ/U3pykOhSy1sbR1AqyPpF97IXWgcW5YYniU0 bcgg1xFYFvVIQs/7LiE/b2RUNPM348n3YQUeDkVFKuI5QaRVhPMv3vABUYCbtkne Sf0/mY1F8teK78XqQACe2i+hH18qI3w9bVNDlx2Q= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:content-type :content-transfer-encoding:mime-version; s=default; bh=t+isaqiRD gnCeQlFdls6/zRIzSE=; b=ZILvCrI94BJLYjSvAWUIpOjDTWYsCz6huyuORVcFg SUVthoeomiGVL08BJ6EFR16D/WJOoPOi2w0cMPUenzGi6hNaS4FXv0y5chZMFOBU 8gznlFbqAxfC2VKNMiq1NqUD2d5sngEFN4GPBLubUHlemj4JaLaItGJj8B0tOL0R rc= Received: (qmail 14896 invoked by alias); 14 Jan 2015 19:20:31 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 14880 invoked by uid 89); 14 Jan 2015 19:20:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00, SPF_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mailapp01.imgtec.com Received: from mailapp01.imgtec.com (HELO mailapp01.imgtec.com) (195.59.15.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 14 Jan 2015 19:20:26 +0000 Received: from KLMAIL01.kl.imgtec.org (unknown [192.168.5.35]) by Websense Email Security Gateway with ESMTPS id 5B92B868A9D90; Wed, 14 Jan 2015 19:20:19 +0000 (GMT) Received: from hhmail02.hh.imgtec.org (10.100.10.20) by KLMAIL01.kl.imgtec.org (192.168.5.35) with Microsoft SMTP Server (TLS) id 14.3.195.1; Wed, 14 Jan 2015 19:20:23 +0000 Received: from hhmail02.hh.imgtec.org ([::1]) by hhmail02.hh.imgtec.org ([::1]) with mapi id 14.03.0224.002; Wed, 14 Jan 2015 19:20:21 +0000 From: Robert Suchanek To: "vmakarov@redhat.com" CC: "gcc-patches@gcc.gnu.org" Subject: [PATCH, RFC] LRA subreg handling Date: Wed, 14 Jan 2015 19:20:21 +0000 Message-ID: MIME-Version: 1.0 Hi Vladimir, An issue has been identified with LRA when running CPU2006 h264ref benchmark. I'll try to describe what the issue is and a fix applied as it is very difficult to reproduce it and it is next to impossible to create a narrowed testcase on top of the source code restrictions. The concerned LRA code in lra-constraints.c is the following: if (GET_CODE (*loc) == SUBREG) { reg = SUBREG_REG (*loc); byte = SUBREG_BYTE (*loc); if (REG_P (reg) /* Strict_low_part requires reload the register not the sub-register. */ && (curr_static_id->operand[i].strict_low || (GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (reg)) && (hard_regno = get_try_hard_regno (REGNO (reg))) >= 0 && (simplify_subreg_regno (hard_regno, GET_MODE (reg), byte, mode) < 0) && (goal_alt[i] == NO_REGS || (simplify_subreg_regno (ira_class_hard_regs[goal_alt[i]][0], GET_MODE (reg), byte, mode) >= 0)))) { loc = &SUBREG_REG (*loc); mode = GET_MODE (*loc); } } The above works just fine when we deal with strict_low_part or a subreg smaller than a word. However, multi-word operations that were emitted as a sequence of operations on word sized parts of the DImode register appears to expose a problem with LRA e.g. '(set (subreg: SI (reg: DI)) ...)'. LRA does not realize that it actually uses the other halve of the DI-mode register leading to a situation where it modifies one halve of the result and spills the whole register with the other halve undefined. In the dump I can see the following: Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552 1487: r1552:DI#4=r1404:SI+r1509:SI REG_DEAD r1509:SI REG_DEAD r1404:SI Inserting insn reload after: 1735: r521:DI=r1552:DI There is nothing in the dump that sets r1552:DI#0 nor a reload is inserted to load the value before modifying it but it is spilled. As it is a multi-word register, the split pass emits an additional instruction to load the whole 64-bit value but since one halve was modified, only register $20 appears in the live-in set. In contrast to $20, $21 is being used but not added to the live-in set. ... ;; live in 4 [$4] 6 [$6] 7 [$7] 10 [$10] 11 [$11] 12 [$12] 13 [$13] [$14] 15 [$15] 16 [$16] 17 [$17] 20 [$20] 22 [$22] 23 [$23] 24 [$24] 25 [$25] 29 [$sp] 30 [$fp] 31 [$31] 52 [$f20] 79 [$fakec] ... (insn 1788 1077 1789 80 (set (reg:SI 20 $20 [orig:521 distortion ] [521]) (mem/c:SI (plus:SI (reg/f:SI 29 $sp) (const_int 40 [0x28])) [16 %sfp+40 S4 A64])) rdopt.c:257 288 {*movsi_internal} (nil)) (insn 1789 1788 1743 80 (set (reg:SI 21 $21 [ distortion+4 ]) (mem/c:SI (plus:SI (reg/f:SI 29 $sp) (const_int 44 [0x2c])) [16 %sfp+44 S4 A32])) rdopt.c:257 288 {*movsi_internal} (nil)) ... The potential fix for this is to promote the type of a subreg OP_OUT to OP_INOUT to treat the pseudo register (r1552 in this case) as input and LRA will be forced to insert a reload before modifying its contents. Handling of strict_low_part case is fine as the operand is described in the MD pattern as IN_OUT through modifiers. With the above change in place, we get a reload before assignment: Creating newreg=1552 from oldreg=521, assigning class GR_REGS to r1552 1487: r1552:DI#4=r1404:SI+r1509:SI REG_DEAD r1509:SI REG_DEAD r1404:SI Inserting insn reload before: 1735: r1552:DI=r521:DI Inserting insn reload after: 1736: r521:DI=r1552:DI and the benchmark happily passes the runtime check. The question is whether changing the type to OP_INOUT is the correct and valid fix? Regards, Robert 2015-01-14 Robert Suchanek gcc/ * lra-constraints.c (curr_insn_transform): Change the type of a reload pseudo to OP_INOUT. --- gcc/lra-constraints.c | 1 + 1 file changed, 1 insertion(+) -- 1.7.9.5 diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index ec28b7f..018968b 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -3798,6 +3798,7 @@ curr_insn_transform (void) (ira_class_hard_regs[goal_alt[i]][0], GET_MODE (reg), byte, mode) >= 0))))) { + type = OP_INOUT; loc = &SUBREG_REG (*loc); mode = GET_MODE (*loc); }