From patchwork Tue Oct 1 05:55:19 2013
X-Patchwork-Submitter: Wei Mi
X-Patchwork-Id: 279338
In-Reply-To: <524339BB.6070301@redhat.com>
Date: Mon, 30 Sep 2013 22:55:19 -0700
Subject: Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
From: Wei Mi
To: Vladimir Makarov
Cc: GCC Patches, David Li, Paul Pluzhnikov

> Probably the best place to add a code for this is in
> lra-constraints.c::simplify_operand_subreg by permitting subreg reload
> for paradoxical subregs whose hard regs are not fully in allocno class
> of the inner pseudo.
>
> It needs a good testing (i'd check that the generated code is not
> changed on variety benchmarks to see that the change has no impact on
> the most programs performance) and you need to add a good comment
> describing why this change is needed.
>

Vlad, thanks! I made another patch following your guidance. Please check whether it is OK. Bootstrap and regression testing are OK. I am also verifying its performance effect on Google applications (but most of them are 64-bit, so I cannot verify its performance effect on 32-bit applications).

The idea of the patch is as follows. For the following two types of paradoxical subreg, we insert a reload in simplify_operand_subreg:

1. The op_type is OP_IN, and the hard register cannot be paired with another hard register to hold the outer-mode operand (checked by in_hard_reg_set_p), for example R15 on x86-64. If IRA allocated R12 instead, no reload is needed: the value then occupies R12 and R13, and since the upper half of an rvalue paradoxical subreg is undefined, it is fine for R13 to contain undefined data.

2. The op_type is OP_OUT or OP_INOUT. (We may not need a reload in this case either, because the upper half of an lvalue paradoxical subreg is useless: if the rtl split4 stage never generates an assignment to the upper half of the subreg register, no reload is needed here. But I haven't found a testcase to verify this, so I kept the reload.)

Here is a paradoxical subreg example showing how the reload is generated:

(insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
        (subreg:TI (reg:DI 107 [ __comp ]) 0)) {*movti_internal_rex64}

In IRA, reg107 is allocated to a DImode hard register. If reg107 is assigned to R15, the compiler cannot find another hard register to pair with R15 to hold the TImode data, so we insert a TImode reload pseudo, reg180, for it.
After the reload is inserted:

(insn 283 0 0 (set (subreg:DI (reg:TI 180 [orig:107 __comp ] [107]) 0)
        (reg:DI 107 [ __comp ])) -1
(insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
        (subreg:TI (reg:TI 180 [orig:107 __comp ] [107]) 0)) {*movti_internal_rex64}

Two reload hard registers will be allocated to reg180 in LRA_assign to hold the TImode operand.

Thanks,
Wei Mi.

2013-09-30  Wei Mi

	* lra-constraints.c (insert_move_for_subreg): New function.
	(simplify_operand_subreg): Add reload for paradoxical subreg.

Index: lra-constraints.c
===================================================================
--- lra-constraints.c	(revision 201963)
+++ lra-constraints.c	(working copy)
@@ -1158,6 +1158,30 @@ process_addr_reg (rtx *loc, rtx *before,
   return true;
 }
 
+/* Insert move insn in simplify_operand_subreg. BEFORE returns
+   the insn to be inserted before curr insn. AFTER returns the
+   insn to be inserted after curr insn.  ORIGREG and NEWREG
+   are the original reg and new reg for reload.  */
+static void
+insert_move_for_subreg (rtx *before, rtx *after, rtx origreg, rtx newreg)
+{
+  if (before)
+    {
+      push_to_sequence (*before);
+      lra_emit_move (newreg, origreg);
+      *before = get_insns ();
+      end_sequence ();
+    }
+  if (after)
+    {
+      start_sequence ();
+      lra_emit_move (origreg, newreg);
+      emit_insn (*after);
+      *after = get_insns ();
+      end_sequence ();
+    }
+}
+
 /* Make reloads for subreg in operand NOP with internal subreg mode
    REG_MODE, add new reloads for further processing.  Return true if
    any reload was generated.  */
@@ -1169,6 +1193,8 @@ simplify_operand_subreg (int nop, enum m
   enum machine_mode mode;
   rtx reg, new_reg;
   rtx operand = *curr_id->operand_loc[nop];
+  enum reg_class regclass;
+  enum op_type type;
 
   before = after = NULL_RTX;
 
@@ -1177,6 +1203,7 @@ simplify_operand_subreg (int nop, enum m
 
   mode = GET_MODE (operand);
   reg = SUBREG_REG (operand);
+  type = curr_static_id->operand[nop].type;
   /* If we change address for paradoxical subreg of memory, the
      address might violate the necessary alignment or the access might
      be slow.  So take this into consideration.  We should not worry
@@ -1215,13 +1242,9 @@ simplify_operand_subreg (int nop, enum m
       && (hard_regno_nregs[hard_regno][GET_MODE (reg)]
           >= hard_regno_nregs[hard_regno][mode])
       && simplify_subreg_regno (hard_regno, GET_MODE (reg),
-                                SUBREG_BYTE (operand), mode) < 0
-      /* Don't reload subreg for matching reload.  It is actually
-         valid subreg in LRA.  */
-      && ! LRA_SUBREG_P (operand))
+                                SUBREG_BYTE (operand), mode) < 0)
       || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg))
     {
-      enum op_type type = curr_static_id->operand[nop].type;
       /* The class will be defined later in curr_insn_transform.  */
       enum reg_class rclass
         = (enum reg_class) targetm.preferred_reload_class (reg, ALL_REGS);
@@ -1229,29 +1252,85 @@ simplify_operand_subreg (int nop, enum m
       if (get_reload_reg (curr_static_id->operand[nop].type, reg_mode, reg,
                           rclass, "subreg reg", &new_reg))
         {
+          bool insert_before, insert_after;
           bitmap_set_bit (&lra_subreg_reload_pseudos, REGNO (new_reg));
-          if (type != OP_OUT
-              || GET_MODE_SIZE (GET_MODE (reg)) > GET_MODE_SIZE (mode))
-            {
-              push_to_sequence (before);
-              lra_emit_move (new_reg, reg);
-              before = get_insns ();
-              end_sequence ();
-            }
-          if (type != OP_IN)
-            {
-              start_sequence ();
-              lra_emit_move (reg, new_reg);
-              emit_insn (after);
-              after = get_insns ();
-              end_sequence ();
-            }
+
+          insert_before = (type != OP_OUT
+                           || GET_MODE_SIZE (GET_MODE (reg)) > GET_MODE_SIZE (mode));
+          insert_after = (type != OP_IN);
+          insert_move_for_subreg (insert_before ? &before : NULL,
+                                  insert_after ? &after : NULL,
+                                  reg, new_reg);
         }
       SUBREG_REG (operand) = new_reg;
       lra_process_new_insns (curr_insn, before, after,
                              "Inserting subreg reload");
       return true;
     }
+  /* Force a reload for a paradoxical subreg. For paradoxical subreg,
+     IRA allocates hardreg to the inner pseudo reg according to its mode
+     instead of the outermode, so the size of the hardreg may not be enough
+     to contain the outermode operand, in that case we may need to insert
+     reload for the reg. For the following two types of paradoxical subreg,
+     we need to insert reload:
+     1. If the op_type is OP_IN, and the hardreg could not be paired with
+        other hardreg to contain the outermode operand
+        (checked by in_hard_reg_set_p), we need to insert the reload.
+     2. If the op_type is OP_OUT or OP_INOUT.
+
+     Here is a paradoxical subreg example showing how the reload is generated:
+
+     (insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
+                (subreg:TI (reg:DI 107 [ __comp ]) 0)) {*movti_internal_rex64}
+
+     In IRA, reg107 is allocated to a DImode hardreg. We use x86-64 as example
+     here, if reg107 is assigned to hardreg R15, because R15 is the last
+     hardreg, compiler cannot find another hardreg to pair with R15 to
+     contain TImode data. So we insert a TImode reload reg180 for it.
+     After reload is inserted:
+
+     (insn 283 0 0 (set (subreg:DI (reg:TI 180 [orig:107 __comp ] [107]) 0)
+                (reg:DI 107 [ __comp ])) -1
+     (insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
+                (subreg:TI (reg:TI 180 [orig:107 __comp ] [107]) 0)) {*movti_internal_rex64}
+
+     Two reload hard registers will be allocated to reg180 to save TImode data
+     in LRA_assign.  */
+  else if (REG_P (reg)
+           && REGNO (reg) >= FIRST_PSEUDO_REGISTER
+           && (hard_regno = lra_get_regno_hard_regno (REGNO (reg))) >= 0
+           && (hard_regno_nregs[hard_regno][GET_MODE (reg)]
+               < hard_regno_nregs[hard_regno][mode])
+           && (regclass = lra_get_allocno_class (REGNO (reg)))
+           && (type != OP_IN
+               || !in_hard_reg_set_p (reg_class_contents[regclass],
+                                      mode, hard_regno)))
+    {
+      /* The class will be defined later in curr_insn_transform.  */
+      enum reg_class rclass
+        = (enum reg_class) targetm.preferred_reload_class (reg, ALL_REGS);
+
+      if (get_reload_reg (curr_static_id->operand[nop].type, mode, reg,
+                          rclass, "paradoxical subreg", &new_reg))
+        {
+          rtx subreg;
+          bool insert_before, insert_after;
+
+          PUT_MODE (new_reg, mode);
+          subreg = simplify_gen_subreg (GET_MODE (reg), new_reg, mode, 0);
+          bitmap_set_bit (&lra_subreg_reload_pseudos, REGNO (new_reg));
+
+          insert_before = (type != OP_OUT);
+          insert_after = (type != OP_IN);
+          insert_move_for_subreg (insert_before ? &before : NULL,
+                                  insert_after ? &after : NULL,
+                                  reg, subreg);
+        }
+      SUBREG_REG (operand) = new_reg;
+      lra_process_new_insns (curr_insn, before, after,
+                             "Inserting paradoxical subreg reload");
+      return true;
+    }
   return false;