From patchwork Fri Jul 8 15:07:57 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Preudhomme X-Patchwork-Id: 646444 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3rmHty3b4bz9t0H for ; Sat, 9 Jul 2016 01:08:41 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=QSIaMEIf; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type :content-transfer-encoding; q=dns; s=default; b=vtG8tKmzAJJ/JmK+ HwPL/WeOoWb6+dOXqcrlbuUfFgKkTysJj00fGdPukQ1Bc7UkXAnn7itUehzbROTQ msjoGwAHhiNNlgeV4wr/VTOZtJ7HWULOE6NVGfoMrxR889JQ8hNY/L45AY3c8HQZ Fm48yLOtVm00ze4pS6qE4STvR2Y= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type :content-transfer-encoding; s=default; bh=1j/tgUYI1LamgvQ45ImJ6o +lvog=; b=QSIaMEIfTmwqFEap4j2JdFR/eVeK2V5PowDpJ0IlhqPRtRVRJ0213k HzWaztdv1iDuOJuacnGcI0aSzC4XFJAtQjju2OBySH3QFAEWHYui6/subzllhjOF 6N34LdKoG7k+xiSZ04DXwxAzrtaOspmryxC5DmEzBw25Vcd7oy8LY= Received: (qmail 120594 invoked by alias); 8 Jul 2016 15:08:34 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 120584 invoked by uid 89); 8 Jul 2016 15:08:33 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=BAYES_00, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=Flag X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 08 Jul 2016 15:08:20 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F002E30C; Fri, 8 Jul 2016 08:09:18 -0700 (PDT) Received: from e108577-lin.localnet (usa-sjc-imap-foss1.foss.arm.com [10.72.51.249]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2B0A33F213; Fri, 8 Jul 2016 08:08:18 -0700 (PDT) From: Thomas Preudhomme To: gcc-patches@gcc.gnu.org, Vladimir Makarov Subject: [PATCH, GCC/LRA] Teach LRA to not use same register value for multiple output operands of an insn Date: Fri, 08 Jul 2016 16:07:57 +0100 Message-ID: <3064206.UDVhgfly4a@e108577-lin> User-Agent: KMail/4.13.3 (Linux/3.13.0-85-generic; KDE/4.13.3; x86_64; ; ) MIME-Version: 1.0 X-IsSubscribed: yes Hi, While investigating the root cause a testsuite regression for the ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the bug seems to also affect trunk. The bug manifests itself as an ICE in cselib due to a parallel insn with two SET to the same register. When processing the second SET in cselib_record_set (), the assert gcc_assert (REG_VALUES (dreg)->elt == 0) fails because the field was already set when processing the first SET. The root cause seems to be a register allocation issue in lra-constraints. When considering an output operand with matching input operand(s), match_reload does a number of checks to see if it can reuse the first matching input operand register value or if a new unique value should be given to the output operand. The current check ignores the case of multiple output operands (as in neon_vtrn_insn insn pattern in config/arm/arm.md). This can lead to cases where multiple output operands share a same register value when the first matching input operand has the same value as another output operand, leading to the ICE in cselib. This patch changes match_reload to get information about other output operands and check whether this case is met or not. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2016-07-01 Thomas Preud'homme * lra-constraints.c (match_reload): Pass information about other output operands. Create new unique register value if matching input operand shares same register value as output operand being considered. (curr_insn_transform): Record output operands already processed. Patch passed bootstrap under arm-none-linux-gnueabihf (Cortex-A57 in ARM mode as well as Thumb mode), aarch64-linux-gnu (Cortex-A57) and x86_64-linux-gnu and testsuite results does not regress for these and for arm-none-eabi targeting Cortex-A8. Is this ok for trunk? Best regards, Thomas diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index bf08dce2e0a4c2ef4c339aedbda4dba47cba1645..a3fd6c93c648050f3479dc8aca359a819d24863e 100644 --- a/gcc/lra-constraints.c +++ b/gcc/lra-constraints.c @@ -871,15 +871,18 @@ regno_val_use_in (unsigned int regno, rtx x) } /* Generate reloads for matching OUT and INS (array of input operand - numbers with end marker -1) with reg class GOAL_CLASS. Add input - and output reloads correspondingly to the lists *BEFORE and *AFTER. - OUT might be negative. In this case we generate input reloads for - matched input operands INS. EARLY_CLOBBER_P is a flag that the - output operand is early clobbered for chosen alternative. */ + numbers with end marker -1) with reg class GOAL_CLASS, considering + output operands OUTS (similar array to INS) needing to be in different + registers. Add input and output reloads correspondingly to the lists + *BEFORE and *AFTER. OUT might be negative. In this case we generate + input reloads for matched input operands INS. EARLY_CLOBBER_P is a flag + that the output operand is early clobbered for chosen alternative. */ static void -match_reload (signed char out, signed char *ins, enum reg_class goal_class, - rtx_insn **before, rtx_insn **after, bool early_clobber_p) +match_reload (signed char out, signed char *ins, signed char *outs, + enum reg_class goal_class, rtx_insn **before, + rtx_insn **after, bool early_clobber_p) { + bool out_conflict; int i, in; rtx new_in_reg, new_out_reg, reg; machine_mode inmode, outmode; @@ -968,12 +971,32 @@ match_reload (signed char out, signed char *ins, enum reg_class goal_class, We don't care about eliminable hard regs here as we are interesting only in pseudos. */ + /* Matching input's register value is the same as one of the other + output operand. Output operands in a parallel insn must be in + different registers. */ + out_conflict = false; + if (REG_P (in_rtx)) + { + for (i = 0; outs[i] >= 0; i++) + { + rtx other_out_rtx = *curr_id->operand_loc[outs[i]]; + if (REG_P (other_out_rtx) + && (regno_val_use_in (REGNO (in_rtx), other_out_rtx) + != NULL_RTX)) + { + out_conflict = true; + break; + } + } + } + new_in_reg = new_out_reg = (! early_clobber_p && ins[1] < 0 && REG_P (in_rtx) && (int) REGNO (in_rtx) < lra_new_regno_start && find_regno_note (curr_insn, REG_DEAD, REGNO (in_rtx)) && (out < 0 || regno_val_use_in (REGNO (in_rtx), out_rtx) == NULL_RTX) + && !out_conflict ? lra_create_new_reg (inmode, in_rtx, goal_class, "") : lra_create_new_reg_with_unique_value (outmode, out_rtx, goal_class, "")); @@ -3432,9 +3455,11 @@ curr_insn_transform (bool check_only_p) int i, j, k; int n_operands; int n_alternatives; + int n_outputs; int commutative; signed char goal_alt_matched[MAX_RECOG_OPERANDS][MAX_RECOG_OPERANDS]; signed char match_inputs[MAX_RECOG_OPERANDS + 1]; + signed char outputs[MAX_RECOG_OPERANDS + 1]; rtx_insn *before, *after; bool alt_p = false; /* Flag that the insn has been changed through a transformation. */ @@ -3844,6 +3869,8 @@ curr_insn_transform (bool check_only_p) } } + n_outputs = 0; + outputs[0] = -1; for (i = 0; i < n_operands; i++) { int regno; @@ -4001,7 +4028,7 @@ curr_insn_transform (bool check_only_p) /* generate reloads for input and matched outputs. */ match_inputs[0] = i; match_inputs[1] = -1; - match_reload (goal_alt_matched[i][0], match_inputs, + match_reload (goal_alt_matched[i][0], match_inputs, outputs, goal_alt[i], &before, &after, curr_static_id->operand_alternative [goal_alt_number * n_operands + goal_alt_matched[i][0]] @@ -4011,9 +4038,9 @@ curr_insn_transform (bool check_only_p) && (curr_static_id->operand[goal_alt_matched[i][0]].type == OP_IN)) /* Generate reloads for output and matched inputs. */ - match_reload (i, goal_alt_matched[i], goal_alt[i], &before, &after, - curr_static_id->operand_alternative - [goal_alt_number * n_operands + i].earlyclobber); + match_reload (i, goal_alt_matched[i], outputs, goal_alt[i], &before, + &after, curr_static_id->operand_alternative + [goal_alt_number * n_operands + i].earlyclobber); else if (curr_static_id->operand[i].type == OP_IN && (curr_static_id->operand[goal_alt_matched[i][0]].type == OP_IN)) @@ -4023,12 +4050,22 @@ curr_insn_transform (bool check_only_p) for (j = 0; (k = goal_alt_matched[i][j]) >= 0; j++) match_inputs[j + 1] = k; match_inputs[j + 1] = -1; - match_reload (-1, match_inputs, goal_alt[i], &before, &after, false); + match_reload (-1, match_inputs, outputs, goal_alt[i], &before, + &after, false); } else /* We must generate code in any case when function process_alt_operands decides that it is possible. */ gcc_unreachable (); + + /* Memorise processed outputs so that output remaining to be processed + can avoid using the same register value (see match_reload). */ + if (curr_static_id->operand[i].type == OP_OUT) + { + outputs[n_outputs++] = i; + outputs[n_outputs] = -1; + } + if (optional_p) { lra_assert (REG_P (op));