From patchwork Wed Jan 17 11:24:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Siddhesh Poyarekar X-Patchwork-Id: 862220 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-471462-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="v0jaBkw/"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3zM4Ws1y1gz9rxj for ; Wed, 17 Jan 2018 22:25:29 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id; q=dns; s=default; b=H8XncZvbNZRV m7DwEU6sn4CvbWyAKFj/Qno/87QZtWYu0Q6YIu0L3ZiY4chifi1se/L/eJ0r42Sx NGj+w7Pn+aK0FJcTXJZ/yb7bTaNGMEsGYwVEL6nF9CFUyDPpgKHNHmB2meLjvBI9 KcFMqRnpnucH+AVTx3Gvw9BPHz9eVF4= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id; s=default; bh=WO+VgJi4DTybbA/T6x E2Vye9GRU=; b=v0jaBkw/QDBKFcuBLm6s6h0OId7bfhvFPhvKhpSL13/fSsDGma nhdErO7FG78MuSbqlPLORzwTQ5ogrzHR5/oH8xuLyuFtOB1aDKs1EPxufNUe5Az3 cepMuG+ozpraR1mpy/g8bp3IMqIBHOMKaB475JjVhaJfsgcRgVhOGr9pU= Received: (qmail 18641 invoked by alias); 17 Jan 2018 11:25:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 18624 invoked by uid 89); 17 Jan 2018 11:25:21 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_NONE autolearn=unavailable version=3.3.2 spammy=ry, rm, 12269, mw X-HELO: homiemail-a52.g.dreamhost.com Received: from sub5.mail.dreamhost.com (HELO homiemail-a52.g.dreamhost.com) (208.113.200.129) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 17 Jan 2018 11:25:19 +0000 Received: from homiemail-a52.g.dreamhost.com (localhost [127.0.0.1]) by homiemail-a52.g.dreamhost.com (Postfix) with ESMTP id EEF11600312D; Wed, 17 Jan 2018 03:25:17 -0800 (PST) Received: from devel.in.reserved-bit.com (unknown [202.189.238.75]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) (Authenticated sender: siddhesh@gotplt.org) by homiemail-a52.g.dreamhost.com (Postfix) with ESMTPSA id CAD59600312C; Wed, 17 Jan 2018 03:25:15 -0800 (PST) From: Siddhesh Poyarekar To: gcc-patches@gcc.gnu.org Cc: wilson@tuliptree.org, kugan.vivekanandarajah@linaro.org, james.greenhalgh@arm.com, pinskia@gmail.com, Siddhesh Poyarekar Subject: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor Date: Wed, 17 Jan 2018 16:54:59 +0530 Message-Id: <20180117112459.23362-1-siddhesh@gotplt.org> X-IsSubscribed: yes From: Siddhesh Poyarekar Hi, Jim Wilson posted a patch for this in September[1] and it appears following discussions that the patch was an acceptable fix for falkor. Kugan followed up[2] with a test case since that was requested during initial review. Jim has moved on from Linaro, so I'm pinging this patch with the hope that it is OK for inclusion since it was posted before the freeze and is also isolated in impact to just falkor. Siddhesh [1] https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01547.html [2] https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00050.html On Falkor, because of an idiosyncracy of how the pipelines are designed, a quad-word store using a reg+reg addressing mode is almost twice as slow as an add followed by a quad-word store with a single reg addressing mode. So we get better performance if we disallow addressing modes using register offsets with quad-word stores. Using lmbench compiled with -O2 -ftree-vectorize as my benchmark, I see a 13% performance increase on stream copy using this patch, and a 16% performance increase on stream scale using this patch. I also see a small performance increase on SPEC CPU2006 of around 0.2% for int and 0.4% for FP at -O3. 2018-01-17 Jim Wilson Kugan Vivekanandarajah gcc/ * config/aarch64/aarch64-protos.h (aarch64_falkor_movti_target_operand_p): Declare. constraint instead of m. * config/aarch64/aarch64.c (aarch64_falkor_movti_target_operand_p): New. * config/aarch64/constraints.md (Utf): New. * config/aarch64/aarch64.md (movti_aarch64): Use Utf constraint instead of m. (movtf_aarch64): Likewise. * config/aarch64/aarch64-simd.md (aarch64_simd_mov): Use Utf gcc/testsuite/ * gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case. --- gcc/config/aarch64/aarch64-protos.h | 1 + gcc/config/aarch64/aarch64-simd.md | 4 ++-- gcc/config/aarch64/aarch64.c | 10 ++++++++++ gcc/config/aarch64/aarch64.md | 6 +++--- gcc/config/aarch64/constraints.md | 6 ++++++ gcc/testsuite/gcc.target/aarch64/pr82533.c | 11 +++++++++++ 6 files changed, 33 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr82533.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 2d705d2..088d864 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -433,6 +433,7 @@ bool aarch64_simd_mem_operand_p (rtx); bool aarch64_sve_ld1r_operand_p (rtx); bool aarch64_sve_ldr_operand_p (rtx); bool aarch64_sve_struct_memory_operand_p (rtx); +bool aarch64_falkor_movti_target_operand_p (rtx); rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool); rtx aarch64_tls_get_addr (void); tree aarch64_fold_builtin (tree, int, tree *, bool); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 3d1f6a0..f7daac3 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -131,9 +131,9 @@ (define_insn "*aarch64_simd_mov" [(set (match_operand:VQ 0 "nonimmediate_operand" - "=w, Umq, m, w, ?r, ?w, ?r, w") + "=w, Umq, Utf, w, ?r, ?w, ?r, w") (match_operand:VQ 1 "general_operand" - "m, Dz, w, w, w, r, r, Dn"))] + "m, Dz, w, w, w, r, r, Dn"))] "TARGET_SIMD && (register_operand (operands[0], mode) || aarch64_simd_reg_or_zero (operands[1], mode))" diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 2e70f3a..0db7a4f 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -13477,6 +13477,16 @@ aarch64_sve_struct_memory_operand_p (rtx op) && offset_4bit_signed_scaled_p (SVE_BYTE_MODE, last)); } +/* Return TRUE if OP is a good address mode for movti target on falkor. */ +bool +aarch64_falkor_movti_target_operand_p (rtx op) +{ + if ((enum attr_tune) aarch64_tune == TUNE_FALKOR) + return MEM_P (op) && ! (GET_CODE (XEXP (op, 0)) == PLUS + && ! CONST_INT_P (XEXP (XEXP (op, 0), 1))); + return MEM_P (op); +} + /* Emit a register copy from operand to operand, taking care not to early-clobber source registers in the process. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index edb6a75..696fd12 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1079,7 +1079,7 @@ (define_insn "*movti_aarch64" [(set (match_operand:TI 0 - "nonimmediate_operand" "=r, w,r,w,r,m,m,w,m") + "nonimmediate_operand" "=r, w,r,w,r,m,m,w,Utf") (match_operand:TI 1 "aarch64_movti_operand" " rn,r,w,w,m,r,Z,m,w"))] "(register_operand (operands[0], TImode) @@ -1226,9 +1226,9 @@ (define_insn "*movtf_aarch64" [(set (match_operand:TF 0 - "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r,m ,m") + "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,Utf,?r,m ,m") (match_operand:TF 1 - "general_operand" " w,?r, ?r,w ,Y,Y ,m,w,m ,?r,Y"))] + "general_operand" " w,?r, ?r,w ,Y,Y ,m,w ,m ,?r,Y"))] "TARGET_FLOAT && (register_operand (operands[0], TFmode) || aarch64_reg_or_fp_zero (operands[1], TFmode))" "@ diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index 6cc4cad..d9f2921 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -229,6 +229,12 @@ (and (match_code "mem") (match_test "aarch64_sve_ldr_operand_p (op)"))) +(define_memory_constraint "Utf" + "@iternal + A good address for a falkor movti target operand." + (and (match_code "mem") + (match_test "aarch64_falkor_movti_target_operand_p (op)"))) + (define_memory_constraint "Utv" "@internal An address valid for loading/storing opaque structure diff --git a/gcc/testsuite/gcc.target/aarch64/pr82533.c b/gcc/testsuite/gcc.target/aarch64/pr82533.c new file mode 100644 index 0000000..fa28ffa --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr82533.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-mcpu=falkor -O2 -ftree-vectorize" } */ + +void +copy (int N, double *c, double *a) +{ + for (int i = 0; i < N; ++i) + c[i] = a[i]; +} + +/* { dg-final { scan-assembler-not "str\tq\[0-9\]+, \\\[x\[0-9\]+, x\[0-9\]+\\\]" } } */