From patchwork Thu Jan 4 17:46:46 2018
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 855751
From: Wilco Dijkstra
To: GCC Patches
CC: nd
Subject: [PATCH][AArch64] Improve register allocation of fma
Date: Thu, 4 Jan 2018 17:46:46 +0000

This patch improves register allocation of fma by preferring to update the
accumulator register.  This is done by adding fma insns with operand 1 as the
accumulator.  The register allocator considers copy preferences only in
operand order, so if the first operand is dead, it has the highest chance of
being reused as the destination.  As a result, code using fma often gets a
better register allocation.  Performance of SPECFP2017 improves by over 0.5%
on some implementations and is unchanged on others.
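For illustration only (this is not part of the patch or its testsuite), the
kind of source that benefits is an accumulation in which the old accumulator
value is dead after each multiply-add.  A minimal sketch in C, assuming the
compiler contracts `x[i] * y[i] + acc` into a single fmadd (which GCC may do
under its default -ffp-contract=fast setting in GNU mode); the names `dot` and
`check_dot` are made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test case, not taken from the patch.  The running sum
   `acc` is dead after each multiply-add, so it is the operand the
   register allocator would ideally reuse as the destination.  */
float dot(const float *x, const float *y, size_t n)
{
  float acc = 0.0f;
  for (size_t i = 0; i < n; i++)
    acc = x[i] * y[i] + acc;   /* candidate for a single fmadd */
  return acc;
}

/* Small self-check using exactly representable values.  */
void check_dot(void)
{
  const float x[] = { 1.0f, 2.0f, 3.0f };
  const float y[] = { 4.0f, 5.0f, 6.0f };
  assert(dot(x, y, 3) == 32.0f);   /* 4 + 10 + 18 */
}
```

Compiling something like this for AArch64 at -O2 and reading the assembly is
one way to see which register each fmadd writes; the exact output depends on
the compiler version and tuning options.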

The generated fma code is more readable too.  In a simple example we now
generate:

        fmadd   s16, s2, s1, s16
        fmadd   s7, s17, s16, s7
        fmadd   s6, s16, s7, s6
        fmadd   s5, s7, s6, s5

instead of:

        fmadd   s16, s16, s2, s1
        fmadd   s7, s7, s16, s6
        fmadd   s6, s6, s7, s5
        fmadd   s5, s5, s6, s4

Bootstrap OK.

OK for commit?

ChangeLog:
2018-01-04  Wilco Dijkstra

    gcc/
        * config/aarch64/aarch64.md (fma4): Change into expand pattern.
        (fnma4): Likewise.
        (fms4): Likewise.
        (fnms4): Likewise.
        (aarch64_fma4): Rename insn, reorder accumulator operand.
        (aarch64_fnma4): Likewise.
        (aarch64_fms4): Likewise.
        (aarch64_fnms4): Likewise.
        (aarch64_fnmadd4): Likewise.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 382953e6ec42ae4475d66143be1e25d22e48571f..e773ec0c41559e47cf38e719dcb8c42d5bb4da49 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4743,57 +4743,94 @@ (define_insn "*aarch64_fcvt2_mult"
   [(set_attr "type" "f_cvtf2i")]
 )
 
-;; fma - no throw
+;; fma - expand fma into patterns with the accumulator operand first since
+;; reusing the accumulator results in better register allocation.
+;; The register allocator considers copy preferences in operand order,
+;; so this prefers fmadd s0, s1, s2, s0 over fmadd s1, s1, s2, s0.
+
+(define_expand "fma4"
+  [(set (match_operand:GPF_F16 0 "register_operand")
+        (fma:GPF_F16 (match_operand:GPF_F16 1 "register_operand")
+                     (match_operand:GPF_F16 2 "register_operand")
+                     (match_operand:GPF_F16 3 "register_operand")))]
+  "TARGET_FLOAT"
+)
 
-(define_insn "fma4"
+(define_insn "*aarch64_fma4"
   [(set (match_operand:GPF_F16 0 "register_operand" "=w")
-        (fma:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w")
-                     (match_operand:GPF_F16 2 "register_operand" "w")
-                     (match_operand:GPF_F16 3 "register_operand" "w")))]
+        (fma:GPF_F16 (match_operand:GPF_F16 2 "register_operand" "w")
+                     (match_operand:GPF_F16 3 "register_operand" "w")
+                     (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
-  "fmadd\\t%0, %1, %2, %3"
+  "fmadd\\t%0, %2, %3, %1"
   [(set_attr "type" "fmac")]
 )
 
-(define_insn "fnma4"
+(define_expand "fnma4"
+  [(set (match_operand:GPF_F16 0 "register_operand")
+        (fma:GPF_F16
+          (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand"))
+          (match_operand:GPF_F16 2 "register_operand")
+          (match_operand:GPF_F16 3 "register_operand")))]
+  "TARGET_FLOAT"
+)
+
+(define_insn "*aarch64_fnma4"
   [(set (match_operand:GPF_F16 0 "register_operand" "=w")
         (fma:GPF_F16
-          (neg:GPF_F16 (match_operand:GPF_F16 1 "register_operand" "w"))
-          (match_operand:GPF_F16 2 "register_operand" "w")
-          (match_operand:GPF_F16 3 "register_operand" "w")))]
+          (neg:GPF_F16 (match_operand:GPF_F16 2 "register_operand" "w"))
+          (match_operand:GPF_F16 3 "register_operand" "w")
+          (match_operand:GPF_F16 1 "register_operand" "w")))]
   "TARGET_FLOAT"
-  "fmsub\\t%0, %1, %2, %3"
+  "fmsub\\t%0, %2, %3, %1"
   [(set_attr "type" "fmac")]
 )
 
-(define_insn "fms4"
+
+(define_expand "fms4"
+  [(set (match_operand:GPF 0 "register_operand")
+        (fma:GPF (match_operand:GPF 1 "register_operand")
+                 (match_operand:GPF 2 "register_operand")
+                 (neg:GPF (match_operand:GPF 3 "register_operand"))))]
+  "TARGET_FLOAT"
+)
+
+(define_insn "*aarch64_fms4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
-        (fma:GPF (match_operand:GPF 1 "register_operand" "w")
-                 (match_operand:GPF 2 "register_operand" "w")
-                 (neg:GPF (match_operand:GPF 3 "register_operand" "w"))))]
+        (fma:GPF (match_operand:GPF 2 "register_operand" "w")
+                 (match_operand:GPF 3 "register_operand" "w")
+                 (neg:GPF (match_operand:GPF 1 "register_operand" "w"))))]
   "TARGET_FLOAT"
-  "fnmsub\\t%0, %1, %2, %3"
+  "fnmsub\\t%0, %2, %3, %1"
   [(set_attr "type" "fmac")]
 )
 
-(define_insn "fnms4"
+(define_expand "fnms4"
+  [(set (match_operand:GPF 0 "register_operand")
+        (fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand"))
+                 (match_operand:GPF 2 "register_operand")
+                 (neg:GPF (match_operand:GPF 3 "register_operand"))))]
+  "TARGET_FLOAT"
+)
+
+(define_insn "*aarch64_fnms4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
-        (fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w"))
-                 (match_operand:GPF 2 "register_operand" "w")
-                 (neg:GPF (match_operand:GPF 3 "register_operand" "w"))))]
+        (fma:GPF (neg:GPF (match_operand:GPF 2 "register_operand" "w"))
+                 (match_operand:GPF 3 "register_operand" "w")
+                 (neg:GPF (match_operand:GPF 1 "register_operand" "w"))))]
   "TARGET_FLOAT"
-  "fnmadd\\t%0, %1, %2, %3"
+  "fnmadd\\t%0, %2, %3, %1"
   [(set_attr "type" "fmac")]
 )
 
 ;; If signed zeros are ignored, -(a * b + c) = -a * b - c.
-(define_insn "*fnmadd4"
+(define_insn "*aarch64_fnmadd4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
-        (neg:GPF (fma:GPF (match_operand:GPF 1 "register_operand" "w")
-                          (match_operand:GPF 2 "register_operand" "w")
-                          (match_operand:GPF 3 "register_operand" "w"))))]
+        (neg:GPF (fma:GPF (match_operand:GPF 2 "register_operand" "w")
+                          (match_operand:GPF 3 "register_operand" "w")
+                          (match_operand:GPF 1 "register_operand" "w"))))]
   "!HONOR_SIGNED_ZEROS (mode) && TARGET_FLOAT"
-  "fnmadd\\t%0, %1, %2, %3"
+  "fnmadd\\t%0, %2, %3, %1"
   [(set_attr "type" "fmac")]
 )
 
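
As a side note on the last pattern (illustration only, not part of the patch):
the !HONOR_SIGNED_ZEROS condition is needed because -(a * b + c) and
-a * b - c can differ in the sign of a zero result.  A small sketch in C,
using plain multiply and add rather than a fused operation, with made-up
helper names (`sign_bit`, `neg_of_sum`, `fnmadd_form`):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE sign bit of a float (1 for negative values,
   including -0.0).  Assumes float is a 32-bit IEEE single.  */
int sign_bit(float f)
{
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return (int)(bits >> 31);
}

/* -(a * b + c): the negated-fma form matched by the last pattern.  */
float neg_of_sum(float a, float b, float c)
{
  return -(a * b + c);
}

/* -a * b - c: the form an fnmadd instruction evaluates.  */
float fnmadd_form(float a, float b, float c)
{
  return -a * b - c;
}

/* With a = 1, b = 1, c = -1 the first form gives -0.0 and the second
   +0.0 under round-to-nearest; they compare equal but differ in sign.  */
void check_signed_zero(void)
{
  float x = neg_of_sum(1.0f, 1.0f, -1.0f);
  float y = fnmadd_form(1.0f, 1.0f, -1.0f);
  assert(x == y);
  assert(sign_bit(x) == 1);
  assert(sign_bit(y) == 0);
}
```

For inputs whose result is non-zero the two forms agree, which is why the
rewrite is acceptable once signed zeros are ignored.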