From patchwork Tue Aug 31 13:29:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 1522622 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=sMqm1Mc1; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GzSkM4q88z9sPf for ; Tue, 31 Aug 2021 23:30:50 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 227B03858400 for ; Tue, 31 Aug 2021 13:30:48 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 227B03858400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1630416648; bh=soxrCyJob0tmXpo9ojCSgY3Zc5DHt8PdAsbcqnAas1A=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=sMqm1Mc12N+JKLwE3Jdha7x+0Oj5DdHC1wWc25E/iqkM5tefqTD4c67OeNPzzQfgW sZbst29FoBOQweH1EoPSPbk18fPQ98gvZwlS88nQObT39IEGLRzpHcUIYjWv8ET4Uo dXPJGp+BSQ8F6xK8QCMSO1WOwRiFtrV30NhUfLRc= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR05-DB8-obe.outbound.protection.outlook.com (mail-db8eur05on2064.outbound.protection.outlook.com [40.107.20.64]) by sourceware.org (Postfix) with ESMTPS id 2D535385840C for ; Tue, 31 Aug 2021 13:29:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2D535385840C Received: from AM5PR0602CA0018.eurprd06.prod.outlook.com (2603:10a6:203:a3::28) by AM6PR08MB5015.eurprd08.prod.outlook.com (2603:10a6:20b:e5::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4457.18; Tue, 31 Aug 2021 13:29:51 +0000 Received: from VE1EUR03FT030.eop-EUR03.prod.protection.outlook.com (2603:10a6:203:a3:cafe::74) by AM5PR0602CA0018.outlook.office365.com (2603:10a6:203:a3::28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4457.19 via Frontend Transport; Tue, 31 Aug 2021 13:29:51 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; gcc.gnu.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;gcc.gnu.org; dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by VE1EUR03FT030.mail.protection.outlook.com (10.152.18.66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4457.17 via Frontend Transport; Tue, 31 Aug 2021 13:29:50 +0000 Received: ("Tessian outbound 8b41f5fb4e9e:v103"); Tue, 31 Aug 2021 13:29:50 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: adff945387166af8 X-CR-MTA-TID: 64aa7808 Received: from 3a95b1f1f026.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 2D8F3947-0BF2-4708-A926-AE1A81E4C9AB.1; Tue, 31 Aug 2021 13:29:35 +0000 Received: from EUR05-AM6-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 3a95b1f1f026.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Tue, 31 Aug 2021 13:29:35 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=DaJ92BHzli9TK0aqo4F3b9hgqAVlhhN1CipdDuqqLrLj9snZ73h+P5FMKwDrremI9ptFvlCoPOu/Z6eYpx4JyZVnMwp0yDuo5SqzWtg8mkBJFm12u42/NgyVJPL3wXmOTAIoBLN6uMuPN4DWl4EtHKBKmCMV89C4O1KjhxvPPZUT3j9ZWFvjNFCdn1VTVJ8jXsLem4mg4YW7B7ZWmyjPa1v3UruijhrfVztiSypNesII5YuoHLMW+e/W4o6P9fmIyUnNZjwsO4YAyELVaXAj3if3z/IbEg+dHINaQ+wbKPVfpQ8INTNxtuaOQuxKQ9YLzqXtOapPZWcJQNw5cNXlZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=soxrCyJob0tmXpo9ojCSgY3Zc5DHt8PdAsbcqnAas1A=; b=hyikYhvA7ZWFPCTTpUt9ih8QjdkniavtG6gaBX/ZaBhvp22Vq6JmA8PCDM5+SzHtuVPJWn9Th2Ro1RJKHgauI0Tpg5emEhmKrpvqQXjhEo/Hx3WpJXsdDJL8HDtuL6oo1AhtJZ8+bfMw8zT40DGSh6JNCJlSrUEOtauT96MC0iQg4Zof3LZh1DeoyRfrQvJxuLv5EllFXxfu1uAm4Gytc/IrzUUVzgOQXFtLn7WAQboz1SW+Zp2UExsTh+UbTcIxaiMOjdZTNNpoVAUNslXuAIfUezZUyMJYRs/HnCHQmC7/lQNkCgfH5tKUL46eMZcLJ68xxCIPcovlIvfjJYGrkg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: gcc.gnu.org; dkim=none (message not signed) header.d=none;gcc.gnu.org; dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by VE1PR08MB5760.eurprd08.prod.outlook.com (2603:10a6:800:1af::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4457.23; Tue, 31 Aug 2021 13:29:31 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::bd45:5ad5:f666:272a]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::bd45:5ad5:f666:272a%4]) with mapi id 15.20.4457.024; Tue, 31 Aug 2021 13:29:30 +0000 Date: Tue, 31 Aug 2021 14:29:28 +0100 To: gcc-patches@gcc.gnu.org Subject: [PATCH 1/2]middle-end Teach CSE to be able to do vector extracts. Message-ID: Content-Disposition: inline User-Agent: Mutt/1.9.4 (2018-02-28) X-ClientProxiedBy: LO4P123CA0326.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:18c::7) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1 Received: from arm.com (217.140.106.53) by LO4P123CA0326.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:18c::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4457.20 via Frontend Transport; Tue, 31 Aug 2021 13:29:30 +0000 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 244166e1-61a0-4aed-3e86-08d96c836919 X-MS-TrafficTypeDiagnostic: VE1PR08MB5760:|AM6PR08MB5015: X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true NoDisclaimer: true X-MS-Oob-TLC-OOBClassifiers: OLM:5516;OLM:5516; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: TlthD3pNmVTkrJYhOpa4c1WrIx2YTnE7kY354hTnKvsidqvmncgPMgboqMUvjVQfwxyfoqUUDthLrI70kq+Cn8vxGP2TU+O4KlxyFGwReQIWYuNQC7Ef+4wTCQ8j2oPb0gpgGXBCy1pKvNUQOtuvViRR/gt+bzZ9SvmZMM+CslqIrAhY3kqICHPw9ChnwTTqZAkkiSajc4GMwFReTKsr50c1VzP+vgZRTOaCEB6OW688h3gWdSx8oRF2aRbewAN1BvsSgI8nZ+hGYXYMTfNlFMDV7isZvLzXleVIPYJzLgWN1cIXaRaqQmmm1YVNV90rDNarL4bYNQoBYIOw240+xUzTHri0s1teT5cky2Agqzm1Prf8N+SiJXVgLd6tZuswER6+6/OAZNfdBnq7wuRkdj2pf6Or9NltgoGhuIhSWBTzj3/L2xkh20yBNE599jMqMolojOPo2cDLmuTFDOsBQOKvqVjzpI50EwoPx8KV2rNt6hJgH8Z0cyI2X9Iri/Oud/Oubp6uFxArrVaWiaAqCgt2HnTcgVuR6ls9V9S25ANZIudFJZ4sTHqhxnCjCrLLUsUXsvh4I8lWk1ISxeJ/mgl/mUsb7hoVawfkGLol64zHtQgLsqjYA/9OsJJjDejrD/5Y/JQZ72LpgkK+SJZBupn43ilBGuxjmhsrX9ZD9KIZuMpSQxY77N33dTs9ywBwQ/kn9nyoqX/RcOzpPU2IHCzBZrzpyqXuREy0EjO0uEjAz54YSt3dMvdLLMuYjf4a X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(136003)(376002)(39860400002)(366004)(396003)(346002)(44832011)(8676002)(956004)(2906002)(7696005)(4326008)(2616005)(52116002)(44144004)(66556008)(5660300002)(83380400001)(66476007)(235185007)(26005)(316002)(55016002)(6916009)(33964004)(186003)(8886007)(38100700002)(66946007)(8936002)(38350700002)(478600001)(66616009)(36756003)(86362001)(4216001)(2700100001); DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?q?Q0gzl8AAqUdIqTbPNT4R6ULHAyUa?= =?utf-8?q?ScQzoeGKwoLD6dOBMSSa1tfgYWvt1E96lxB/eZSy/RwHljVLKqcynbEhCDwJCthmQ?= =?utf-8?q?CKJ4DLYY2QRmF0LzTZmWR1mCUYAhjFeE7bbuz7knGsk0cxQsvPZZjTc9FHEH40dyx?= =?utf-8?q?UBaOVi7uG2WL5Ks9bmSOpSMzK/5D48mf+XUu9uYtfYeOdLNGZV85N8LKWotkptPNy?= =?utf-8?q?IUMQXGYPp9LZ3l5xS2k50hg60nOKXzFUcxgaialnNWa3POIlYJCy/jPKY7SedvvIb?= =?utf-8?q?jHLu/EmcUFJP8KwuC965gWY02pGKmJJWrqUrAZEyROt9bqEEPkDSFOPMlLVPQavCj?= =?utf-8?q?s328Xy0LewoOrAxq8fE6rtW2v3grsMIhMM8N9bZtIaP5z22aQzi1//Qi6ZyLGqLcz?= =?utf-8?q?d3vVMz7epPwsq2vwRq23+IvJETEw+sq9NulyAoHfUi+1wx+x4W3Ml6ykoL2Dmz/2q?= =?utf-8?q?DqPsGMXThMTXJGbCsaunhqWWbWpA2yFFVMNIutuve7hEfaqghN1tFgAo3jyc5VskU?= =?utf-8?q?cae1ruy483xxLtAnhWf9mO2wrIMQMBjo6B13Fug1GEkCWiM4poO0qB3RSDEcUvn+/?= =?utf-8?q?0YLNTo8ZtjkNBnhlu75R6jLRN2WMVFqiRTXNQjoz2fZfUCnCiVjSNP4XJvOAWP5y1?= =?utf-8?q?Xb04413ahvG8l1xnw2Fb6FUyV7R6j+4w1Hj8B9Fx49Oc9Z3JVXPRd3DJP6eh2V6cd?= =?utf-8?q?1NvZWSMcqnUt6As/09dsTlmUrSOiBpQtfAMf+Ftq1Zvq9S88n3DnXpZt7PYiJP6Kr?= =?utf-8?q?x+VBpbBgY8pyUPV0l8ZHRfgDny9BxnxXCOKU5CY0j24IDPi4E54yZL+Ukek7QsW2D?= =?utf-8?q?sTRTYoZRzSTVBObdnqN5qB93oisTUguSVoTKoyNeesrTMyHfa/Ca0ZbZWRNj6zjaN?= =?utf-8?q?VSMK+r4wcaG5SRWswmsZNfvYOTIaPPIBSH8m+1I6Fx8qXwGsESI1vu35p6zJdZdQb?= =?utf-8?q?7mSFZXDqU2WPunVWLzhr1eowSV/HVLchHlVdP3QYnHVQ/+DMTlsycXjVCRX0ClTzd?= =?utf-8?q?q98ckX5Z8Tjllktn/mPHIQNEjimXnMiNTt3xVWllJMk6uN/RAzYtWyd+ze8i6nSnk?= =?utf-8?q?/eV1BVS/sfqvgg96/R6SolVozMDH0luXJYsFNK0/USEnQ5AqnO47mdZv0jDhhrDiP?= =?utf-8?q?OCl1f3WmdpP0g6kr4fqmTsq42BUgWRaxFY0J5B8/Fu4nVMzhI7f2LNL1tl7FLLG9n?= =?utf-8?q?1CmpyKh19w2aRANxacbWLz8WaYCQ5vjcGSK+MEMT97+0gJMzGSuZE4+KVnl0rk1lm?= =?utf-8?q?e52HM+ojnLhDv02b?= X-MS-Exchange-Transport-Forked: True X-MS-Exchange-Transport-CrossTenantHeadersStamped: VE1PR08MB5760 Original-Authentication-Results: gcc.gnu.org; dkim=none (message not signed) header.d=none;gcc.gnu.org; dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT030.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: 342e94fc-43a3-4bbc-4bf3-08d96c835d07 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 74pDQivYc20AeGHVZWQnfmRdaMD90PtNkf7PB7RK2o05ySOubB316Q66dLzj7dt0QGJ0AyJKLGoCCllA8P8NYY9hDNESIf88gyNV5xuxi2ly7+FTfXtkoulVeWNqMHZSDAkm3DfL5VAGDdR+qnuHRgdEWrsO6zNS+/2WyBzT4tC2O9OD1B6kykOYrpFZUXrBNqJDZrgMuMiT+Rt0I6pSNOoolRp3bPbaeE2NbLDUCxh+lGjAVj8t0Qqn7lSqnUq2tU96xaKgMJqox/flsOkyq2ucJFp4x0Su/zKejHlbbLGx4o6woUf7b5YRUn9entKyn9k/hWItN6mEFGJMPhCbzos13T/DDRHslcJVmd3CA5tkRReh93TfbMBUpOp011tXUaYfs37/WCjjQboL+OIIvbhAmgI6TKsJFBnVXR1yoUgmmbnFyoTf74hn9TP6+21hK/eY28pep8REC15iQ6UczAMJawq05vPbZGPTJaERbBz+b1UvCEi5VJ/8MWAWPEYHo15NUU0kLHkljHVpwCajq+hDretYvOzUB28udgc/EinVMjAG/ZmoqGMsJ/U+3Mwt+6Qe4NWLES4E/2UbPiCda/pYPWSj3t2m61U7S0e0uvJmReaoLkvyG/oZUsWJMIKZBY7ArSiiCZq/oMn6PfR5lTJGHx8Hn2/f5pwiP16zvRAi0rGUG5ECyN4bQxLnQEYlhn7ceo0PVsndxPUnYHPOgRT/2k4V5UDGJu3HWUNlzXoaKKZCHk7WFU5yrS6Dnh2P X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(4636009)(376002)(346002)(39860400002)(396003)(136003)(46966006)(36840700001)(55016002)(82310400003)(70586007)(66616009)(356005)(8936002)(7696005)(44144004)(336012)(316002)(44832011)(33964004)(8676002)(47076005)(2616005)(36860700001)(83380400001)(186003)(235185007)(5660300002)(8886007)(2906002)(6916009)(36756003)(478600001)(86362001)(107886003)(81166007)(4326008)(26005)(956004)(82740400003)(70206006)(4216001)(2700100001); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 31 Aug 2021 13:29:50.8014 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 244166e1-61a0-4aed-3e86-08d96c836919 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT030.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM6PR08MB5015 X-Spam-Status: No, score=-13.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, MSGID_FROM_MTA_HEADER, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Cc: nd@arm.com, rguenther@suse.de Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Hi All, This patch gets CSE to re-use constants already inside a vector rather than re-materializing the constant again. Basically consider the following case: #include #include uint64_t test (uint64_t a, uint64x2_t b, uint64x2_t* rt) { uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL}; uint64_t res = a | arr[0]; uint64x2_t val = vld1q_u64 (arr); *rt = vaddq_u64 (val, b); return res; } The actual behavior is inconsequential however notice that the same constants are used in the vector (arr and later val) and in the calculation of res. The code we generate for this however is quite sub-optimal: test: adrp x2, .LC0 sub sp, sp, #16 ldr q1, [x2, #:lo12:.LC0] mov x2, 16502 movk x2, 0x1023, lsl 16 movk x2, 0x4308, lsl 32 add v1.2d, v1.2d, v0.2d movk x2, 0x942, lsl 48 orr x0, x0, x2 str q1, [x1] add sp, sp, 16 ret .LC0: .xword 667169396713799798 .xword 667169396713799798 Essentially we materialize the same constant twice. The reason for this is because the front-end lowers the constant extracted from arr[0] quite early on. If you look into the result of fre you'll find : arr[0] = 667169396713799798; arr[1] = 667169396713799798; res_7 = a_6(D) | 667169396713799798; _16 = __builtin_aarch64_ld1v2di (&arr); _17 = VIEW_CONVERT_EXPR(_16); _11 = b_10(D) + _17; *rt_12(D) = _11; arr ={v} {CLOBBER}; return res_7; Which makes sense for further optimization. However come expand time if the constant isn't representable in the target arch it will be assigned to a register again. (insn 8 5 9 2 (set (reg:V2DI 99) (const_vector:V2DI [ (const_int 667169396713799798 [0x942430810234076]) repeated x2 ])) "cse.c":7:12 -1 (nil)) ... (insn 14 13 15 2 (set (reg:DI 103) (const_int 667169396713799798 [0x942430810234076])) "cse.c":8:12 -1 (nil)) (insn 15 14 16 2 (set (reg:DI 102 [ res ]) (ior:DI (reg/v:DI 96 [ a ]) (reg:DI 103))) "cse.c":8:12 -1 (nil)) And since it's out of the immediate range of the scalar instruction used combine won't be able to do anything here. This will then trigger the re-materialization of the constant twice. To fix this this patch extends CSE to be able to generate an extract for a constant from another vector, or to make a vector for a constant by duplicating another constant. Whether this transformation is done or not depends entirely on the costing for the target for the different constants and operations. I Initially also investigated doing this in PRE, but PRE requires at least 2 BB to work and does not currently have any way to remove redundancies within a single BB and it did not look easy to support. Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * cse.c (find_sets_in_insn): Register constants in sets. (cse_insn): Try materializing using vec_dup. --- inline copy of patch -- diff --git a/gcc/cse.c b/gcc/cse.c index 330c1e90ce05b8f95b58f24576ec93e10ec55d89..d76e01b6478e22e9dd5760b7c78cecb536d7daef 100644 diff --git a/gcc/cse.c b/gcc/cse.c index 330c1e90ce05b8f95b58f24576ec93e10ec55d89..d76e01b6478e22e9dd5760b7c78cecb536d7daef 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3. If not see #include "regs.h" #include "function-abi.h" #include "rtlanal.h" +#include "expr.h" /* The basic idea of common subexpression elimination is to go through the code, keeping a record of expressions that would @@ -4274,6 +4275,25 @@ find_sets_in_insn (rtx_insn *insn, struct set **psets) someplace else, so it isn't worth cse'ing. */ else if (GET_CODE (SET_SRC (x)) == CALL) ; + else if (GET_CODE (SET_SRC (x)) == CONST_VECTOR + && GET_MODE_CLASS (GET_MODE (SET_SRC (x))) != MODE_VECTOR_BOOL) + { + /* First register the vector itself. */ + sets[n_sets++].rtl = x; + rtx src = SET_SRC (x); + machine_mode elem_mode = GET_MODE_INNER (GET_MODE (src)); + /* Go over the constants of the CONST_VECTOR in forward order, to + put them in the same order in the SETS array. */ + for (unsigned i = 0; i < const_vector_encoded_nelts (src) ; i++) + { + /* These are templates and don't actually get emitted but are + used to tell CSE how to get to a particular constant. */ + rtx tmp = gen_rtx_PARALLEL (VOIDmode, + gen_rtvec (1, GEN_INT (i))); + rtx y = gen_rtx_VEC_SELECT (elem_mode, SET_DEST (x), tmp); + sets[n_sets++].rtl = gen_rtx_SET (y, CONST_VECTOR_ELT (src, i)); + } + } else sets[n_sets++].rtl = x; } @@ -4513,7 +4533,14 @@ cse_insn (rtx_insn *insn) struct set *sets = (struct set *) 0; if (GET_CODE (x) == SET) - sets = XALLOCA (struct set); + { + /* For CONST_VECTOR we wants to be able to CSE the vector itself along with + elements inside the vector if the target says it's cheap. */ + if (GET_CODE (SET_SRC (x)) == CONST_VECTOR) + sets = XALLOCAVEC (struct set, const_vector_encoded_nelts (SET_SRC (x)) + 1); + else + sets = XALLOCA (struct set); + } else if (GET_CODE (x) == PARALLEL) sets = XALLOCAVEC (struct set, XVECLEN (x, 0)); @@ -4997,6 +5024,26 @@ cse_insn (rtx_insn *insn) src_related_is_const_anchor = src_related != NULL_RTX; } + /* Try to re-materialize a vec_dup with an existing constant. */ + if (GET_CODE (src) == CONST_VECTOR + && const_vector_encoded_nelts (src) == 1) + { + rtx const_rtx = CONST_VECTOR_ELT (src, 0); + machine_mode const_mode = GET_MODE_INNER (GET_MODE (src)); + struct table_elt *related_elt + = lookup (const_rtx, HASH (const_rtx, const_mode), const_mode); + if (related_elt) + { + for (related_elt = related_elt->first_same_value; + related_elt; related_elt = related_elt->next_same_value) + if (REG_P (related_elt->exp)) + { + src_eqv_here + = gen_rtx_VEC_DUPLICATE (GET_MODE (src), + related_elt->exp); + } + } + } if (src == src_folded) src_folded = 0; From patchwork Tue Aug 31 13:30:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 1522623 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=L6S34Z2Q; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GzSlV5f0wz9sPf for ; Tue, 31 Aug 2021 23:31:50 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3BEE23858428 for ; Tue, 31 Aug 2021 13:31:48 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3BEE23858428 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1630416708; bh=6JBwySxAyE+tb/8jMmadLx8AL2iYcNF5uBS6NJ197hc=; h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=L6S34Z2Q03Nfor1B6IUPS/6Gr7uXU8oN1VVkm9MmIEV/lPiyt3hRDAlC5tqdSvsYU eZ9C3YNjZI1S8zIeRG0rMUdYejUchG5Do7Ay/l70Snm3WjWEh3+cdyqDkM+7hGfD73 c3s4GC9NdNQlnH7nzcbVx2RWvcJ4HUEIHaWuBluQ= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR05-AM6-obe.outbound.protection.outlook.com (mail-am6eur05on2063.outbound.protection.outlook.com [40.107.22.63]) by sourceware.org (Postfix) with ESMTPS id 0F6FA3858022 for ; Tue, 31 Aug 2021 13:30:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0F6FA3858022 Received: from AM6P195CA0014.EURP195.PROD.OUTLOOK.COM (2603:10a6:209:81::27) by AS8PR08MB7173.eurprd08.prod.outlook.com (2603:10a6:20b:404::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4457.23; Tue, 31 Aug 2021 13:30:19 +0000 Received: from VE1EUR03FT039.eop-EUR03.prod.protection.outlook.com (2603:10a6:209:81:cafe::c5) by AM6P195CA0014.outlook.office365.com (2603:10a6:209:81::27) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4478.17 via Frontend Transport; Tue, 31 Aug 2021 13:30:19 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; gcc.gnu.org; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;gcc.gnu.org; dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by VE1EUR03FT039.mail.protection.outlook.com (10.152.19.196) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4457.17 via Frontend Transport; Tue, 31 Aug 2021 13:30:18 +0000 Received: ("Tessian outbound 56612e04f172:v103"); Tue, 31 Aug 2021 13:30:18 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 4605509b256a9f9a X-CR-MTA-TID: 64aa7808 Received: from 2995964353a3.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 15DDAC8D-9C0B-41FF-A184-C4CE6EAAD251.1; Tue, 31 Aug 2021 13:30:05 +0000 Received: from EUR04-VI1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 2995964353a3.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Tue, 31 Aug 2021 13:30:05 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=dTi7tEeBLwX3FjTraeFqXkvaVBQr2LDG+avZijAV+xwm1JROL1vZtA4K2gUc/DCnpezrmmduBQew4O22FntK0HsArmSpYTtTfMuBGsvZCpwhFpH/buBZcCOEr2BIl8M7Ik4y+rBEvhET/OU3PmW8BEr2t5m/olgztGG/GarzT742gYDIBHix3MBRQEoPznTBf4MNNk+OKqS+MsIsHXZW8BjdYuz4riXGBPJRFzBJIbuOlemtAz8gsKwyVnRTlBaSLcL/9NROXW6QcpOH6ou+7/UY/kMskvK+LwqBlYnQy1LZ2N0Fh1wJH1UyBSEQM0ePupTG/0U5dvFZZtdSXun3gw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=6JBwySxAyE+tb/8jMmadLx8AL2iYcNF5uBS6NJ197hc=; b=QQcE70nE05F9VH5TBQe4hIds45+k3JHYySt4GNZRmMxWwK4SC3O1JPgqdUACwT33V06tZsWs6r3pEl3KHV7KatXr+YFOKKclItu8stNbUR4mgEv7o8FMgo178SNV4BIfl352FSmxoKjJgLYB0WMXE6Vp4E7K4ctjWquAN5ybUNxRxvpbQFcjKWqzUqOIiFcSkjpBOsf7MIPphXMhVSHf9A2EcKyyows2MXyq71Awwk0S3QOCKiXAk0F/KtDEVLtYYYMEKczTpuHiN78suXYjO6mu4/Aa/SxkTHS0iLZ/k8EhlRIZqFG52niby+fMwx+6eqjz25TZ+d/N/6Kr4rAorA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: gcc.gnu.org; dkim=none (message not signed) header.d=none;gcc.gnu.org; dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by VE1PR08MB5760.eurprd08.prod.outlook.com (2603:10a6:800:1af::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4457.23; Tue, 31 Aug 2021 13:30:02 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::bd45:5ad5:f666:272a]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::bd45:5ad5:f666:272a%4]) with mapi id 15.20.4457.024; Tue, 31 Aug 2021 13:30:02 +0000 Date: Tue, 31 Aug 2021 14:30:00 +0100 To: gcc-patches@gcc.gnu.org Subject: [PATCH 2/2]AArch64: Add better costing for vector constants and operations Message-ID: Content-Disposition: inline User-Agent: Mutt/1.9.4 (2018-02-28) X-ClientProxiedBy: LO4P123CA0125.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:192::22) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1 Received: from arm.com (217.140.106.53) by LO4P123CA0125.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:192::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4478.17 via Frontend Transport; Tue, 31 Aug 2021 13:30:02 +0000 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 1f458e7e-6bcf-4aba-1037-08d96c8379c7 X-MS-TrafficTypeDiagnostic: VE1PR08MB5760:|AS8PR08MB7173: X-MS-Exchange-Transport-Forked: True X-Microsoft-Antispam-PRVS: x-checkrecipientrouted: true NoDisclaimer: true X-MS-Oob-TLC-OOBClassifiers: OLM:8882;OLM:8882; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: Y6qPaclcM5rmT7xbdLLzWRPCBYt9q3ZWn+EtyWZSyWXbhm8Zb54/KZicTzKoGOo5ncjkXt2GJ9ixaCZXeovCwJFRk/tFUwB2EynBWpHU1ZW5OwEx9F+hDTzE+jZ9cd1vWaQx3BPVkGAB7ga8WYeu27+slvJ5+UwLyV8q7RBtwZ6GEfVqUfVKXye1HmRZ5cCwkMJM22dvZyuYz919KRCwEJnnzKDPf2/on+slhP/yQ4YumvRLelgsoCbk3EMMudc7KrFPS2GK/3faNHMoD+aH/l+Ymvah5v4EvVWhUJRIwxLKhynbZ/7/PrRGbtOB0xX0YYwk1d/3/u05FroPXuQEnXJxH5+mCP0CVP7pFvB4CPqrd2jPSpks7QHTbwue0l1UWeA8X9fRHWoMScNIoAuZVYHh4JQc1f70im9IPsvB8O4/3zGIPWXF7w4BLufIv8JCp8G6lCdJVvQXPoSDyhwz4XutUd2wW161/+RKeGsV7fr+PkTfIZjOW/AR6cARH+C7fsIEBc6FM7UlXIfEVF9LR5rA/2gehjLSh8R4xU6hUx0DeramikJbljM+raUmNSifRcGIxZg1AOZ6xR9X4CSSeg4rt783LA8TR43/otsIkRWwcNc/RG4iiHo6P8qEpA+nGiqCDDtEr1wdR/cWLDHa533HiY3AxPRyHbVPEQ1C3CbmsnQ55jRPODTCEiuR40ozj/vebg6LHDbIxg69+Vc9d3/LSolaWRquwCLmD2AacG06rToAOAyBXomXcBbSUmGCb2ZmCzo5GV2QQxGzGuDs9IgvhDzTvIqaHMhMLQ+Xjmvw2Tm5TI35oEm/MO0TuHdya2p5QI54+IuUOV/rjcN9bg== X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(136003)(376002)(39850400004)(366004)(396003)(346002)(4743002)(44832011)(8676002)(956004)(2906002)(30864003)(7696005)(4326008)(2616005)(52116002)(44144004)(66556008)(5660300002)(83380400001)(66476007)(235185007)(26005)(316002)(55016002)(6916009)(33964004)(186003)(8886007)(38100700002)(66946007)(8936002)(38350700002)(478600001)(66616009)(36756003)(86362001)(4216001)(2700100001); DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?q?SRbxguaveu2oeIR+bztNmH9IzYuY?= =?utf-8?q?LXCx2dSBZOIPda8TIW42LNnH3Kv4yhbWduR+8OOJnZn0XzwywKZxb1skcSN0LhT6Z?= =?utf-8?q?BiA6GXTe4H5a7zeaGVfFPdgAC8/VmFPYZOZh77EtagiQ39xlN82Iab0bRoXLdjpWy?= =?utf-8?q?ADjZNYJPX3LkxIt/oLPt9WRagP3pZxVhLEITjBfx14JLVilOfydO+bJoTPj+Ar/Rx?= =?utf-8?q?D8DoFdo4GTePAuUGHsipI/XqimTJFQn0JbLdym1+Sk7tlKftWAMzP7wktydARZ+/O?= =?utf-8?q?rJhpACPQvPgo5cqp059rpN27HIdqr75jVRuymn/Y4E8XVVOYzeOANWceFaG8ApJHt?= =?utf-8?q?/xDTbL5YMLbxnG0CV1NVHeZ3UUPnP77pwIbj+120qq7DikqmX9j1vv+bBLepFYK7F?= =?utf-8?q?jsNUE01NxxGfiR6AoS8R9M6V24F5UQsWusNWxd/f+hg2T1B6/+ISsLaFDX2uZLFdT?= =?utf-8?q?8YMDY94AFHak92yRafQ/jLt9BcKc23yDJfNmx7O5IdqJgz+KKHAs1Q+6Nppk313lY?= =?utf-8?q?uaprwfjqgjj19Z1WZ6WeqNl5dLAgWPp3Lw+j3Ln8Rt0Xko5JMNhjwbyNA3XGjN56g?= =?utf-8?q?RmmrnAR4+fjyiwkUHD8a5NjhKVLYQbb26yJ0xXbyARsqMHUemAMpxWBtMGr0LGeqi?= =?utf-8?q?QrnhlBW2akjtgcAlLPAsaNSB+H8/8jrUkB1YkfJh0LHtqc+NpMyIkPb8S+X4vIPrO?= =?utf-8?q?Wj0bO4+iD0zku/P/qELNWk5jSZZ1hLQ6RiSuje5VKhQtAVzD5OjSVLmOG2cLxKpYJ?= =?utf-8?q?CyYAhVabmwmrrJxGlUPTT8rU3VYIrhE4QSfsgaHErAs3CqgBiqcwB4uk63unQn6gb?= =?utf-8?q?U0PztVnP8OPSr2kHyUsI+gUDn2++eFzdHnN7tADXPg4aQDaqBLvVOjMkqzbU2Q31I?= =?utf-8?q?oJL8ziyZmE7LaPhu1BZn44aE3UUhjRy3L4Q7WkVegHQSvkLq/sV/iJQxL9gBLtSkT?= =?utf-8?q?HtuK9thHYrwbJGjd0OpcyqAFipOWSADpHgeeSlwvFauWer5znOE+yNjucHoGDdWus?= =?utf-8?q?4KXkQld7GqKtXTicTCT84Bi3z6assFyyq9x8hjuZ+KjJmglhkWWaBwgBW+f1yUFgv?= =?utf-8?q?G26DraXAkkK1RnXtgTu6X/sCK/9ej6t7WahcEKz0NQcIcBYUnIZhsNZj3YNHgaVE+?= =?utf-8?q?XzqHLFvsZ4+DmbLVHffEZb8xERI7Zggz7xA4s5oswEIGFD4a6dSli0DJGSJEMBuV7?= =?utf-8?q?mofd3af9/Pj5OYZvET4H3qtET6wfvphqNpsgDGVPPIjB6zpG+sEI945dDQFKqGLF1?= =?utf-8?q?LKb7CTAtSoNZIICy?= X-MS-Exchange-Transport-CrossTenantHeadersStamped: VE1PR08MB5760 Original-Authentication-Results: gcc.gnu.org; dkim=none (message not signed) header.d=none;gcc.gnu.org; dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: VE1EUR03FT039.eop-EUR03.prod.protection.outlook.com X-MS-Office365-Filtering-Correlation-Id-Prvs: af0d48e4-4cb5-43b9-af73-08d96c836ffe X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: U3ul/erA08Rl48Zp2LJKH1LpADOgIR8PqjXBtPnGDV2K8LLPEjPwprgOCe/HNPtaKOo/eVSR2nUlD9B3WQuewrBAXgSUTc/vQfWL7lnd2Hz+OPMDYd6WFCBzyCM3hL5zlB2f/iLZ5oRuNiOgujJmDF5PF6FmCzpiz/W6BfEwdOxOj5bQYPnX3jXAKkzfKh2J58Q5+yX89QJs72qmsrOj1Ind2iDVIRi27N2ztPijrIlbWOl+yT0iG/TgMbmv5WvykCvevU+YI58PVTdWX/OwKgCQOlVyEUHaVhlIczDNXqvUi5j7wSYowToiCGgbXokWmARp7PV8kLN8qIYh97dgc82dfzKm+ZB0/AWzMaIX8yuLQmYDedXIFGhlUDOvMc/q5ZQDi+LdWfwSkSKVr4XaSgJuMbM/3ftYiAENYnAFsrTYlqo2SHOj9yZqAw1FAfFoQzDtttOoG0yANeFV20/DUHWgWO8d1y67rcHiBc7C5za36hycKu3lHx2IUcvmWuJUCqTMJ1JyLIOTZGxfBpEl1ePEuEWtVf/jd8+eZCT0VhRxY/Z5SLrc68+e+y1iPUS/qHc6CQXdN+dmeF+YOsW5ChdTH/nEvwchjZVO5s9RmoYrFOquqnZCzwQvlFAjL3DAUcKuyHQfduPYMSUD4IWRgUFMKH3pBCGA6jPUvsEdIXw9LXLFlyU+g6IGaEHN2xnw7C3+Wmc6q4IGg42fnCyijI35716tmtI4vAGPH00WIZfJ2bb/QjBWx+R6ohZPkuj+bQkjn6VltcayYxQqAdPbjWF6nF+GvNGp3BR9L8g9yti8bgKAgwF1zBFglAVa7JMHdYJRCeidEWGfxgt2unUiMA== X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(4636009)(396003)(376002)(346002)(39860400002)(136003)(46966006)(36840700001)(86362001)(44144004)(6916009)(33964004)(316002)(55016002)(2616005)(26005)(36860700001)(82310400003)(8886007)(186003)(4326008)(36756003)(7696005)(82740400003)(83380400001)(30864003)(478600001)(956004)(44832011)(81166007)(356005)(5660300002)(4743002)(2906002)(235185007)(8936002)(336012)(66616009)(47076005)(8676002)(70206006)(70586007)(4216001)(2700100001); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 31 Aug 2021 13:30:18.7817 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 1f458e7e-6bcf-4aba-1037-08d96c8379c7 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: VE1EUR03FT039.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB7173 X-Spam-Status: No, score=-13.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_LOTSOFHASH, KAM_SHORT, MSGID_FROM_MTA_HEADER, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Hi All, This patch adds extended costing to cost the creation of constants and the manipulation of constants. The default values provided are based on architectural expectations and each cost models can be individually tweaked as needed. The changes in this patch covers: * Construction of PARALLEL or CONST_VECTOR: Adds better costing for vector of constants which is based on the constant being created and the instruction that can be used to create it. i.e. a movi is cheaper than a literal load etc. * Construction of a vector through a vec_dup. * Extraction of part of a vector using a vec_select. In this part we had to make some opportunistic assumptions. In particular we had to model extracting of the high-half of a register as being "free" in order to get fusion using NEON high-part instructions possible. In the event that there is no 2 variant for the instruction the select would still be cheaper than the load. Unfortunately on AArch64 you need -O3 when using intrinsics for this to kick in until we fix vld1/2/3 to be gimple instead of RTL intrinsics. This should also fix the stack allocations. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/arm/aarch-common-protos.h (struct vector_cost_table): Add movi, dup and extract costing fields. * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs, thunderx_extra_costs, thunderx2t99_extra_costs, thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs): Use them. * config/arm/aarch-cost-tables.h (generic_extra_costs, cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs, exynosm1_extra_costs, xgene1_extra_costs): Likewise * config/aarch64/aarch64-simd.md (aarch64_simd_dup): Add r->w dup. * config/aarch64/aarch64.c (aarch64_simd_make_constant): Expose. (aarch64_rtx_costs): Add extra costs. (aarch64_simd_dup_constant): Support check only mode. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-cse-codegen.c: New test. --- inline copy of patch -- diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h index dd2e7e7cbb13d24f0b51092270cd7e2d75fabf29..bb499a1eae62a145f1665d521f57c98b49ac5389 100644 diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h index dd2e7e7cbb13d24f0b51092270cd7e2d75fabf29..bb499a1eae62a145f1665d521f57c98b49ac5389 100644 --- a/gcc/config/aarch64/aarch64-cost-tables.h +++ b/gcc/config/aarch64/aarch64-cost-tables.h @@ -124,7 +124,10 @@ const struct cpu_cost_table qdf24xx_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* alu. */ - COSTS_N_INSNS (4) /* mult. */ + COSTS_N_INSNS (4), /* mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; @@ -229,7 +232,10 @@ const struct cpu_cost_table thunderx_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* Alu. */ - COSTS_N_INSNS (4) /* mult. */ + COSTS_N_INSNS (4), /* mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; @@ -333,7 +339,10 @@ const struct cpu_cost_table thunderx2t99_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* Alu. */ - COSTS_N_INSNS (4) /* Mult. */ + COSTS_N_INSNS (4), /* Mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; @@ -437,7 +446,10 @@ const struct cpu_cost_table thunderx3t110_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* Alu. */ - COSTS_N_INSNS (4) /* Mult. */ + COSTS_N_INSNS (4), /* Mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; @@ -542,7 +554,10 @@ const struct cpu_cost_table tsv110_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* alu. */ - COSTS_N_INSNS (4) /* mult. */ + COSTS_N_INSNS (4), /* mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; @@ -646,7 +661,10 @@ const struct cpu_cost_table a64fx_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* alu. */ - COSTS_N_INSNS (4) /* mult. */ + COSTS_N_INSNS (4), /* mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index c5638d096fa84a27b4ea397f62cd0d05a28e7c8c..6814dae079c9ff40aaa2bb625432bf9eb8906b73 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -74,12 +74,14 @@ (define_insn "aarch64_simd_dup" ) (define_insn "aarch64_simd_dup" - [(set (match_operand:VDQF_F16 0 "register_operand" "=w") + [(set (match_operand:VDQF_F16 0 "register_operand" "=w,w") (vec_duplicate:VDQF_F16 - (match_operand: 1 "register_operand" "w")))] + (match_operand: 1 "register_operand" "w,r")))] "TARGET_SIMD" - "dup\\t%0., %1.[0]" - [(set_attr "type" "neon_dup")] + "@ + dup\\t%0., %1.[0] + dup\\t%0., %1" + [(set_attr "type" "neon_dup, neon_from_gp")] ) (define_insn "aarch64_dup_lane" diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f80de2ca8971086d6a4bf3aa7793d0cda953b5c8..26d78ffe98a3445dcc490c93849c46a8c2595cf8 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -302,6 +302,7 @@ static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64); static bool aarch64_print_address_internal (FILE*, machine_mode, rtx, aarch64_addr_query_type); static HOST_WIDE_INT aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val); +static rtx aarch64_simd_make_constant (rtx, bool); /* Major revision number of the ARM Architecture implemented by the target. */ unsigned aarch64_architecture_version; @@ -12665,7 +12666,7 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED, rtx op0, op1, op2; const struct cpu_cost_table *extra_cost = aarch64_tune_params.insn_extra_cost; - int code = GET_CODE (x); + rtx_code code = GET_CODE (x); scalar_int_mode int_mode; /* By default, assume that everything has equivalent cost to the @@ -13936,8 +13937,65 @@ cost_plus: mode, MULT, 1, speed); return true; } + break; + case PARALLEL: + /* Fall through */ + case CONST_VECTOR: + { + rtx gen_insn = aarch64_simd_make_constant (x, true); + /* Not a valid const vector. */ + if (!gen_insn) + break; - /* Fall through. */ + switch (GET_CODE (gen_insn)) + { + case CONST_VECTOR: + /* Load using MOVI/MVNI. */ + if (aarch64_simd_valid_immediate (x, NULL)) + *cost += extra_cost->vect.movi; + else /* Load using constant pool. */ + *cost += extra_cost->ldst.load; + break; + /* Load using a DUP. */ + case VEC_DUPLICATE: + *cost += extra_cost->vect.dup; + break; + default: + *cost += extra_cost->ldst.load; + break; + } + return true; + } + case VEC_CONCAT: + /* depending on the operation, either DUP or INS. + For now, keep default costing. */ + break; + case VEC_DUPLICATE: + *cost += extra_cost->vect.dup; + return true; + case VEC_SELECT: + { + /* cost subreg of 0 as free, otherwise as DUP */ + rtx op1 = XEXP (x, 1); + int nelts; + if ((op1 == const0_rtx && !BYTES_BIG_ENDIAN) + || (BYTES_BIG_ENDIAN + && GET_MODE_NUNITS (mode).is_constant(&nelts) + && INTVAL (op1) == nelts - 1)) + ; + else if (vec_series_lowpart_p (mode, GET_MODE (op1), op1)) + ; + else if (vec_series_highpart_p (mode, GET_MODE (op1), op1)) + /* Selecting the high part is not technically free, but we lack + enough information to decide that here. For instance selecting + the high-part of a vec_dup *is* free or to feed into any _high + instruction. Both of which we can't really tell. That said + have a better chance to optimize an dup vs multiple constants. */ + ; + else + *cost += extra_cost->vect.extract; + return true; + } default: break; } @@ -20663,9 +20721,12 @@ aarch64_builtin_support_vector_misalignment (machine_mode mode, /* If VALS is a vector constant that can be loaded into a register using DUP, generate instructions to do so and return an RTX to - assign to the register. Otherwise return NULL_RTX. */ + assign to the register. Otherwise return NULL_RTX. + + If CHECK then the resulting instruction may not be used in + codegen but can be used for costing. */ static rtx -aarch64_simd_dup_constant (rtx vals) +aarch64_simd_dup_constant (rtx vals, bool check = false) { machine_mode mode = GET_MODE (vals); machine_mode inner_mode = GET_MODE_INNER (mode); @@ -20677,7 +20738,8 @@ aarch64_simd_dup_constant (rtx vals) /* We can load this constant by using DUP and a constant in a single ARM register. This will be cheaper than a vector load. */ - x = copy_to_mode_reg (inner_mode, x); + if (!check) + x = copy_to_mode_reg (inner_mode, x); return gen_vec_duplicate (mode, x); } @@ -20685,9 +20747,12 @@ aarch64_simd_dup_constant (rtx vals) /* Generate code to load VALS, which is a PARALLEL containing only constants (for vec_init) or CONST_VECTOR, efficiently into a register. Returns an RTX to copy into the register, or NULL_RTX - for a PARALLEL that cannot be converted into a CONST_VECTOR. */ + for a PARALLEL that cannot be converted into a CONST_VECTOR. + + If CHECK then the resulting instruction may not be used in + codegen but can be used for costing. */ static rtx -aarch64_simd_make_constant (rtx vals) +aarch64_simd_make_constant (rtx vals, bool check = false) { machine_mode mode = GET_MODE (vals); rtx const_dup; @@ -20719,7 +20784,7 @@ aarch64_simd_make_constant (rtx vals) && aarch64_simd_valid_immediate (const_vec, NULL)) /* Load using MOVI/MVNI. */ return const_vec; - else if ((const_dup = aarch64_simd_dup_constant (vals)) != NULL_RTX) + else if ((const_dup = aarch64_simd_dup_constant (vals, check)) != NULL_RTX) /* Loaded using DUP. */ return const_dup; else if (const_vec != NULL_RTX) diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 6be5fb1e083d7ff130386dfa181b9a0c8fd5437c..55a470d8e1410bdbcfbea084ec11b468485c1400 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -133,6 +133,9 @@ struct vector_cost_table { const int alu; const int mult; + const int movi; + const int dup; + const int extract; }; struct cpu_cost_table diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h index 25ff702f01fab50d749b9a7b7b072c2be2504562..0e6a62665c7e18debc382a294a37945188fb90ef 100644 --- a/gcc/config/arm/aarch-cost-tables.h +++ b/gcc/config/arm/aarch-cost-tables.h @@ -122,7 +122,10 @@ const struct cpu_cost_table generic_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* alu. */ - COSTS_N_INSNS (4) /* mult. */ + COSTS_N_INSNS (4), /* mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; @@ -226,7 +229,10 @@ const struct cpu_cost_table cortexa53_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* alu. */ - COSTS_N_INSNS (4) /* mult. */ + COSTS_N_INSNS (4), /* mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; @@ -330,7 +336,10 @@ const struct cpu_cost_table cortexa57_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* alu. */ - COSTS_N_INSNS (4) /* mult. */ + COSTS_N_INSNS (4), /* mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; @@ -434,7 +443,10 @@ const struct cpu_cost_table cortexa76_extra_costs = /* Vector */ { COSTS_N_INSNS (1), /* alu. */ - COSTS_N_INSNS (4) /* mult. */ + COSTS_N_INSNS (4), /* mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; @@ -538,7 +550,10 @@ const struct cpu_cost_table exynosm1_extra_costs = /* Vector */ { COSTS_N_INSNS (0), /* alu. */ - COSTS_N_INSNS (4) /* mult. */ + COSTS_N_INSNS (4), /* mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; @@ -642,7 +657,10 @@ const struct cpu_cost_table xgene1_extra_costs = /* Vector */ { COSTS_N_INSNS (2), /* alu. */ - COSTS_N_INSNS (8) /* mult. */ + COSTS_N_INSNS (8), /* mult. */ + COSTS_N_INSNS (1), /* movi. */ + COSTS_N_INSNS (2), /* dup. */ + COSTS_N_INSNS (2) /* extract. */ } }; diff --git a/gcc/testsuite/gcc.target/aarch64/vect-cse-codegen.c b/gcc/testsuite/gcc.target/aarch64/vect-cse-codegen.c new file mode 100644 index 0000000000000000000000000000000000000000..36e468aacfadd7701c6a7cd432bee81472111a16 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vect-cse-codegen.c @@ -0,0 +1,127 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3 -march=armv8.2-a+crypto -fno-schedule-insns -fno-schedule-insns2 -mcmodel=small" } */ +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ + +#include + +/* +**test0: +** movi v2.16b, 0x3 +** ldr q0, \[x0\] +** uxtl v1.8h, v0.8b +** uxtl2 v0.8h, v0.16b +** ldr q3, \[x1\] +** umlal v1.8h, v3.8b, v2.8b +** umlal2 v0.8h, v3.16b, v2.16b +** addhn v0.8b, v1.8h, v0.8h +** str d0, \[x2\] +** ret +*/ + +void test0 (uint8_t *inptr0, uint8_t *inptr1, uint8_t *outptr0) +{ + uint8x16_t three_u8 = vdupq_n_u8(3); + uint8x16_t x = vld1q_u8(inptr0); + uint8x16_t y = vld1q_u8(inptr1); + uint16x8_t x_l = vmovl_u8(vget_low_u8(x)); + uint16x8_t x_h = vmovl_u8(vget_high_u8(x)); + uint16x8_t z_l = vmlal_u8(x_l, vget_low_u8(y), vget_low_u8(three_u8)); + uint16x8_t z_h = vmlal_u8(x_h, vget_high_u8(y), vget_high_u8(three_u8)); + vst1_u8(outptr0, vaddhn_u16(z_l, z_h)); +} + +/* +**test1: +** sub sp, sp, #16 +** adrp x2, .LC0 +** ldr q1, \[x2, #:lo12:.LC0\] +** add v0.2d, v1.2d, v0.2d +** str q0, \[x1\] +** fmov x1, d1 +** orr x0, x0, x1 +** add sp, sp, 16 +** ret +*/ + +uint64_t +test1 (uint64_t a, uint64x2_t b, uint64x2_t* rt) +{ + uint64_t arr[2] = { 0x0942430810234076UL, 0x0942430810234076UL}; + uint64_t res = a | arr[0]; + uint64x2_t val = vld1q_u64 (arr); + *rt = vaddq_u64 (val, b); + return res; +} + +/* +**test2: +** adrp x2, .LC1 +** ldr q1, \[x2, #:lo12:.LC1\] +** add v0.2d, v0.2d, v1.2d +** str q0, \[x1\] +** fmov x1, d1 +** orr x0, x0, x1 +** ret +*/ + +uint64_t +test2 (uint64_t a, uint64x2_t b, uint64x2_t* rt) +{ + uint64x2_t val = vdupq_n_u64 (0x0424303242234076UL); + uint64_t arr = vgetq_lane_u64 (val, 0); + uint64_t res = a | arr; + *rt = vaddq_u64 (val, b); + return res; +} + +/* +**test3: +** sub sp, sp, #16 +** adrp x2, .LC2 +** ldr q1, \[x2, #:lo12:.LC2\] +** add v0.4s, v1.4s, v0.4s +** str q0, \[x1\] +** fmov w1, s1 +** orr w0, w0, w1 +** add sp, sp, 16 +** ret +*/ + +uint32_t +test3 (uint32_t a, uint32x4_t b, uint32x4_t* rt) +{ + uint32_t arr[4] = { 0x094243, 0x094243, 0x094243, 0x094243 }; + uint32_t res = a | arr[0]; + uint32x4_t val = vld1q_u32 (arr); + *rt = vaddq_u32 (val, b); + return res; +} + +/* +**test4: +** ushr v0.16b, v0.16b, 7 +** mov x0, 16512 +** movk x0, 0x1020, lsl 16 +** movk x0, 0x408, lsl 32 +** movk x0, 0x102, lsl 48 +** fmov d1, x0 +** pmull v2.1q, v0.1d, v1.1d +** dup v1.2d, v1.d\[0\] +** pmull2 v0.1q, v0.2d, v1.2d +** trn2 v2.8b, v2.8b, v0.8b +** umov w0, v2.h\[3\] +** ret +*/ + +uint64_t +test4 (uint8x16_t input) +{ + uint8x16_t bool_input = vshrq_n_u8(input, 7); + poly64x2_t mask = vdupq_n_p64(0x0102040810204080UL); + poly64_t prodL = vmull_p64((poly64_t)vgetq_lane_p64((poly64x2_t)bool_input, 0), + vgetq_lane_p64(mask, 0)); + poly64_t prodH = vmull_high_p64((poly64x2_t)bool_input, mask); + uint8x8_t res = vtrn2_u8((uint8x8_t)prodL, (uint8x8_t)prodH); + return vget_lane_u16((uint16x4_t)res, 3); +} +