From patchwork Thu Jun 16 10:49:17 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1644265
Date: Thu, 16 Jun 2022 11:49:17 +0100
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 2/2] Add SVE fallback case using sdot for usdot
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Tamar Christina via Gcc-patches
From: Tamar Christina
Reply-To: Tamar Christina
Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com,
 Marcus.Shawcroft@arm.com

Hi All,

The usdot operation is common in video encoders and decoders, including
some of the most widely used ones.

This patch adds a +dotprod version of the optab as a fallback for when you
do have sdot but not usdot available.

The fallback works by adding a bias to the unsigned argument to convert it
to a signed value and then correcting for the bias later on.

Essentially it relies on (x - 128)y + 128y == xy where x is unsigned and y
is signed (assuming both are 8-bit values).
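As a sanity check of the identity above, the following C sketch (not part of the patch; the function name count_bias_mismatches is hypothetical) walks every unsigned 8-bit x and every signed 8-bit y. It also asserts that the biased value x - 128 always fits in a signed byte, which is what lets a signed dot product consume it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: returns the number of
   (x, y) pairs for which (x - 128) * y + 128 * y differs from x * y.  */
int
count_bias_mismatches (void)
{
  int mismatches = 0;
  for (int x = 0; x <= 255; ++x)        /* every unsigned 8-bit value */
    for (int y = -128; y <= 127; ++y)   /* every signed 8-bit value */
      {
        int biased = x - 128;
        /* The biased operand is representable as a signed byte.  */
        assert (biased >= INT8_MIN && biased <= INT8_MAX);
        if (biased * y + 128 * y != x * y)
          ++mismatches;
      }
  return mismatches;
}
```

Calling count_bias_mismatches () is expected to return 0, since the identity is plain integer algebra once both operands are widened.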
Because a signed byte can only hold values up to 127, the constant 128
cannot be used directly as an sdot operand, so we split the bias
correction into:

  (x - 128)y + 127y + y

Concretely for:

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

we generate:

f:
        ...
        mov     z6.b, #0
        mov     z5.b, #127
        mov     z4.b, #1
        mov     z3.b, #-128
        ptrue   p1.b, all
        movi    v0.4s, 0
.L2:
        ld1b    z2.b, p0/z, [x1, x3]
        ld1b    z1.b, p0/z, [x2, x3]
        incb    x3
        sel     z1.b, p0, z1.b, z6.b
        whilelo p0.b, w3, w4
        sub     z1.b, z1.b, z3.b
        sdot    z0.s, z1.b, z2.b
        sdot    z0.s, z5.b, z2.b
        sdot    z0.s, z4.b, z2.b
        b.any   .L2

instead of:

f:
        ...
.L2:
        ld1sb   z2.h, p0/z, [x1, x3]
        punpklo p1.h, p0.b
        ld1b    z0.h, p0/z, [x2, x3]
        add     x3, x3, x5
        mul     z0.h, p2/m, z0.h, z2.h
        sunpklo z2.s, z0.h
        sunpkhi z0.s, z0.h
        add     z1.s, p1/m, z1.s, z2.s
        punpkhi p1.h, p0.b
        whilelo p0.h, w3, w4
        add     z1.s, p1/m, z1.s, z0.s
        b.any   .L2

The new sequence is significantly faster as the operations it uses are well
optimized.  Note that execution tests are already in the mid-end testsuite.

Thanks to James Greenhalgh for the tip-off.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (@dot_prod): Generate fallback
	or call original insns ...
	(@dot_prod_insn): ...here.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/vusdot-autovec_2.c: New test.

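For reference, the arithmetic of the three-sdot fallback described in this message can be modelled in scalar C. This is only a sketch under stated assumptions, not code from the patch: usdot_ref, usdot_via_sdot and fallback_matches_reference are hypothetical names, the loops stand in for whole-vector sdot accumulations, and predication and vector length are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Reference result: unsigned bytes times signed bytes, accumulated
   into a 32-bit value, which is what usdot computes.  */
int32_t
usdot_ref (int32_t acc, const uint8_t *u, const int8_t *s, int n)
{
  for (int i = 0; i < n; ++i)
    acc += (int32_t) u[i] * s[i];
  return acc;
}

/* Fallback model: bias the unsigned operand, then apply three signed
   accumulations, mirroring the temporaries t1, t2 and operands[0].  */
int32_t
usdot_via_sdot (int32_t acc, const uint8_t *u, const int8_t *s, int n)
{
  int32_t t1 = acc;
  for (int i = 0; i < n; ++i)
    {
      int biased = u[i] - 128;          /* in [-128, 127] */
      assert (biased >= INT8_MIN && biased <= INT8_MAX);
      t1 += (int8_t) biased * s[i];     /* sdot with the biased operand */
    }
  int32_t t2 = t1;
  for (int i = 0; i < n; ++i)
    t2 += 127 * s[i];                   /* sdot with the constant 127 */
  int32_t res = t2;
  for (int i = 0; i < n; ++i)
    res += 1 * s[i];                    /* sdot with the constant 1 */
  return res;
}

/* Compares both versions on deterministic data; returns 1 on a match.  */
int
fallback_matches_reference (void)
{
  uint8_t u[64];
  int8_t s[64];
  for (int i = 0; i < 64; ++i)
    {
      u[i] = (uint8_t) (i * 37 + 200);          /* wraps modulo 256 */
      s[i] = (int8_t) ((i * 11) % 256 - 128);   /* in [-128, 127] */
    }
  return usdot_ref (5, u, s, 64) == usdot_via_sdot (5, u, s, 64);
}
```

The inputs are kept small so that the 32-bit accumulators cannot overflow in C; the model makes no claim about performance, only about the values computed.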
--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index bd60e65b0c3f05f1c931f03807170f3b9d699de5..ca60416e7d7b1d8848f4ec5a624ae479a12ae5bc 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -6887,7 +6887,7 @@ (define_insn "@aarch64_dot_prod_lane"
   [(set_attr "movprfx" "*,yes")]
 )
 
-(define_insn "@dot_prod"
+(define_insn "@dot_prod_insn"
   [(set (match_operand:VNx4SI_ONLY 0 "register_operand" "=w, ?&w")
         (plus:VNx4SI_ONLY
           (unspec:VNx4SI_ONLY
@@ -6902,6 +6902,43 @@ (define_insn "@dot_prod"
   [(set_attr "movprfx" "*,yes")]
 )
 
+(define_expand "@dot_prod"
+  [(set (match_operand:VNx4SI_ONLY 0 "register_operand")
+        (plus:VNx4SI_ONLY
+          (unspec:VNx4SI_ONLY
+            [(match_operand: 1 "register_operand")
+             (match_operand: 2 "register_operand")]
+            DOTPROD_US_ONLY)
+          (match_operand:VNx4SI_ONLY 3 "register_operand")))]
+  "TARGET_SVE || TARGET_SVE_I8MM"
+{
+  if (TARGET_SVE_I8MM)
+    {
+      emit_insn (gen_usdot_prod_insn (operands[0], operands[1],
+                                      operands[2], operands[3]));
+      DONE;
+    }
+
+  machine_mode elemmode = GET_MODE_INNER (mode);
+  HOST_WIDE_INT val = 1 << (GET_MODE_BITSIZE (elemmode).to_constant () - 1);
+  rtx signbit = gen_int_mode (val, elemmode);
+  rtx t1 = gen_reg_rtx (mode);
+  rtx t2 = gen_reg_rtx (mode);
+  rtx tmp = gen_reg_rtx (mode);
+  rtx c1 = gen_const_vec_duplicate (mode,
+                                    gen_int_mode (val - 1, elemmode));
+  rtx c2 = gen_const_vec_duplicate (mode, gen_int_mode (1, elemmode));
+  rtx dup = gen_const_vec_duplicate (mode, signbit);
+  c1 = force_reg (mode, c1);
+  c2 = force_reg (mode, c2);
+  dup = force_reg (mode, dup);
+  emit_insn (gen_sub3 (tmp, operands[1], dup));
+  emit_insn (gen_sdot_prod (t1, tmp, operands[2], operands[3]));
+  emit_insn (gen_sdot_prod (t2, c1, operands[2], t1));
+  emit_insn (gen_sdot_prod (operands[0], c2, operands[2], t2));
+  DONE;
+})
+
 (define_insn "@aarch64_dot_prod_lane"
   [(set (match_operand:VNx4SI_ONLY 0 "register_operand" "=w, ?&w")
         (plus:VNx4SI_ONLY
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec_2.c b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..cbe6b7eb7bef5a5c4b8e5ac823ebdf1d309f8490
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vusdot-autovec_2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+#pragma GCC target "+noi8mm"
+
+#define N 480
+#define SIGNEDNESS_1 unsigned
+#define SIGNEDNESS_2 signed
+#define SIGNEDNESS_3 signed
+#define SIGNEDNESS_4 unsigned
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
+   SIGNEDNESS_4 char *restrict b)
+{
+  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
+    {
+      int av = a[i];
+      int bv = b[i];
+      SIGNEDNESS_2 short mult = av * bv;
+      res += mult;
+    }
+  return res;
+}
+
+/* { dg-final { scan-assembler-not {\tusdot\t} } } */
+/* { dg-final { scan-assembler-times {\tsdot\t} 3 } } */