From patchwork Fri Nov 10 00:45:35 2023
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 1862281
Message-ID: <94983cfd-2bf5-429c-8775-737a3b08820c@gmail.com>
Date: Thu, 9 Nov 2023 17:45:35 -0700
To: gcc-patches@gcc.gnu.org
From: Jeff Law
Subject: [committed] Improve single bit zero extraction on H8.

When zero-extracting a single-bit bitfield from bits 16..31 on the H8, we
currently generate some pretty bad code.  The fundamental issue is that we
can't shift efficiently and there's no trivial way to extract a single bit
out of the high half word of an SImode value.

What usually happens is we use a synthesized right shift to get the single
bit into the desired position, then a bit-and to mask off everything we
don't care about.  The shifts are expensive, even using tricks like half and
quarter word moves to implement shift-by-16 and shift-by-8.  Additionally, a
logical right shift must clear out the upper bits, which is redundant since
we're going to mask things with &1 later.

This patch provides a consistently better sequence for such extractions.
The general form moves the high half into the low half, extracts the desired
bit into C, clears the destination, then moves C into the destination, with
a few special cases.  This also avoids all the shenanigans for the H8/SX,
which has a much more capable shifter.  It's not single cycle, but it is
reasonably efficient.

This has been regression tested on the H8 without issues.  Pushing to the
trunk momentarily.

jeff

ps. Yes, extraction of multi-bit fields could likely be improved as well.
But I've already spent more time on this than I can reasonably justify.
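For illustration, here is a minimal example of the kind of source construct
this targets.  It is a hypothetical test case (the function name and the
choice of bit 24 are mine, not from the patch or testsuite), and the
commented instruction sequences only paraphrase the before/after code
described above and the output templates in the diff below; they are not
verified compiler output.

/* Extract bit 24 of a 32-bit value on the H8.  Before this patch the port
   would synthesize the >> 24 with byte/word moves plus residual shifts
   (clearing the upper bits along the way) and then apply the & 1.  The new
   pattern instead emits, roughly:

       mov.w   %e1,%f0   ; move the high word into the low word
       bld     #0,%t0    ; bit 24 is now bit 0 of the upper byte; load it into C
       xor.l   %0,%0     ; clear the destination without disturbing C
       rotxl.l %0        ; rotate C into bit 0

   (operand templates taken from the pattern below; paraphrased, not actual
   output).  */
unsigned long
extract_bit_24 (unsigned long x)
{
  return (x >> 24) & 1;
}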
commit 57dbc02d261bb833f6ef287187eb144321dd595c
Author: Jeff Law
Date:   Thu Nov 9 17:34:01 2023 -0700

    [committed] Improve single bit zero extraction on H8.

    When zero-extracting a single-bit bitfield from bits 16..31 on the H8, we
    currently generate some pretty bad code.  The fundamental issue is that
    we can't shift efficiently and there's no trivial way to extract a single
    bit out of the high half word of an SImode value.

    What usually happens is we use a synthesized right shift to get the
    single bit into the desired position, then a bit-and to mask off
    everything we don't care about.  The shifts are expensive, even using
    tricks like half and quarter word moves to implement shift-by-16 and
    shift-by-8.  Additionally, a logical right shift must clear out the upper
    bits, which is redundant since we're going to mask things with &1 later.

    This patch provides a consistently better sequence for such extractions.
    The general form moves the high half into the low half, extracts the
    desired bit into C, clears the destination, then moves C into the
    destination, with a few special cases.  This also avoids all the
    shenanigans for the H8/SX, which has a much more capable shifter.  It's
    not single cycle, but it is reasonably efficient.

    This has been regression tested on the H8 without issues.  Pushing to the
    trunk momentarily.

    jeff

    ps. Yes, zero extraction of multi-bit fields could likely be improved as
    well.  But I've already spent more time on this than I can reasonably
    justify.

    gcc/
            * config/h8300/combiner.md (single bit sign_extract): Avoid
            recently added patterns for H8/SX.
            (single bit zero_extract): New patterns.

diff --git a/gcc/config/h8300/combiner.md b/gcc/config/h8300/combiner.md
index 2f7faf77c93..e1179b5fea6 100644
--- a/gcc/config/h8300/combiner.md
+++ b/gcc/config/h8300/combiner.md
@@ -1278,7 +1278,7 @@ (define_insn_and_split ""
 	(sign_extract:SI (match_operand:QHSI 1 "register_operand" "0")
 			 (const_int 1)
 			 (match_operand 2 "immediate_operand")))]
-  ""
+  "!TARGET_H8300SX"
   "#"
   "&& reload_completed"
   [(parallel [(set (match_dup 0)
@@ -1291,7 +1291,7 @@ (define_insn ""
 			 (const_int 1)
 			 (match_operand 2 "immediate_operand")))
    (clobber (reg:CC CC_REG))]
-  ""
+  "!TARGET_H8300SX"
 {
   int position = INTVAL (operands[2]);
 
@@ -1359,3 +1359,69 @@ (define_insn ""
     return "subx\t%s0,%s0\;exts.w %T0\;exts.l %0";
 }
  [(set_attr "length" "10")])
+
+;; For shift counts >= 16 we can always do better than the
+;; generic sequences.  Other patterns handle smaller counts.
+(define_insn_and_split ""
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+			     (match_operand 2 "immediate_operand" "n"))
+		(const_int 1)))]
+  "!TARGET_H8300SX && INTVAL (operands[2]) >= 16"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 0) (and:SI (lshiftrt:SI (match_dup 0) (match_dup 2))
+					 (const_int 1)))
+	      (clobber (reg:CC CC_REG))])])
+
+(define_insn ""
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+			     (match_operand 2 "immediate_operand" "n"))
+		(const_int 1)))
+   (clobber (reg:CC CC_REG))]
+  "!TARGET_H8300SX && INTVAL (operands[2]) >= 16"
+{
+  int position = INTVAL (operands[2]);
+
+  /* If the bit we want is the highest bit we can just rotate it into position
+     and mask off everything else.  */
+  if (position == 31)
+    {
+      output_asm_insn ("rotl.l\t%0", operands);
+      return "and.l\t#1,%0";
+    }
+
+  /* Special case for H8/S.  Similar to bit 31.  */
+  if (position == 30 && TARGET_H8300S)
+    return "rotl.l\t#2,%0\;and.l\t#1,%0";
+
+  if (position <= 30 && position >= 17)
+    {
+      /* Shift 16 bits, without worrying about extensions.  */
+      output_asm_insn ("mov.w\t%e1,%f0", operands);
+
+      /* Get the bit we want into C.  */
+      operands[2] = GEN_INT (position % 8);
+      if (position >= 24)
+	output_asm_insn ("bld\t%2,%t0", operands);
+      else
+	output_asm_insn ("bld\t%2,%s0", operands);
+
+      /* xor + rotate to clear the destination, then rotate
+	 the C into position.  */
+      return "xor.l\t%0,%0\;rotxl.l\t%0";
+    }
+
+  if (position == 16)
+    {
+      /* Shift 16 bits, without worrying about extensions.  */
+      output_asm_insn ("mov.w\t%e1,%f0", operands);
+
+      /* And finally, mask out everything we don't want.  */
+      return "and.l\t#1,%0";
+    }
+
+  gcc_unreachable ();
+}
+ [(set_attr "length" "10")])
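For completeness, here is a sketch of how the new define_insn above maps a
few bit positions to instruction sequences.  The C functions are hypothetical
examples and the sequences merely paraphrase the output templates above for
those positions; none of this is captured compiler output.

/* (x >> 31) & 1:  rotl.l  %0          ; rotate bit 31 into bit 0
                   and.l   #1,%0
   (x >> 16) & 1:  mov.w   %e1,%f0     ; high word -> low word
                   and.l   #1,%0
   (x >> 20) & 1:  mov.w   %e1,%f0
                   bld     #4,%s0      ; bit 4 of the low byte -> C
                   xor.l   %0,%0       ; clear the destination
                   rotxl.l %0          ; rotate C into bit 0  */
unsigned long bit31 (unsigned long x) { return (x >> 31) & 1; }
unsigned long bit16 (unsigned long x) { return (x >> 16) & 1; }
unsigned long bit20 (unsigned long x) { return (x >> 20) & 1; }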