From patchwork Tue Mar 5 11:15:33 2024
X-Patchwork-Submitter: Georg-Johann Lay
X-Patchwork-Id: 1908142
Date: Tue, 5 Mar 2024 12:15:33 +0100
Subject: [patch,avr,applied] Improve output of insn "*insv.any_shift.<mode>".
From: Georg-Johann Lay
To: Roger Sayle, gcc-patches@gcc.gnu.org
References: <025901da0d82$cd5aa3f0$680febd0$@nextmovesoftware.com>
In-Reply-To: <025901da0d82$cd5aa3f0$680febd0$@nextmovesoftware.com>

Applied Roger's proposed improvements with some changes:  Lengthy code
is more convenient in avr.cc than in an insn output function, and it
makes it easy to work out the exact instruction length.  Moreover, the
code can handle shifts with offset zero (the *and<mode>3 cases).

Passed with no new regressions on ATmega128.
Applied as https://gcc.gnu.org/r14-9317

Johann

---

AVR: Improve output of insn "*insv.any_shift.<mode>_split".

The instructions printed by insn "*insv.any_shift.<mode>_split" were
sub-optimal.  The code to print the improved output is lengthy and
performed by the new function avr_out_insv.  As it turns out, the
function can also handle shift offsets of zero, which is the case for
"*andhi3", "*andpsi3" and "*andsi3".  Thus, these three insns get a new
3-operand alternative where the 3rd operand is an exact power of 2.

gcc/
	* config/avr/avr-protos.h (avr_out_insv): New proto.
	* config/avr/avr.cc (avr_out_insv): New function.
	(avr_adjust_insn_length) [ADJUST_LEN_INSV]: Handle case.
	(avr_cbranch_cost) [ZERO_EXTRACT]: Adjust rtx costs.
	* config/avr/avr.md (define_attr "adjust_len"): Add insv.
	(andhi3, *andhi3, andpsi3, *andpsi3, andsi3, *andsi3): Add a
	constraint alternative where the 3rd operand is a power of 2,
	and the source register may differ from the destination.
	(*insv.any_shift.<mode>_split): Call avr_out_insv to output
	instructions.  Set attr "adjust_len" to "insv".
	* config/avr/constraints.md (Cb2, Cb3, Cb4): New constraints.

gcc/testsuite/
	* gcc.target/avr/torture/insv-anyshift-hi.c: New test.
	* gcc.target/avr/torture/insv-anyshift-si.c: New test.
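As an illustration of the kind of expression the improved output
targets (my example, not taken from the patch or the testsuite), a
single-bit extract-and-reposition like the following matches
"*insv.any_shift.<mode>":

    #include <stdint.h>

    /* Bit 6 of A is moved to bit 5 of the result; all other result
       bits are zero.  With the patch this can be emitted as
       LSR + ANDI + CLR (3 words) instead of the generic
       BST + CLR + CLR + BLD (4 words), provided register allocation
       assigns an upper (LD_REGS) register and ties input and output.  */
    uint16_t bit6_to_bit5 (uint16_t a)
    {
      return (a >> 1) & (1u << 5);
    }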
commit 49a1a340ea0eef681f23b6861f3cdb6840aadd99
Author: Roger Sayle
Date:   Tue Mar 5 11:06:17 2024 +0100

    AVR: Improve output of insn "*insv.any_shift.<mode>_split".

    The instructions printed by insn "*insv.any_shift.<mode>_split" were
    sub-optimal.  The code to print the improved output is lengthy and
    performed by the new function avr_out_insv.  As it turns out, the
    function can also handle shift offsets of zero, which is the case
    for "*andhi3", "*andpsi3" and "*andsi3".  Thus, these three insns
    get a new 3-operand alternative where the 3rd operand is an exact
    power of 2.

    gcc/
    	* config/avr/avr-protos.h (avr_out_insv): New proto.
    	* config/avr/avr.cc (avr_out_insv): New function.
    	(avr_adjust_insn_length) [ADJUST_LEN_INSV]: Handle case.
    	(avr_cbranch_cost) [ZERO_EXTRACT]: Adjust rtx costs.
    	* config/avr/avr.md (define_attr "adjust_len"): Add insv.
    	(andhi3, *andhi3, andpsi3, *andpsi3, andsi3, *andsi3): Add a
    	constraint alternative where the 3rd operand is a power of 2,
    	and the source register may differ from the destination.
    	(*insv.any_shift.<mode>_split): Call avr_out_insv to output
    	instructions.  Set attr "adjust_len" to "insv".
    	* config/avr/constraints.md (Cb2, Cb3, Cb4): New constraints.

    gcc/testsuite/
    	* gcc.target/avr/torture/insv-anyshift-hi.c: New test.
    	* gcc.target/avr/torture/insv-anyshift-si.c: New test.
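A note on the length bookkeeping in the new avr_out_insv (see the diff
below): the generic fallback costs 2 + n_bytes words (BST, one CLR per
result byte, BLD), one word less when a MOVW can clear two bytes at
once, and a special sequence is only emitted when it is strictly
shorter.  A stand-alone sketch of that comparison (my paraphrase of
the function's arithmetic, not code from the patch):

    #include <stdbool.h>
    #include <stdio.h>

    /* Words taken by the generic  BST + CLRs + BLD  fallback.  */
    static int len_default (int n_bytes, bool have_movw)
    {
      return 2 + n_bytes - (n_bytes == 4 && have_movw);
    }

    /* Words taken by a candidate that spends n_bit_insns instructions
       on the byte holding the bit, plus one CLR for each of the
       remaining n_bytes - 1 result bytes.  */
    static int len_candidate (int n_bit_insns, int n_bytes)
    {
      return n_bit_insns + n_bytes - 1;
    }

    int main (void)
    {
      // HImode, |obit - ibit| == 1:  LSR + ANDI + CLR = 3 < 4, wins.
      printf ("%d < %d\n", len_candidate (2, 2), len_default (2, false));
      // SImode with MOVW: a 2-insn candidate needs 2 + 3 = 5 words,
      // the fallback also 5, so the fallback is kept.
      printf ("%d !< %d\n", len_candidate (2, 4), len_default (4, true));
      return 0;
    }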
diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 3e19409d636..bb680312117 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -58,6 +58,7 @@ extern const char *ret_cond_branch (rtx x, int len, int reverse);
 extern const char *avr_out_movpsi (rtx_insn *, rtx*, int*);
 extern const char *avr_out_sign_extend (rtx_insn *, rtx*, int*);
 extern const char *avr_out_insert_notbit (rtx_insn *, rtx*, int*);
+extern const char *avr_out_insv (rtx_insn *, rtx*, int*);
 extern const char *avr_out_extr (rtx_insn *, rtx*, int*);
 extern const char *avr_out_extr_not (rtx_insn *, rtx*, int*);
 extern const char *avr_out_plus_set_ZN (rtx*, int*);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index c8b2b504e3f..36995e05cbe 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -9795,6 +9795,178 @@ avr_out_insert_notbit (rtx_insn *insn, rtx op[], int *plen)
 }
 
 
+/* Output instructions for  XOP[0] = (XOP[1] <code> XOP[2]) & XOP[3]  where
+   - XOP[0] and XOP[1] have the same mode which is one of: QI, HI, PSI, SI.
+   - XOP[3] is an exact const_int power of 2.
+   - XOP[2] and XOP[3] are const_int.
+   - <code> is any of: ASHIFT, LSHIFTRT, ASHIFTRT.
+   - The result depends on XOP[1].
+   or  XOP[0] = XOP[1] & XOP[2]  where
+   - XOP[0] and XOP[1] have the same mode which is one of: HI, PSI, SI.
+   - XOP[2] is an exact const_int power of 2.
+   Returns "".
+   PLEN != 0: Set *PLEN to the code length in words.  Don't output anything.
+   PLEN == 0: Output instructions.  */
+
+const char*
+avr_out_insv (rtx_insn *insn, rtx xop[], int *plen)
+{
+  machine_mode mode = GET_MODE (xop[0]);
+  int n_bytes = GET_MODE_SIZE (mode);
+  rtx xsrc = SET_SRC (single_set (insn));
+
+  gcc_assert (AND == GET_CODE (xsrc));
+
+  rtx xop2 = xop[2];
+  rtx xop3 = xop[3];
+
+  if (REG_P (XEXP (xsrc, 0)))
+    {
+      // This function can also handle AND with an exact power of 2,
+      // which can be regarded as a XOP[1] shift with offset 0.
+      rtx xshift = gen_rtx_ASHIFT (mode, xop[1], const0_rtx);
+      xsrc = gen_rtx_AND (mode, xshift, xop[2]);
+      xop3 = xop[2];
+      xop2 = const0_rtx;
+    }
+
+  // Any of ASHIFT, LSHIFTRT, ASHIFTRT.
+  enum rtx_code code = GET_CODE (XEXP (xsrc, 0));
+  int shift = code == ASHIFT ? INTVAL (xop2) : -INTVAL (xop2);
+
+  // Determines the position of the output bit.
+  unsigned mask = GET_MODE_MASK (mode) & INTVAL (xop3);
+
+  // Position of the output / input bit, respectively.
+  int obit = exact_log2 (mask);
+  int ibit = obit - shift;
+
+  gcc_assert (IN_RANGE (obit, 0, GET_MODE_BITSIZE (mode) - 1));
+  gcc_assert (IN_RANGE (ibit, 0, GET_MODE_BITSIZE (mode) - 1));
+
+  // In the remainder, use the sub-bytes that hold the bits.
+  rtx op[4] =
+    {
+      // Output
+      simplify_gen_subreg (QImode, xop[0], mode, obit / 8),
+      GEN_INT (obit & 7),
+      // Input
+      simplify_gen_subreg (QImode, xop[1], mode, ibit / 8),
+      GEN_INT (ibit & 7)
+    };
+  obit &= 7;
+  ibit &= 7;
+
+  // The length of the default sequence at the end of this function.
+  // We only emit anything other than the default when we find a sequence
+  // that is strictly shorter than the default sequence; which is:
+  // BST + <CLRs> + BLD.
+  const int len0 = 2 + n_bytes - (n_bytes == 4 && AVR_HAVE_MOVW);
+
+  // Finding something shorter than the default sequence implies that there
+  // must be at most 2 instructions that deal with the bytes containing the
+  // relevant bits.  In addition, we need N_BYTES - 1 instructions to clear
+  // the remaining result bytes.
+
+  const int n_clr = n_bytes - 1;
+  bool clr_p = false;
+  bool andi_p = false;
+
+  if (plen)
+    *plen = 0;
+
+  if (REGNO (op[0]) == REGNO (op[2])
+      // Output reg allows ANDI.
+      && test_hard_reg_class (LD_REGS, op[0]))
+    {
+      if (1 + n_clr < len0
+          // Same byte and bit: A single ANDI will do.
+          && obit == ibit)
+        {
+          clr_p = andi_p = true;
+        }
+      else if (2 + n_clr < len0
+               // |obit - ibit| = 4:  SWAP + ANDI will do.
+               && (obit == ibit + 4 || obit == ibit - 4))
+        {
+          avr_asm_len ("swap %0", op, plen, 1);
+          clr_p = andi_p = true;
+        }
+      else if (2 + n_clr < len0
+               // LSL + ANDI will do.
+               && obit == ibit + 1)
+        {
+          avr_asm_len ("lsl %0", op, plen, 1);
+          clr_p = andi_p = true;
+        }
+      else if (2 + n_clr < len0
+               // LSR + ANDI will do.
+               && obit == ibit - 1)
+        {
+          avr_asm_len ("lsr %0", op, plen, 1);
+          clr_p = andi_p = true;
+        }
+    }
+
+  if (REGNO (op[0]) != REGNO (op[2])
+      && obit == ibit)
+    {
+      if (2 + n_clr < len0
+          // Same bit but different byte: MOV + ANDI will do.
+          && test_hard_reg_class (LD_REGS, op[0]))
+        {
+          avr_asm_len ("mov %0,%2", op, plen, 1);
+          clr_p = andi_p = true;
+        }
+      else if (2 + n_clr < len0
+               // Same bit but different byte: We can use ANDI + MOV,
+               // but only if the input byte is LD_REGS and unused after.
+               && test_hard_reg_class (LD_REGS, op[2])
+               && reg_unused_after (insn, op[2]))
+        {
+          avr_asm_len ("andi %2,1<<%3" CR_TAB
+                       "mov %0,%2", op, plen, 2);
+          clr_p = true;
+        }
+    }
+
+  // Output remaining instructions of the shorter sequence.
+
+  if (andi_p)
+    avr_asm_len ("andi %0,1<<%1", op, plen, 1);
+
+  if (clr_p)
+    {
+      for (int b = 0; b < n_bytes; ++b)
+        {
+          rtx byte = simplify_gen_subreg (QImode, xop[0], mode, b);
+          if (REGNO (byte) != REGNO (op[0]))
+            avr_asm_len ("clr %0", &byte, plen, 1);
+        }
+
+      // CLR_P means we found a shorter sequence, so we are done now.
+      return "";
+    }
+
+  // No shorter sequence found, just emit  BST, CLR*, BLD  sequence.
+
+  avr_asm_len ("bst %2,%3", op, plen, -1);
+
+  if (n_bytes == 4 && AVR_HAVE_MOVW)
+    avr_asm_len ("clr %A0" CR_TAB
+                 "clr %B0" CR_TAB
+                 "movw %C0,%A0", xop, plen, 3);
+  else
+    for (int b = 0; b < n_bytes; ++b)
+      {
+        rtx byte = simplify_gen_subreg (QImode, xop[0], mode, b);
+        avr_asm_len ("clr %0", &byte, plen, 1);
+      }
+
+  return avr_asm_len ("bld %0,%1", op, plen, 1);
+}
+
+
 /* Output instructions to extract a bit to 8-bit register XOP[0].
    The input XOP[1] is a register or an 8-bit MEM in the lower I/O range.
    XOP[2] is the const_int bit position.  Return "".
@@ -10721,6 +10893,7 @@ avr_adjust_insn_length (rtx_insn *insn, int len)
     case ADJUST_LEN_OUT_BITOP: avr_out_bitop (insn, op, &len); break;
     case ADJUST_LEN_EXTR_NOT: avr_out_extr_not (insn, op, &len); break;
     case ADJUST_LEN_EXTR: avr_out_extr (insn, op, &len); break;
+    case ADJUST_LEN_INSV: avr_out_insv (insn, op, &len); break;
 
     case ADJUST_LEN_PLUS: avr_out_plus (insn, op, &len); break;
     case ADJUST_LEN_ADDTO_SP: avr_out_addto_sp (op, &len); break;
@@ -12206,6 +12379,14 @@ avr_cbranch_cost (rtx x)
       return COSTS_N_INSNS (size + 1 + 1);
     }
 
+  if (GET_CODE (xreg) == ZERO_EXTRACT
+      && XEXP (xreg, 1) == const1_rtx)
+    {
+      // Branch on a single bit, with an additional edge due to less
+      // register pressure.
+      return (int) COSTS_N_INSNS (1.5);
+    }
+
   bool reg_p = register_operand (xreg, mode);
   bool reg_or_0_p = reg_or_0_operand (xval, mode);
 
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 6606837b5f7..6bdf4682fab 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -170,7 +170,7 @@ (define_attr "adjust_len"
    ashlhi, ashrhi, lshrhi,
    ashlsi, ashrsi, lshrsi,
    ashlpsi, ashrpsi, lshrpsi,
-   insert_bits, insv_notbit,
+   insert_bits, insv_notbit, insv,
    add_set_ZN, cmp_uext, cmp_sext,
    no"
   (const_string "no"))
@@ -4380,10 +4380,10 @@ (define_insn "*andqi3"
   [(set_attr "length" "1,1,2")])
 
 (define_insn_and_split "andhi3"
-  [(set (match_operand:HI 0 "register_operand"           "=??r,d,d,r ,r")
-        (and:HI (match_operand:HI 1 "register_operand"   "%0,0,0,0 ,0")
-                (match_operand:HI 2 "nonmemory_operand"  "r,s,n,Ca2,n")))
-   (clobber (match_scratch:QI 3                          "=X,X,X,X ,&d"))]
+  [(set (match_operand:HI 0 "register_operand"           "=??r,d,d,r ,r ,r")
+        (and:HI (match_operand:HI 1 "register_operand"   "%0,0,0,0 ,r ,0")
+                (match_operand:HI 2 "nonmemory_operand"  "r,s,n,Ca2,Cb2,n")))
+   (clobber (match_scratch:QI 3                          "=X,X,X,X ,X ,&d"))]
   ""
   "#"
   "&& reload_completed"
@@ -4394,10 +4394,10 @@ (define_insn_and_split "andhi3"
              (clobber (reg:CC REG_CC))])])
 
 (define_insn "*andhi3"
-  [(set (match_operand:HI 0 "register_operand"           "=??r,d,d,r ,r")
-        (and:HI (match_operand:HI 1 "register_operand"   "%0,0,0,0 ,0")
-                (match_operand:HI 2 "nonmemory_operand"  "r,s,n,Ca2,n")))
-   (clobber (match_scratch:QI 3                          "=X,X,X,X ,&d"))
+  [(set (match_operand:HI 0 "register_operand"           "=??r,d,d,r ,r ,r")
+        (and:HI (match_operand:HI 1 "register_operand"   "%0,0,0,0 ,r ,0")
+                (match_operand:HI 2 "nonmemory_operand"  "r,s,n,Ca2,Cb2,n")))
+   (clobber (match_scratch:QI 3                          "=X,X,X,X ,X ,&d"))
    (clobber (reg:CC REG_CC))]
   "reload_completed"
   {
@@ -4405,17 +4405,19 @@ (define_insn "*andhi3"
%B0,%B2"; else if (which_alternative == 1) return "andi %A0,lo8(%2)\;andi %B0,hi8(%2)"; + else if (which_alternative == 4) + return avr_out_insv (insn, operands, NULL); return avr_out_bitop (insn, operands, NULL); } - [(set_attr "length" "2,2,2,4,4") - (set_attr "adjust_len" "*,*,out_bitop,out_bitop,out_bitop")]) + [(set_attr "length" "2,2,2,4,4,4") + (set_attr "adjust_len" "*,*,out_bitop,out_bitop,insv,out_bitop")]) (define_insn_and_split "andpsi3" - [(set (match_operand:PSI 0 "register_operand" "=??r,d,r ,r") - (and:PSI (match_operand:PSI 1 "register_operand" "%0,0,0 ,0") - (match_operand:PSI 2 "nonmemory_operand" "r,n,Ca3,n"))) - (clobber (match_scratch:QI 3 "=X,X,X ,&d"))] + [(set (match_operand:PSI 0 "register_operand" "=??r,d,r ,r ,r") + (and:PSI (match_operand:PSI 1 "register_operand" "%0,0,0 ,r ,0") + (match_operand:PSI 2 "nonmemory_operand" "r,n,Ca3,Cb3,n"))) + (clobber (match_scratch:QI 3 "=X,X,X ,X ,&d"))] "" "#" "&& reload_completed" @@ -4426,10 +4428,10 @@ (define_insn_and_split "andpsi3" (clobber (reg:CC REG_CC))])]) (define_insn "*andpsi3" - [(set (match_operand:PSI 0 "register_operand" "=??r,d,r ,r") - (and:PSI (match_operand:PSI 1 "register_operand" "%0,0,0 ,0") - (match_operand:PSI 2 "nonmemory_operand" "r,n,Ca3,n"))) - (clobber (match_scratch:QI 3 "=X,X,X ,&d")) + [(set (match_operand:PSI 0 "register_operand" "=??r,d,r ,r ,r") + (and:PSI (match_operand:PSI 1 "register_operand" "%0,0,0 ,r ,0") + (match_operand:PSI 2 "nonmemory_operand" "r,n,Ca3,Cb3,n"))) + (clobber (match_scratch:QI 3 "=X,X,X ,X ,&d")) (clobber (reg:CC REG_CC))] "reload_completed" { @@ -4438,16 +4440,19 @@ (define_insn "*andpsi3" "and %B0,%B2" CR_TAB "and %C0,%C2"; + if (which_alternative == 3) + return avr_out_insv (insn, operands, NULL); + return avr_out_bitop (insn, operands, NULL); } - [(set_attr "length" "3,3,6,6") - (set_attr "adjust_len" "*,out_bitop,out_bitop,out_bitop")]) + [(set_attr "length" "3,3,6,5,6") + (set_attr "adjust_len" "*,out_bitop,out_bitop,insv,out_bitop")]) (define_insn_and_split "andsi3" - [(set (match_operand:SI 0 "register_operand" "=??r,d,r ,r") - (and:SI (match_operand:SI 1 "register_operand" "%0,0,0 ,0") - (match_operand:SI 2 "nonmemory_operand" "r,n,Ca4,n"))) - (clobber (match_scratch:QI 3 "=X,X,X ,&d"))] + [(set (match_operand:SI 0 "register_operand" "=??r,d,r ,r ,r") + (and:SI (match_operand:SI 1 "register_operand" "%0,0,0 ,r ,0") + (match_operand:SI 2 "nonmemory_operand" "r,n,Ca4,Cb4,n"))) + (clobber (match_scratch:QI 3 "=X,X,X ,X ,&d"))] "" "#" "&& reload_completed" @@ -4458,10 +4463,10 @@ (define_insn_and_split "andsi3" (clobber (reg:CC REG_CC))])]) (define_insn "*andsi3" - [(set (match_operand:SI 0 "register_operand" "=??r,d,r ,r") - (and:SI (match_operand:SI 1 "register_operand" "%0,0,0 ,0") - (match_operand:SI 2 "nonmemory_operand" "r,n,Ca4,n"))) - (clobber (match_scratch:QI 3 "=X,X,X ,&d")) + [(set (match_operand:SI 0 "register_operand" "=??r,d,r ,r ,r") + (and:SI (match_operand:SI 1 "register_operand" "%0,0,0 ,r ,0") + (match_operand:SI 2 "nonmemory_operand" "r,n,Ca4,Cb4,n"))) + (clobber (match_scratch:QI 3 "=X,X,X ,X ,&d")) (clobber (reg:CC REG_CC))] "reload_completed" { @@ -4471,10 +4476,13 @@ (define_insn "*andsi3" "and %C0,%C2" CR_TAB "and %D0,%D2"; + if (which_alternative == 3) + return avr_out_insv (insn, operands, NULL); + return avr_out_bitop (insn, operands, NULL); } - [(set_attr "length" "4,4,8,8") - (set_attr "adjust_len" "*,out_bitop,out_bitop,out_bitop")]) + [(set_attr "length" "4,4,8,6,8") + (set_attr "adjust_len" 
"*,out_bitop,out_bitop,insv,out_bitop")]) (define_peephole2 ; andi [(parallel [(set (match_operand:QI 0 "d_register_operand" "") @@ -9852,6 +9860,12 @@ (define_insn_and_split "*extzv.io.lsr7" (const_int 1) (const_int 7)))]) +;; This insn serves as a combine bridge because insn combine will only +;; combine so much (3) insns at most. It's not actually an open coded +;; bit-insertion but just a part of it. It may occur in other contexts +;; than INSV though, and in such a case the code may be worse than without +;; this pattern. We still have to emit code for it in that case because +;; we cannot roll back. (define_insn_and_split "*insv.any_shift._split" [(set (match_operand:QISI 0 "register_operand" "=r") (and:QISI (any_shift:QISI (match_operand:QISI 1 "register_operand" "r") @@ -9874,27 +9888,9 @@ (define_insn "*insv.any_shift." (clobber (reg:CC REG_CC))] "reload_completed" { - int shift = == ASHIFT ? INTVAL (operands[2]) : -INTVAL (operands[2]); - int mask = GET_MODE_MASK (mode) & INTVAL (operands[3]); - // Position of the output / input bit, respectively. - int obit = exact_log2 (mask); - int ibit = obit - shift; - gcc_assert (IN_RANGE (obit, 0, )); - gcc_assert (IN_RANGE (ibit, 0, )); - operands[3] = GEN_INT (obit); - operands[2] = GEN_INT (ibit); - - if ( == 1) return "bst %T1%T2\;clr %0\;" "bld %T0%T3"; - if ( == 2) return "bst %T1%T2\;clr %A0\;clr %B0\;" "bld %T0%T3"; - if ( == 3) return "bst %T1%T2\;clr %A0\;clr %B0\;clr %C0\;bld %T0%T3"; - return AVR_HAVE_MOVW - ? "bst %T1%T2\;clr %A0\;clr %B0\;movw %C0,%A0\;" "bld %T0%T3" - : "bst %T1%T2\;clr %A0\;clr %B0\;clr %C0\;clr %D0\;bld %T0%T3"; + return avr_out_insv (insn, operands, nullptr); } - [(set (attr "length") - (minus (symbol_ref "2 + ") - ; One less if we can use a MOVW to clear. - (symbol_ref " == 4 && AVR_HAVE_MOVW")))]) + [(set_attr "adjust_len" "insv")]) (define_insn_and_split "*extzv.hi2" diff --git a/gcc/config/avr/constraints.md b/gcc/config/avr/constraints.md index 81ed63db2cc..fac54da17db 100644 --- a/gcc/config/avr/constraints.md +++ b/gcc/config/avr/constraints.md @@ -188,6 +188,21 @@ (define_constraint "Co4" (and (match_code "const_int") (match_test "avr_popcount_each_byte (op, 4, (1<<0) | (1<<1) | (1<<8))"))) +(define_constraint "Cb2" + "Constant 2-byte integer that has exactly 1 bit set." + (and (match_code "const_int") + (match_test "single_one_operand (op, HImode)"))) + +(define_constraint "Cb3" + "Constant 3-byte integer that has exactly 1 bit set." + (and (match_code "const_int") + (match_test "single_one_operand (op, PSImode)"))) + +(define_constraint "Cb4" + "Constant 4-byte integer that has exactly 1 bit set." + (and (match_code "const_int") + (match_test "single_one_operand (op, SImode)"))) + (define_constraint "Cx2" "Constant 2-byte integer that allows XOR without clobber register." (and (match_code "const_int") diff --git a/gcc/testsuite/gcc.target/avr/torture/insv-anyshift-hi.c b/gcc/testsuite/gcc.target/avr/torture/insv-anyshift-hi.c new file mode 100644 index 00000000000..7ee5c04813a --- /dev/null +++ b/gcc/testsuite/gcc.target/avr/torture/insv-anyshift-hi.c @@ -0,0 +1,141 @@ +/* { dg-do run } */ +/* { dg-additional-options { -fno-split-wide-types } } */ + +typedef __UINT16_TYPE__ uint16_t; + +/* Testing inlined and completely folded versions of functions + against their non-inlined, non-folded counnterparts. 
+
+#define MK_FUN1(OBIT, LSR)                                            \
+  static __inline__ __attribute__((__always_inline__))                \
+  uint16_t fun1_lsr_##OBIT##_##LSR##_ai (int x, uint16_t a)           \
+  {                                                                   \
+    (void) x;                                                         \
+    return (a >> LSR) & (1u << OBIT);                                 \
+  }                                                                   \
+                                                                      \
+  __attribute__((__noinline__,__noclone__))                           \
+  uint16_t fun1_lsr_##OBIT##_##LSR##_ni (int x, uint16_t a)           \
+  {                                                                   \
+    return fun1_lsr_##OBIT##_##LSR##_ai (x, a);                       \
+  }                                                                   \
+                                                                      \
+  void test_fun1_lsr_##OBIT##_##LSR (void)                            \
+  {                                                                   \
+    if (fun1_lsr_##OBIT##_##LSR##_ni (0, 1u << (OBIT + LSR))          \
+        != fun1_lsr_##OBIT##_##LSR##_ai (0, 1u << (OBIT + LSR)))      \
+      __builtin_abort();                                              \
+                                                                      \
+    if (fun1_lsr_##OBIT##_##LSR##_ni (0, 1u << (OBIT + LSR))          \
+        != fun1_lsr_##OBIT##_##LSR##_ai (0, -1u))                     \
+      __builtin_abort();                                              \
+  }
+
+#define MK_FUN3(OBIT, LSR)                                            \
+  static __inline__ __attribute__((__always_inline__))                \
+  uint16_t fun3_lsr_##OBIT##_##LSR##_ai (uint16_t a)                  \
+  {                                                                   \
+    return (a >> LSR) & (1u << OBIT);                                 \
+  }                                                                   \
+                                                                      \
+  __attribute__((__noinline__,__noclone__))                           \
+  uint16_t fun3_lsr_##OBIT##_##LSR##_ni (uint16_t a)                  \
+  {                                                                   \
+    return fun3_lsr_##OBIT##_##LSR##_ai (a);                          \
+  }                                                                   \
+                                                                      \
+  void test_fun3_lsr_##OBIT##_##LSR (void)                            \
+  {                                                                   \
+    if (fun3_lsr_##OBIT##_##LSR##_ni (1u << (OBIT + LSR))             \
+        != fun3_lsr_##OBIT##_##LSR##_ai (1u << (OBIT + LSR)))         \
+      __builtin_abort();                                              \
+                                                                      \
+    if (fun3_lsr_##OBIT##_##LSR##_ni (1u << (OBIT + LSR))             \
+        != fun3_lsr_##OBIT##_##LSR##_ai (-1u))                        \
+      __builtin_abort();                                              \
+  }
+
+
+#define MK_FUN2(OBIT, LSL)                                            \
+  static __inline__ __attribute__((__always_inline__))                \
+  uint16_t fun2_lsl_##OBIT##_##LSL##_ai (uint16_t a)                  \
+  {                                                                   \
+    return (a << LSL) & (1u << OBIT);                                 \
+  }                                                                   \
+                                                                      \
+  __attribute__((__noinline__,__noclone__))                           \
+  uint16_t fun2_lsl_##OBIT##_##LSL##_ni (uint16_t a)                  \
+  {                                                                   \
+    return fun2_lsl_##OBIT##_##LSL##_ai (a);                          \
+  }                                                                   \
+                                                                      \
+  void test_fun2_lsl_##OBIT##_##LSL (void)                            \
+  {                                                                   \
+    if (fun2_lsl_##OBIT##_##LSL##_ni (1u << (OBIT - LSL))             \
+        != fun2_lsl_##OBIT##_##LSL##_ai (1u << (OBIT - LSL)))         \
+      __builtin_abort();                                              \
+                                                                      \
+    if (fun2_lsl_##OBIT##_##LSL##_ni (1u << (OBIT - LSL))             \
+        != fun2_lsl_##OBIT##_##LSL##_ai (-1u))                        \
+      __builtin_abort();                                              \
+  }
+
+
+MK_FUN1 (10, 4)
+MK_FUN1 (6, 1)
+MK_FUN1 (1, 5)
+MK_FUN1 (0, 8)
+MK_FUN1 (0, 4)
+MK_FUN1 (0, 1)
+MK_FUN1 (0, 0)
+
+MK_FUN3 (10, 4)
+MK_FUN3 (6, 1)
+MK_FUN3 (1, 5)
+MK_FUN3 (0, 8)
+MK_FUN3 (0, 4)
+MK_FUN3 (0, 1)
+MK_FUN3 (0, 0)
+
+MK_FUN2 (12, 8)
+MK_FUN2 (15, 15)
+MK_FUN2 (14, 12)
+MK_FUN2 (8, 8)
+MK_FUN2 (7, 4)
+MK_FUN2 (5, 4)
+MK_FUN2 (5, 1)
+MK_FUN2 (4, 0)
+MK_FUN2 (1, 0)
+MK_FUN2 (0, 0)
+
+int main (void)
+{
+  test_fun1_lsr_10_4 ();
+  test_fun1_lsr_6_1 ();
+  test_fun1_lsr_1_5 ();
+  test_fun1_lsr_0_8 ();
+  test_fun1_lsr_0_4 ();
+  test_fun1_lsr_0_1 ();
+  test_fun1_lsr_0_0 ();
+
+  test_fun3_lsr_10_4 ();
+  test_fun3_lsr_6_1 ();
+  test_fun3_lsr_1_5 ();
+  test_fun3_lsr_0_8 ();
+  test_fun3_lsr_0_4 ();
+  test_fun3_lsr_0_1 ();
+  test_fun3_lsr_0_0 ();
+
+  test_fun2_lsl_12_8 ();
+  test_fun2_lsl_15_15 ();
+  test_fun2_lsl_14_12 ();
+  test_fun2_lsl_8_8 ();
+  test_fun2_lsl_7_4 ();
+  test_fun2_lsl_5_4 ();
+  test_fun2_lsl_5_1 ();
+  test_fun2_lsl_4_0 ();
+  test_fun2_lsl_1_0 ();
+  test_fun2_lsl_0_0 ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/avr/torture/insv-anyshift-si.c b/gcc/testsuite/gcc.target/avr/torture/insv-anyshift-si.c
new file mode 100644
index 00000000000..f52593cf0a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/insv-anyshift-si.c
@@ -0,0 +1,89 @@
+/* { dg-do run } */
+
+typedef __UINT32_TYPE__ uint32_t;
+
+/* Testing inlined and completely folded versions of functions
+   against their non-inlined, non-folded counterparts.  */
+
+#define MK_FUN1(OBIT, LSR)                                            \
+  static __inline__ __attribute__((__always_inline__))                \
+  uint32_t fun1_lsr_##OBIT##_##LSR##_ai (int x, uint32_t a)           \
+  {                                                                   \
+    (void) x;                                                         \
+    return (a >> LSR) & (1ul << OBIT);                                \
+  }                                                                   \
+                                                                      \
+  __attribute__((__noinline__,__noclone__))                           \
+  uint32_t fun1_lsr_##OBIT##_##LSR##_ni (int x, uint32_t a)           \
+  {                                                                   \
+    return fun1_lsr_##OBIT##_##LSR##_ai (x, a);                       \
+  }                                                                   \
+                                                                      \
+  void test_fun1_lsr_##OBIT##_##LSR (void)                            \
+  {                                                                   \
+    if (fun1_lsr_##OBIT##_##LSR##_ni (0, 1ul << (OBIT + LSR))         \
+        != fun1_lsr_##OBIT##_##LSR##_ai (0, 1ul << (OBIT + LSR)))     \
+      __builtin_abort();                                              \
+                                                                      \
+    if (fun1_lsr_##OBIT##_##LSR##_ni (0, 1ul << (OBIT + LSR))         \
+        != fun1_lsr_##OBIT##_##LSR##_ai (0, -1ul))                    \
+      __builtin_abort();                                              \
+  }
+
+
+#define MK_FUN2(OBIT, LSL)                                            \
+  static __inline__ __attribute__((__always_inline__))                \
+  uint32_t fun2_lsl_##OBIT##_##LSL##_ai (int x, uint32_t a)           \
+  {                                                                   \
+    (void) x;                                                         \
+    return (a << LSL) & (1ul << OBIT);                                \
+  }                                                                   \
+                                                                      \
+  __attribute__((__noinline__,__noclone__))                           \
+  uint32_t fun2_lsl_##OBIT##_##LSL##_ni (int x, uint32_t a)           \
+  {                                                                   \
+    return fun2_lsl_##OBIT##_##LSL##_ai (x, a);                       \
+  }                                                                   \
+                                                                      \
+  void test_fun2_lsl_##OBIT##_##LSL (void)                            \
+  {                                                                   \
+    if (fun2_lsl_##OBIT##_##LSL##_ni (0, 1ul << (OBIT - LSL))         \
+        != fun2_lsl_##OBIT##_##LSL##_ai (0, 1ul << (OBIT - LSL)))     \
+      __builtin_abort();                                              \
+                                                                      \
+    if (fun2_lsl_##OBIT##_##LSL##_ni (0, 1ul << (OBIT - LSL))         \
+        != fun2_lsl_##OBIT##_##LSL##_ai (0, -1ul))                    \
+      __builtin_abort();                                              \
+  }
+
+
+MK_FUN1 (13, 15)
+MK_FUN1 (13, 16)
+MK_FUN1 (13, 17)
+MK_FUN1 (13, 12)
+MK_FUN1 (0, 31)
+MK_FUN1 (0, 8)
+MK_FUN1 (0, 0)
+
+MK_FUN2 (12, 8)
+MK_FUN2 (13, 8)
+MK_FUN2 (16, 8)
+MK_FUN2 (16, 0)
+
+int main (void)
+{
+  test_fun1_lsr_13_15 ();
+  test_fun1_lsr_13_16 ();
+  test_fun1_lsr_13_17 ();
+  test_fun1_lsr_13_12 ();
+  test_fun1_lsr_0_31 ();
+  test_fun1_lsr_0_8 ();
+  test_fun1_lsr_0_0 ();
+
+  test_fun2_lsl_12_8 ();
+  test_fun2_lsl_13_8 ();
+  test_fun2_lsl_16_8 ();
+  test_fun2_lsl_16_0 ();
+
+  return 0;
+}
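
One more illustration, this time of the new Cb constraints (a
hypothetical example of mine, not part of the committed testsuite): a
plain AND with a single-bit mask now has an alternative where the
source register may differ from the destination, handled by
avr_out_insv as a shift with offset zero.

    #include <stdint.h>

    /* The mask has exactly one bit set, so it matches the new Cb2
       alternative of "*andhi3" and needs no clobber register.  */
    uint16_t keep_bit_9 (uint16_t a)
    {
      return a & (1u << 9);
    }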