From patchwork Fri Aug 30 12:09:57 2024
X-Patchwork-Submitter: Georg-Johann Lay
X-Patchwork-Id: 1978962
Date: Fri, 30 Aug 2024 14:09:57 +0200
From: Georg-Johann Lay
To: gcc-patches@gcc.gnu.org
Cc: Denis Chertykov
Subject: [patch,avr] Run pass avr-fuse-add a second time

There are cases
where opportunities to use POST_INC addressing only occur very late in the
compilation process.  Take for example the following function from
AVR-LibC's qsort:

    void swapfunc (char *a, char *b, int n)
    {
        do
        {
            char t = *a;
            *a++ = *b;
            *b++ = t;
        } while (--n > 0);
    }

which -mmcu=avrtiny -S -Os -dp compiles to:

swapfunc:
	push r28	 ;  72	[c=4 l=1]  pushqi1/0
	push r29	 ;  73	[c=4 l=1]  pushqi1/0
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
	mov r26,r24	 ;  66	[c=4 l=1]  movqi_insn/0
	mov r27,r25	 ;  67	[c=4 l=1]  movqi_insn/0
	mov r30,r22	 ;  68	[c=4 l=1]  movqi_insn/0
	mov r31,r23	 ;  69	[c=4 l=1]  movqi_insn/0
	mov r22,r20	 ;  70	[c=4 l=1]  movqi_insn/0
	mov r23,r21	 ;  71	[c=4 l=1]  movqi_insn/0
.L2:
	ld r20,X	 ;  55	[c=4 l=1]  movqi_insn/3
	ld r21,Z	 ;  57	[c=4 l=1]  movqi_insn/3
	st X,r21	 ;  58	[c=4 l=1]  movqi_insn/2
	subi r26,-1	 ;  59	[c=4 l=2]  *addhi3_clobber/0
	sbci r27,-1
	st Z,r20	 ;  61	[c=4 l=1]  movqi_insn/2
	subi r30,-1	 ;  62	[c=4 l=2]  *addhi3_clobber/0
	sbci r31,-1
	subi r22, 1	 ;  81	[c=4 l=2]  *add.for.cczn.hi/0
	sbci r23, 0
	breq .+2	 ;  82	[c=8 l=2]  branch_ZN
	brpl .L2
/* epilogue start */
	pop r29		 ;  76	[c=4 l=1]  popqi
	pop r28		 ;  77	[c=4 l=1]  popqi
	ret		 ;  78	[c=0 l=1]  return_from_epilogue

Insns 58+59 and insns 61+62 are post-inc stores.  They are not recognized
because the code prior to cprop_hardreg is a bit of a mess that moves
addresses back and forth (including Y).  Only after cprop_hardreg is the
code simple enough that post-inc addressing can be detected by
avr-fuse-add.  Hence this patch runs a 2nd instance of that pass late,
after cprop_hardreg (the 1st instance runs prior to RTL peephole).

It renames avr_split_tiny_move to avr_split_fake_addressing_move because
that function also splits some insns on non-avrtiny.  The patch removes a
define_split that is no longer needed because such splits are performed by
avr_split_fake_addressing_move.

Passed without new regressions.  Ok for trunk?

Johann

p.s. Post-inc etc. optimization is basically non-existent in GCC.
One reason seems to be SSA: each SSA variable gets its own register, which
results in a jungle of required registers that even pass auto-inc-dec
cannot penetrate...  Take for example the following code, which would
require 12 instructions as indicated by the comments:

void add4 (uint8_t *aa, const __flash uint8_t *bb, uint8_t nn)
{
    // Set Z (R30) to bb (1 MOVW or 2 MOVs)
    // Set X (R26) to aa (1 MOVW or 2 MOVs)
    do
    {
        uint8_t sum = 0;
        sum += *bb++;   // 1 instruction:  POST_INC load
        sum += *bb++;   // 2 instructions: POST_INC load + add
        sum += *bb++;   // 2 instructions: POST_INC load + add
        sum += *bb++;   // 2 instructions: POST_INC load + add
        *aa++ = sum;    // 1 instruction:  POST_INC store
    } while (--nn);     // 2 instructions: dec + branch
}

But -mmcu=avr4 -Os -S -dp compiles this to madness that requires more than
a dozen registers and computes each intermediate result in its own 16-bit
register, coming out with code that requires 34 instructions instead of 12.

---

AVR: Run pass avr-fuse-add a second time after pass_cprop_hardreg.

gcc/
	* config/avr/avr-protos.h (avr_split_tiny_move): Rename to
	avr_split_fake_addressing_move.
	* config/avr/avr-passes.cc: Same.
	(avr_pass_data_fuse_add) <tv_id>: Set to TV_MACH_DEP.
	(avr_pass_fuse_add) <clone>: Override.
	* config/avr/avr-passes.def (avr_pass_fuse_add): Run again
	after pass_cprop_hardreg.
	* config/avr/avr.md (split-lpmx): Remove a define_split.  Such
	splits are performed by avr_split_fake_addressing_move.
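The core idea of avr-fuse-add can be sketched in plain host C.  The
following is an illustrative model only (the data structures, register
numbers, and helper names are invented for the sketch, not GCC's actual
RTL machinery): find a memory access through a pointer register that is
followed, possibly with unrelated insns in between, by an add of the
access size to that same register, and fuse the pair into a single
POST_INC access.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical insn representation for the sketch.  */
enum kind { MEM_ACCESS, REG_PLUS, OTHER };

struct insn
{
  enum kind kind;
  int reg;       /* pointer register the insn uses */
  int addend;    /* for REG_PLUS: the constant addend */
  int post_inc;  /* for MEM_ACCESS: set once fused */
  int deleted;   /* for REG_PLUS: set once fused away */
};

/* Scan forward from each memory access; unlike a peephole2 pattern, the
   matching add need not be the very next insn.  */
static void
fuse_add (struct insn *insns, size_t n_insns, int access_size)
{
  for (size_t i = 0; i < n_insns; ++i)
    {
      if (insns[i].kind != MEM_ACCESS || insns[i].post_inc)
        continue;
      for (size_t j = i + 1; j < n_insns; ++j)
        {
          if (insns[j].deleted)
            continue;
          if (insns[j].kind == REG_PLUS
              && insns[j].reg == insns[i].reg
              && insns[j].addend == access_size)
            {
              insns[i].post_inc = 1;  /* *R becomes *R++ */
              insns[j].deleted = 1;   /* the explicit add goes away */
              break;
            }
          /* Any other insn touching the same register blocks the fusion.  */
          if (insns[j].reg == insns[i].reg)
            break;
        }
    }
}
```

In the swapfunc example above, this corresponds to fusing the `st X,r21`
at insn 58 with the `X += 1` at insn 59 into a single `st X+,r21`; the
ability to skip over unrelated insns is what lets the pass find cases
that peephole2 misses.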
diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc
index 8b018ff6a05..8a71b57ada1 100644
--- a/gcc/config/avr/avr-passes.cc
+++ b/gcc/config/avr/avr-passes.cc
@@ -1009,7 +1009,7 @@ static const pass_data avr_pass_data_fuse_add =
   RTL_PASS,      // type
   "",            // name (will be patched)
   OPTGROUP_NONE, // optinfo_flags
-  TV_DF_SCAN,    // tv_id
+  TV_MACH_DEP,   // tv_id
   0, // properties_required
   0, // properties_provided
   0, // properties_destroyed
@@ -1026,6 +1026,13 @@ public:
     this->name = name;
   }
 
+  // Cloning is required because we are running one instance of the pass
+  // before peephole2, and a second one after cprop_hardreg.
+  opt_pass * clone () final override
+  {
+    return make_avr_pass_fuse_add (m_ctxt);
+  }
+
   bool gate (function *) final override
   {
     return optimize && avr_fuse_add > 0;
@@ -1503,8 +1510,8 @@ avr_pass_fuse_add::fuse_mem_add (Mem_Insn &mem, Add_Insn &add)
      - PLUS insn of that kind.
      - Indirect loads and stores.
    In almost all cases, combine opportunities arise from the preparation
-   done by `avr_split_tiny_move', but in some rare cases combinations are
-   found for the ordinary cores, too.
+   done by `avr_split_fake_addressing_move', but in some rare cases combinations
+   are found for the ordinary cores, too.
    As we consider at most one Mem insn per try, there may still be missed
    optimizations like POST_INC + PLUS + POST_INC might be performed as
    PRE_DEC + PRE_DEC for two adjacent locations. */
@@ -1714,7 +1721,7 @@ public:
    core's capabilities.  This sets the stage for pass .avr-fuse-add.
 */
 bool
-avr_split_tiny_move (rtx_insn * /*insn*/, rtx *xop)
+avr_split_fake_addressing_move (rtx_insn * /*insn*/, rtx *xop)
 {
   bool store_p = false;
   rtx mem, reg_or_0;
diff --git a/gcc/config/avr/avr-passes.def b/gcc/config/avr/avr-passes.def
index 748260edaef..cd89d673727 100644
--- a/gcc/config/avr/avr-passes.def
+++ b/gcc/config/avr/avr-passes.def
@@ -20,12 +20,33 @@
 /* A post reload optimization pass that fuses PLUS insns with CONST_INT
    addend with a load or store insn to get POST_INC or PRE_DEC addressing.
    It can also fuse two PLUSes to a single one, which may occur due to
-   splits from `avr_split_tiny_move'.  We do this in an own pass because
-   it can find more cases than peephole2, for example when there are
-   unrelated insns between the interesting ones.  */
+   splits from `avr_split_fake_addressing_move'.  We do this in an own
+   pass because it can find more cases than peephole2, for example when
+   there are unrelated insns between the interesting ones.  */
 
 INSERT_PASS_BEFORE (pass_peephole2, 1, avr_pass_fuse_add);
 
+/* There are cases where avr-fuse-add doesn't find POST_INC cases because
+   the RTL code at that time is too long-winded, and moves registers back and
+   forth (which seems to be the same reason for why pass auto_inc_dec cannot
+   find POST_INC, either).  Some of that long-windedness is cleaned up very
+   late in pass cprop_hardreg, which opens up new opportunities to find post
+   increments.  An example is the following function from AVR-LibC's qsort:
+
+       void swapfunc (char *a, char *b, int n)
+       {
+         do
+           {
+             char tmp = *a;
+             *a++ = *b;
+             *b++ = tmp;
+           } while (--n > 0);
+       }
+
+   Hence, run avr-fuse-add twice; the second time after cprop_hardreg.  */
+
+INSERT_PASS_AFTER (pass_cprop_hardreg, 1, avr_pass_fuse_add);
+
 /* An analysis pass that runs prior to prologue / epilogue generation.
    Computes cfun->machine->gasisr.maybe which is used in prologue and
    epilogue generation provided -mgas-isr-prologues is on.
 */
@@ -47,9 +68,9 @@ INSERT_PASS_BEFORE (pass_free_cfg, 1, avr_pass_recompute_notes);
    tries to fix such situations by operating on the original mode.  This
    reduces code size and register pressure.
 
-   The assertion is that the code generated by casesi is unaltered and a
+   The assertion is that the code generated by casesi is unaltered and a
    sign-extend or zero-extend from QImode or HImode precedes the casesi
-   insns withaout any insns in between.  */
+   insns without any insns in between.  */
 
 INSERT_PASS_AFTER (pass_expand, 1, avr_pass_casesi);
 
diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 289c80cab55..d6f4abbac66 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -179,7 +179,7 @@ extern rtl_opt_pass *make_avr_pass_casesi (gcc::context *);
 extern rtl_opt_pass *make_avr_pass_ifelse (gcc::context *);
 #ifdef RTX_CODE
 extern bool avr_casei_sequence_check_operands (rtx *xop);
-extern bool avr_split_tiny_move (rtx_insn *insn, rtx *operands);
+extern bool avr_split_fake_addressing_move (rtx_insn *insn, rtx *operands);
 #endif /* RTX_CODE */
 
 /* From avr-log.cc */
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index f477ac170ea..520f1fe41a2 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -993,41 +993,10 @@ (define_peephole2
     (clobber (reg:CC REG_CC))])])
 
-;; For LPM loads from AS1 we split
-;;    R = *Z
-;; to
-;;    R = *Z++
-;;    Z = Z - sizeof (R)
-;;
-;; so that the second instruction can be optimized out.
-
-(define_split ; "split-lpmx"
-  [(set (match_operand:HISI 0 "register_operand" "")
-        (match_operand:HISI 1 "memory_operand" ""))]
-  "reload_completed
-   && AVR_HAVE_LPMX
-   && avr_mem_flash_p (operands[1])
-   && REG_P (XEXP (operands[1], 0))
-   && !reg_overlap_mentioned_p (XEXP (operands[1], 0), operands[0])"
-  [(set (match_dup 0)
-        (match_dup 2))
-   (set (match_dup 3)
-        (plus:HI (match_dup 3)
-                 (match_dup 4)))]
-  {
-    rtx addr = XEXP (operands[1], 0);
-
-    operands[2] = replace_equiv_address (operands[1],
-                                         gen_rtx_POST_INC (Pmode, addr));
-    operands[3] = addr;
-    operands[4] = gen_int_mode (-<SIZE>, HImode);
-  })
-
-
 ;; Legitimate address and stuff allows way more addressing modes than
 ;; Reduced Tiny actually supports.  Split them now so that we get
 ;; closer to real instructions which may result in some optimization
-;; opportunities.
+;; opportunities.  This applies also to fake X + offset addressing.
 
 (define_split
   [(parallel [(set (match_operand:MOVMODE 0 "nonimmediate_operand")
                    (match_operand:MOVMODE 1 "general_operand"))
@@ -1040,7 +1009,7 @@ (define_split
    && (MEM_P (operands[0]) || MEM_P (operands[1]))"
   [(scratch)]
   {
-    if (avr_split_tiny_move (curr_insn, operands))
+    if (avr_split_fake_addressing_move (curr_insn, operands))
       DONE;
     FAIL;
   })