[mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

Message ID 20171114160235.GA30036@arm.com
State New
Series [mid-end] Allow larger copies when target supports unaligned access [Patch (1/2)]

Commit Message

Tamar Christina Nov. 14, 2017, 4:02 p.m. UTC
Hi All,

This patch allows larger bitsizes to be used as the copy size
when the target does not have SLOW_UNALIGNED_ACCESS.

fun3:
	adrp	x2, .LANCHOR0
	add	x2, x2, :lo12:.LANCHOR0
	mov	x0, 0
	sub	sp, sp, #16
	ldrh	w1, [x2, 16]
	ldrb	w2, [x2, 18]
	add	sp, sp, 16
	bfi	x0, x1, 0, 8
	ubfx	x1, x1, 8, 8
	bfi	x0, x1, 8, 8
	bfi	x0, x2, 16, 8
	ret

is turned into

fun3:
	adrp	x0, .LANCHOR0
	add	x0, x0, :lo12:.LANCHOR0
	sub	sp, sp, #16
	ldrh	w1, [x0, 16]
	ldrb	w0, [x0, 18]
	strh	w1, [sp, 8]
	strb	w0, [sp, 10]
	ldr	w0, [sp, 8]
	add	sp, sp, 16
	ret

which avoids the bfi's for a simple 3-byte struct copy.
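
For reference, fun3 here is essentially the 3-byte struct return from the
testcase added below:

struct struct3 { char a, b, c; };
struct struct3 foo3 = { 'A', 'B', 'C' };

struct struct3
fun3 (void)
{
  return foo3;   /* 3-byte BLKmode value returned in registers.  */
}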

Regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu with no regressions.

This patch just splits the mid-end change off from the previous combined AArch64 patch
and adds a testcase.

I assume Jeff's ACK from https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still
valid as the code did not change.

Thanks,
Tamar


gcc/
2017-11-14  Tamar Christina  <tamar.christina@arm.com>

	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
	with fast unaligned access.
	* doc/sourcebuild.texi (no_slow_unalign): New.
	
gcc/testsuite/
2017-11-14  Tamar Christina  <tamar.christina@arm.com>

	* gcc.dg/struct-simple.c: New.
	* lib/target-supports.exp
	(check_effective_target_no_slow_unalign): New.

--

Comments

Richard Biener Nov. 15, 2017, 8:23 a.m. UTC | #1
On Tue, 14 Nov 2017, Tamar Christina wrote:

> Hi All,
> 
> This patch allows larger bitsizes to be used as copy size
> when the target does not have SLOW_UNALIGNED_ACCESS.
> 
> fun3:
> 	adrp	x2, .LANCHOR0
> 	add	x2, x2, :lo12:.LANCHOR0
> 	mov	x0, 0
> 	sub	sp, sp, #16
> 	ldrh	w1, [x2, 16]
> 	ldrb	w2, [x2, 18]
> 	add	sp, sp, 16
> 	bfi	x0, x1, 0, 8
> 	ubfx	x1, x1, 8, 8
> 	bfi	x0, x1, 8, 8
> 	bfi	x0, x2, 16, 8
> 	ret
> 
> is turned into
> 
> fun3:
> 	adrp	x0, .LANCHOR0
> 	add	x0, x0, :lo12:.LANCHOR0
> 	sub	sp, sp, #16
> 	ldrh	w1, [x0, 16]
> 	ldrb	w0, [x0, 18]
> 	strh	w1, [sp, 8]
> 	strb	w0, [sp, 10]
> 	ldr	w0, [sp, 8]
> 	add	sp, sp, 16
> 	ret
> 
> which avoids the bfi's for a simple 3 byte struct copy.
> 
> Regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu and no regressions.
> 
> This patch is just splitting off from the previous combined patch with AArch64 and adding
> a testcase.
> 
> I assume Jeff's ACK from https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still
> valid as the code did not change.

Given that your no_slow_unalign isn't mode-specific, can't you use the existing
non_strict_align?

Otherwise the expr.c change looks ok.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> 
> gcc/
> 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> 
> 	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> 	with fast unaligned access.
> 	* doc/sourcebuild.texi (no_slow_unalign): New.
> 	
> gcc/testsuite/
> 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> 
> 	* gcc.dg/struct-simple.c: New.
> 	* lib/target-supports.exp
> 	(check_effective_target_no_slow_unalign): New.
> 
>
Tamar Christina Nov. 15, 2017, 11:32 a.m. UTC | #2
> -----Original Message-----
> From: Richard Biener [mailto:rguenther@suse.de]
> Sent: Wednesday, November 15, 2017 08:24
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; law@redhat.com;
> ian@airs.com
> Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> supports unaligned access [Patch (1/2)]
> 
> On Tue, 14 Nov 2017, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch allows larger bitsizes to be used as copy size when the
> > target does not have SLOW_UNALIGNED_ACCESS.
> >
> > fun3:
> > 	adrp	x2, .LANCHOR0
> > 	add	x2, x2, :lo12:.LANCHOR0
> > 	mov	x0, 0
> > 	sub	sp, sp, #16
> > 	ldrh	w1, [x2, 16]
> > 	ldrb	w2, [x2, 18]
> > 	add	sp, sp, 16
> > 	bfi	x0, x1, 0, 8
> > 	ubfx	x1, x1, 8, 8
> > 	bfi	x0, x1, 8, 8
> > 	bfi	x0, x2, 16, 8
> > 	ret
> >
> > is turned into
> >
> > fun3:
> > 	adrp	x0, .LANCHOR0
> > 	add	x0, x0, :lo12:.LANCHOR0
> > 	sub	sp, sp, #16
> > 	ldrh	w1, [x0, 16]
> > 	ldrb	w0, [x0, 18]
> > 	strh	w1, [sp, 8]
> > 	strb	w0, [sp, 10]
> > 	ldr	w0, [sp, 8]
> > 	add	sp, sp, 16
> > 	ret
> >
> > which avoids the bfi's for a simple 3 byte struct copy.
> >
> > Regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu and
> no regressions.
> >
> > This patch is just splitting off from the previous combined patch with
> > AArch64 and adding a testcase.
> >
> > I assume Jeff's ACK from
> > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still valid as
> the code did not change.
> 
> Given your no_slow_unalign isn't mode specific can't you use the existing
> non_strict_align?

No, because non_strict_align only checks whether the target supports unaligned access at all.

This no_slow_unalign instead corresponds to the target hook slow_unaligned_access,
which checks whether the access you want to make has a greater cost than an
aligned access. ARM, for instance, always returns 1 (the value of STRICT_ALIGNMENT)
for slow_unaligned_access, while non_strict_align may return 0 or 1 based on the
options provided to the compiler.
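
Roughly, the default for the hook just mirrors STRICT_ALIGNMENT; a paraphrased
sketch with stand-in types, not the exact GCC source:

/* Paraphrased sketch of the hook's default (cf. targhooks.c); the
   typedef and define are stand-ins so this reads standalone.  */
typedef int machine_mode;            /* stand-in  */
#define STRICT_ALIGNMENT 1           /* stand-in: strict-alignment target  */

static int
default_slow_unaligned_access (machine_mode mode, unsigned int align)
{
  /* Unless a target overrides the hook, unaligned access counts as
     slow exactly when the target is strict-alignment.  */
  return STRICT_ALIGNMENT;
}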

The problem is I have no way to test STRICT_ALIGNMENT or slow_unaligned_access,
so I had to hardcode some targets that I know it works on.

Thanks,
Tamar
> 
> Otherwise the expr.c change looks ok.
> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Tamar
> >
> >
> > gcc/
> > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> >
> > 	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > 	with fast unaligned access.
> > 	* doc/sourcebuild.texi (no_slow_unalign): New.
> >
> > gcc/testsuite/
> > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> >
> > 	* gcc.dg/struct-simple.c: New.
> > 	* lib/target-supports.exp
> > 	(check_effective_target_no_slow_unalign): New.
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nuernberg)
Richard Biener Nov. 15, 2017, 12:49 p.m. UTC | #3
On Wed, 15 Nov 2017, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener [mailto:rguenther@suse.de]
> > Sent: Wednesday, November 15, 2017 08:24
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; law@redhat.com;
> > ian@airs.com
> > Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> > supports unaligned access [Patch (1/2)]
> > 
> > On Tue, 14 Nov 2017, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch allows larger bitsizes to be used as copy size when the
> > > target does not have SLOW_UNALIGNED_ACCESS.
> > >
> > > fun3:
> > > 	adrp	x2, .LANCHOR0
> > > 	add	x2, x2, :lo12:.LANCHOR0
> > > 	mov	x0, 0
> > > 	sub	sp, sp, #16
> > > 	ldrh	w1, [x2, 16]
> > > 	ldrb	w2, [x2, 18]
> > > 	add	sp, sp, 16
> > > 	bfi	x0, x1, 0, 8
> > > 	ubfx	x1, x1, 8, 8
> > > 	bfi	x0, x1, 8, 8
> > > 	bfi	x0, x2, 16, 8
> > > 	ret
> > >
> > > is turned into
> > >
> > > fun3:
> > > 	adrp	x0, .LANCHOR0
> > > 	add	x0, x0, :lo12:.LANCHOR0
> > > 	sub	sp, sp, #16
> > > 	ldrh	w1, [x0, 16]
> > > 	ldrb	w0, [x0, 18]
> > > 	strh	w1, [sp, 8]
> > > 	strb	w0, [sp, 10]
> > > 	ldr	w0, [sp, 8]
> > > 	add	sp, sp, 16
> > > 	ret
> > >
> > > which avoids the bfi's for a simple 3 byte struct copy.
> > >
> > > Regression tested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu and
> > no regressions.
> > >
> > > This patch is just splitting off from the previous combined patch with
> > > AArch64 and adding a testcase.
> > >
> > > I assume Jeff's ACK from
> > > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still valid as
> > the code did not change.
> > 
> > Given your no_slow_unalign isn't mode specific can't you use the existing
> > non_strict_align?
> 
> No because non_strict_align checks if the target supports unaligned access at all,
> 
> This no_slow_unalign corresponds instead to the target slow_unaligned_access
> which checks that the access you want to make has a greater cost than doing an
> aligned access. ARM for instance always return 1 (value of STRICT_ALIGNMENT)
> for slow_unaligned_access while for non_strict_align it may return 0 or 1 based
> on the options provided to the compiler.
> 
> The problem is I have no way to test STRICT_ALIGNMENT or slow_unaligned_access
> So I had to hardcode some targets that I know it does work on.

I see.  But then the slow_unaligned_access implementation should somehow use
non_strict_align as the default, as SLOW_UNALIGNED_ACCESS is defaulted
to STRICT_ALIGNMENT.

Given that SLOW_UNALIGNED_ACCESS has different values for different modes,
it would also make sense to be more specific for the testcase in question,
like word_mode_slow_unaligned_access to say this only applies to 
word_mode?

Thanks,
Richard.

> Thanks,
> Tamar
> > 
> > Otherwise the expr.c change looks ok.
> > 
> > Thanks,
> > Richard.
> > 
> > > Thanks,
> > > Tamar
> > >
> > >
> > > gcc/
> > > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> > >
> > > 	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > > 	with fast unaligned access.
> > > 	* doc/sourcebuild.texi (no_slow_unalign): New.
> > >
> > > gcc/testsuite/
> > > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> > >
> > > 	* gcc.dg/struct-simple.c: New.
> > > 	* lib/target-supports.exp
> > > 	(check_effective_target_no_slow_unalign): New.
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> > HRB 21284 (AG Nuernberg)
> 
>
Tamar Christina Nov. 15, 2017, 3:52 p.m. UTC | #4
> -----Original Message-----
> From: Richard Biener [mailto:rguenther@suse.de]
> Sent: Wednesday, November 15, 2017 12:50
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; law@redhat.com;
> ian@airs.com
> Subject: RE: [PATCH][GCC][mid-end] Allow larger copies when target
> supports unaligned access [Patch (1/2)]
> 
> On Wed, 15 Nov 2017, Tamar Christina wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener [mailto:rguenther@suse.de]
> > > Sent: Wednesday, November 15, 2017 08:24
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; law@redhat.com;
> > > ian@airs.com
> > > Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> > > supports unaligned access [Patch (1/2)]
> > >
> > > On Tue, 14 Nov 2017, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch allows larger bitsizes to be used as copy size when the
> > > > target does not have SLOW_UNALIGNED_ACCESS.
> > > >
> > > > fun3:
> > > > 	adrp	x2, .LANCHOR0
> > > > 	add	x2, x2, :lo12:.LANCHOR0
> > > > 	mov	x0, 0
> > > > 	sub	sp, sp, #16
> > > > 	ldrh	w1, [x2, 16]
> > > > 	ldrb	w2, [x2, 18]
> > > > 	add	sp, sp, 16
> > > > 	bfi	x0, x1, 0, 8
> > > > 	ubfx	x1, x1, 8, 8
> > > > 	bfi	x0, x1, 8, 8
> > > > 	bfi	x0, x2, 16, 8
> > > > 	ret
> > > >
> > > > is turned into
> > > >
> > > > fun3:
> > > > 	adrp	x0, .LANCHOR0
> > > > 	add	x0, x0, :lo12:.LANCHOR0
> > > > 	sub	sp, sp, #16
> > > > 	ldrh	w1, [x0, 16]
> > > > 	ldrb	w0, [x0, 18]
> > > > 	strh	w1, [sp, 8]
> > > > 	strb	w0, [sp, 10]
> > > > 	ldr	w0, [sp, 8]
> > > > 	add	sp, sp, 16
> > > > 	ret
> > > >
> > > > which avoids the bfi's for a simple 3 byte struct copy.
> > > >
> > > > Regression tested on aarch64-none-linux-gnu and
> > > > x86_64-pc-linux-gnu and
> > > no regressions.
> > > >
> > > > This patch is just splitting off from the previous combined patch
> > > > with
> > > > AArch64 and adding a testcase.
> > > >
> > > > I assume Jeff's ACK from
> > > > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still
> > > > valid as
> > > the code did not change.
> > >
> > > Given your no_slow_unalign isn't mode specific can't you use the
> > > existing non_strict_align?
> >
> > No because non_strict_align checks if the target supports unaligned
> > access at all,
> >
> > This no_slow_unalign corresponds instead to the target
> > slow_unaligned_access which checks that the access you want to make
> > has a greater cost than doing an aligned access. ARM for instance
> > always return 1 (value of STRICT_ALIGNMENT) for slow_unaligned_access
> > while for non_strict_align it may return 0 or 1 based on the options
> provided to the compiler.
> >
> > The problem is I have no way to test STRICT_ALIGNMENT or
> > slow_unaligned_access So I had to hardcode some targets that I know it
> does work on.
> 
> I see.  But then the slow_unaligned_access implementation should use
> non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is
> defaulted to STRICT_ALIGN.
> 
> Given that SLOW_UNALIGNED_ACCESS has different values for different
> modes it would also make sense to be more specific for the testcase in
> question, like word_mode_slow_unaligned_access to tell this only applies to
> word_mode?

Ah, that's fair enough. I've updated the patch and the new changelog is:


gcc/
2017-11-15  Tamar Christina  <tamar.christina@arm.com>

	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
	with fast unaligned access.
	* doc/sourcebuild.texi (word_mode_no_slow_unalign): New.
	
gcc/testsuite/
2017-11-15  Tamar Christina  <tamar.christina@arm.com>

	* gcc.dg/struct-simple.c: New.
	* lib/target-supports.exp
	(check_effective_target_word_mode_no_slow_unalign): New.

Ok for trunk?

Thanks,
Tamar

> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Tamar
> > >
> > > Otherwise the expr.c change looks ok.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > >
> > > > gcc/
> > > > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> > > >
> > > > 	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > > > 	with fast unaligned access.
> > > > 	* doc/sourcebuild.texi (no_slow_unalign): New.
> > > >
> > > > gcc/testsuite/
> > > > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> > > >
> > > > 	* gcc.dg/struct-simple.c: New.
> > > > 	* lib/target-supports.exp
> > > > 	(check_effective_target_no_slow_unalign): New.
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > > Norton, HRB 21284 (AG Nuernberg)
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nuernberg)
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1646d0a99911aa7b2e66762e5907fbb0454ed00d..3b200964462a82ebbe68bbe798cc91ed27337034 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2178,8 +2178,12 @@ Target supports @code{wchar_t} that is compatible with @code{char32_t}.
 
 @item comdat_group
 Target uses comdat groups.
+
+@item word_mode_no_slow_unalign
+Target does not have slow unaligned access when doing word size accesses.
 @end table
 
+
 @subsubsection Local to tests in @code{gcc.target/i386}
 
 @table @code
diff --git a/gcc/expr.c b/gcc/expr.c
index 2f8432d92ccac17c0a548faf4a16eff0656cef1b..afcea8fef58155d0a899991c10cd485ba8af888d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
 
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
+  bitsize = BITS_PER_WORD;
+  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
+    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
   /* Copy the structure BITSIZE bits at a time.  */
   for (bitpos = 0, xbitpos = padding_correction;
diff --git a/gcc/testsuite/gcc.dg/struct-simple.c b/gcc/testsuite/gcc.dg/struct-simple.c
new file mode 100644
index 0000000000000000000000000000000000000000..17b956022e4efb37044c7a74cc8baa9fb779221a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/struct-simple.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target word_mode_no_slow_unalign } */
+/* { dg-additional-options "-fdump-rtl-final" } */
+
+/* Copyright 1996, 1999, 2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.
+
+   Please email any bugs, comments, and/or additions to this file to:
+   bug-gdb@prep.ai.mit.edu  */
+
+#include <stdio.h>
+
+struct struct3 { char a, b, c; };
+struct struct3 foo3 = { 'A', 'B', 'C'},  L3;
+
+struct struct3  fun3()
+{
+  return foo3;
+}
+
+#ifdef PROTOTYPES
+void Fun3(struct struct3 foo3)
+#else
+void Fun3(foo3)
+     struct struct3 foo3;
+#endif
+{
+  L3 = foo3;
+}
+
+int main()
+{
+  struct struct3 x = fun3();
+
+  printf("a:%c, b:%c, c:%c\n", x.a, x.b, x.c);
+}
+
+/* { dg-final { scan-rtl-dump-not {zero_extract:.+\[\s*foo3\s*\]} "final" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b6f9e51c4817cf8235c8e33b14e2763308eb482a..72641b6e0f4b9453c413ad1c89a8d8053b1abd4e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6037,6 +6037,31 @@ proc check_effective_target_unaligned_stack { } {
     return $et_unaligned_stack_saved
 }
 
+# Return 1 if the target plus current options does not have
+# slow unaligned access when using word size accesses.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_word_mode_no_slow_unalign { } {
+    global et_word_mode_no_slow_unalign_saved
+    global et_index
+
+    if [info exists et_word_mode_no_slow_unalign_saved($et_index)] {
+        verbose "check_effective_target_word_mode_no_slow_unalign: \
+                 using cached result" 2
+    } else {
+        set et_word_mode_no_slow_unalign_saved($et_index) 0
+        if { [istarget x86_64-*-*]
+             || [istarget aarch64*-*-*]
+           } {
+            set et_word_mode_no_slow_unalign_saved($et_index) 1
+        }
+    }
+    verbose "check_effective_target_word_mode_no_slow_unalign:\
+             returning $et_word_mode_no_slow_unalign_saved($et_index)" 2
+    return $et_word_mode_no_slow_unalign_saved($et_index)
+}
+
 # Return 1 if the target plus current options does not support a vector
 # alignment mechanism, 0 otherwise.
 #
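
A rough model of what the expr.c hunk changes for the 3-byte struct above;
the numbers are illustrative, assuming a 64-bit target with fast unaligned
access:

#include <stdio.h>

int main (void)
{
  unsigned bits_per_word = 64;   /* BITS_PER_WORD on a 64-bit target  */
  unsigned type_align = 8;       /* TYPE_ALIGN of the char-aligned struct3  */
  int slow_unaligned = 0;        /* what targetm.slow_unaligned_access returns  */

  /* Before the patch: bitsize = MIN (TYPE_ALIGN, BITS_PER_WORD).  */
  unsigned old_bitsize = type_align < bits_per_word ? type_align : bits_per_word;

  /* After the patch: a full word unless unaligned access is slow.  */
  unsigned new_bitsize = slow_unaligned ? old_bitsize : bits_per_word;

  /* Prints "old: 8-bit pieces, new: 64-bit pieces".  */
  printf ("old: %u-bit pieces, new: %u-bit pieces\n", old_bitsize, new_bitsize);
  return 0;
}
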
Richard Biener Nov. 16, 2017, 7:58 a.m. UTC | #5
On Wed, 15 Nov 2017, tamar.christina@arm.com wrote:

> > -----Original Message-----
> > From: Richard Biener [mailto:rguenther@suse.de]
> > Sent: Wednesday, November 15, 2017 12:50
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; law@redhat.com;
> > ian@airs.com
> > Subject: RE: [PATCH][GCC][mid-end] Allow larger copies when target
> > supports unaligned access [Patch (1/2)]
> > 
> > On Wed, 15 Nov 2017, Tamar Christina wrote:
> > 
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener [mailto:rguenther@suse.de]
> > > > Sent: Wednesday, November 15, 2017 08:24
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; law@redhat.com;
> > > > ian@airs.com
> > > > Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> > > > supports unaligned access [Patch (1/2)]
> > > >
> > > > On Tue, 14 Nov 2017, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch allows larger bitsizes to be used as copy size when the
> > > > > target does not have SLOW_UNALIGNED_ACCESS.
> > > > >
> > > > > fun3:
> > > > > 	adrp	x2, .LANCHOR0
> > > > > 	add	x2, x2, :lo12:.LANCHOR0
> > > > > 	mov	x0, 0
> > > > > 	sub	sp, sp, #16
> > > > > 	ldrh	w1, [x2, 16]
> > > > > 	ldrb	w2, [x2, 18]
> > > > > 	add	sp, sp, 16
> > > > > 	bfi	x0, x1, 0, 8
> > > > > 	ubfx	x1, x1, 8, 8
> > > > > 	bfi	x0, x1, 8, 8
> > > > > 	bfi	x0, x2, 16, 8
> > > > > 	ret
> > > > >
> > > > > is turned into
> > > > >
> > > > > fun3:
> > > > > 	adrp	x0, .LANCHOR0
> > > > > 	add	x0, x0, :lo12:.LANCHOR0
> > > > > 	sub	sp, sp, #16
> > > > > 	ldrh	w1, [x0, 16]
> > > > > 	ldrb	w0, [x0, 18]
> > > > > 	strh	w1, [sp, 8]
> > > > > 	strb	w0, [sp, 10]
> > > > > 	ldr	w0, [sp, 8]
> > > > > 	add	sp, sp, 16
> > > > > 	ret
> > > > >
> > > > > which avoids the bfi's for a simple 3 byte struct copy.
> > > > >
> > > > > Regression tested on aarch64-none-linux-gnu and
> > > > > x86_64-pc-linux-gnu and
> > > > no regressions.
> > > > >
> > > > > This patch is just splitting off from the previous combined patch
> > > > > with
> > > > > AArch64 and adding a testcase.
> > > > >
> > > > > I assume Jeff's ACK from
> > > > > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01523.html is still
> > > > > valid as
> > > > the code did not change.
> > > >
> > > > Given your no_slow_unalign isn't mode specific can't you use the
> > > > existing non_strict_align?
> > >
> > > No because non_strict_align checks if the target supports unaligned
> > > access at all,
> > >
> > > This no_slow_unalign corresponds instead to the target
> > > slow_unaligned_access which checks that the access you want to make
> > > has a greater cost than doing an aligned access. ARM for instance
> > > always return 1 (value of STRICT_ALIGNMENT) for slow_unaligned_access
> > > while for non_strict_align it may return 0 or 1 based on the options
> > provided to the compiler.
> > >
> > > The problem is I have no way to test STRICT_ALIGNMENT or
> > > slow_unaligned_access So I had to hardcode some targets that I know it
> > does work on.
> > 
> > I see.  But then the slow_unaligned_access implementation should use
> > non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is
> > defaulted to STRICT_ALIGN.
> > 
> > Given that SLOW_UNALIGNED_ACCESS has different values for different
> > modes it would also make sense to be more specific for the testcase in
> > question, like word_mode_slow_unaligned_access to tell this only applies to
> > word_mode?
> 
> Ah, that's fair enough. I've updated the patch and the new changelog is:

Did you attach the old patch? I don't see strict_align being tested in
the word_mode_no_slow_unalign test.

Richard.

> 
> gcc/
> 2017-11-15  Tamar Christina  <tamar.christina@arm.com>
> 
> 	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> 	with fast unaligned access.
> 	* doc/sourcebuild.texi (word_mode_no_slow_unalign): New.
> 	
> gcc/testsuite/
> 2017-11-15  Tamar Christina  <tamar.christina@arm.com>
> 
> 	* gcc.dg/struct-simple.c: New.
> 	* lib/target-supports.exp
> 	(check_effective_target_word_mode_no_slow_unalign): New.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> > 
> > Thanks,
> > Richard.
> > 
> > > Thanks,
> > > Tamar
> > > >
> > > > Otherwise the expr.c change looks ok.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > >
> > > > > gcc/
> > > > > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> > > > >
> > > > > 	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > > > > 	with fast unaligned access.
> > > > > 	* doc/sourcebuild.texi (no_slow_unalign): New.
> > > > >
> > > > > gcc/testsuite/
> > > > > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> > > > >
> > > > > 	* gcc.dg/struct-simple.c: New.
> > > > > 	* lib/target-supports.exp
> > > > > 	(check_effective_target_no_slow_unalign): New.
> > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > > > Norton, HRB 21284 (AG Nuernberg)
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> > HRB 21284 (AG Nuernberg)
>
Tamar Christina Nov. 16, 2017, 11:42 a.m. UTC | #6
> > > 
> > > I see.  But then the slow_unaligned_access implementation should use
> > > non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is
> > > defaulted to STRICT_ALIGN.
> > > 
> > > Given that SLOW_UNALIGNED_ACCESS has different values for different
> > > modes it would also make sense to be more specific for the testcase in
> > > question, like word_mode_slow_unaligned_access to tell this only applies to
> > > word_mode?
> > 
> > Ah, that's fair enough. I've updated the patch and the new changelog is:
> 
> Did you attach the old patch? I don't see strict_align being tested in
> the word_mode_no_slow_unalign test.
> 

Sorry! I misunderstood your previous email. I've added the check accordingly.

But this also raises a question: some targets, e.g. PowerPC and riscv, have
defined SLOW_UNALIGNED_ACCESS in a way that uses only internal state to
determine the value, so STRICT_ALIGNMENT is essentially ignored.

The code generation *might* change for them but the tests won't run. I see no way to
make the test accurate (as in, it runs in all cases where the codegen changed)
unless I expose SLOW_UNALIGNED_ACCESS as a define so I can test for it.

Would this be the way to go?

Thanks,
Tamar

> Richard.
> 
> > 
> > gcc/
> > 2017-11-15  Tamar Christina  <tamar.christina@arm.com>
> > 
> > 	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > 	with fast unaligned access.
> > 	* doc/sourcebuild.texi (word_mode_no_slow_unalign): New.
> > 	
> > gcc/testsuite/
> > 2017-11-15  Tamar Christina  <tamar.christina@arm.com>
> > 
> > 	* gcc.dg/struct-simple.c: New.
> > 	* lib/target-supports.exp
> > 	(check_effective_target_word_mode_no_slow_unalign): New.
> > 
> > Ok for trunk?
> > 
> > Thanks,
> > Tamar
> > 
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > > Thanks,
> > > > Tamar
> > > > >
> > > > > Otherwise the expr.c change looks ok.
> > > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > >
> > > > > > gcc/
> > > > > > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> > > > > >
> > > > > > 	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > > > > > 	with fast unaligned access.
> > > > > > 	* doc/sourcebuild.texi (no_slow_unalign): New.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> > > > > >
> > > > > > 	* gcc.dg/struct-simple.c: New.
> > > > > > 	* lib/target-supports.exp
> > > > > > 	(check_effective_target_no_slow_unalign): New.
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > > > > Norton, HRB 21284 (AG Nuernberg)
> > > >
> > > >
> > > 
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> > > HRB 21284 (AG Nuernberg)
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

--
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1646d0a99911aa7b2e66762e5907fbb0454ed00d..3b200964462a82ebbe68bbe798cc91ed27337034 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2178,8 +2178,12 @@ Target supports @code{wchar_t} that is compatible with @code{char32_t}.
 
 @item comdat_group
 Target uses comdat groups.
+
+@item word_mode_no_slow_unalign
+Target does not have slow unaligned access when doing word size accesses.
 @end table
 
+
 @subsubsection Local to tests in @code{gcc.target/i386}
 
 @table @code
diff --git a/gcc/expr.c b/gcc/expr.c
index 2f8432d92ccac17c0a548faf4a16eff0656cef1b..afcea8fef58155d0a899991c10cd485ba8af888d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
 
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
+  bitsize = BITS_PER_WORD;
+  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
+    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
   /* Copy the structure BITSIZE bits at a time.  */
   for (bitpos = 0, xbitpos = padding_correction;
diff --git a/gcc/testsuite/gcc.dg/struct-simple.c b/gcc/testsuite/gcc.dg/struct-simple.c
new file mode 100644
index 0000000000000000000000000000000000000000..17b956022e4efb37044c7a74cc8baa9fb779221a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/struct-simple.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target word_mode_no_slow_unalign } */
+/* { dg-additional-options "-fdump-rtl-final" } */
+
+/* Copyright 1996, 1999, 2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.
+
+   Please email any bugs, comments, and/or additions to this file to:
+   bug-gdb@prep.ai.mit.edu  */
+
+#include <stdio.h>
+
+struct struct3 { char a, b, c; };
+struct struct3 foo3 = { 'A', 'B', 'C'},  L3;
+
+struct struct3  fun3()
+{
+  return foo3;
+}
+
+#ifdef PROTOTYPES
+void Fun3(struct struct3 foo3)
+#else
+void Fun3(foo3)
+     struct struct3 foo3;
+#endif
+{
+  L3 = foo3;
+}
+
+int main()
+{
+  struct struct3 x = fun3();
+
+  printf("a:%c, b:%c, c:%c\n", x.a, x.b, x.c);
+}
+
+/* { dg-final { scan-rtl-dump-not {zero_extract:.+\[\s*foo3\s*\]} "final" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b6f9e51c4817cf8235c8e33b14e2763308eb482a..e1bd9da20b671eb19ad5a96e3463a516d01ceb83 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6037,6 +6037,32 @@ proc check_effective_target_unaligned_stack { } {
     return $et_unaligned_stack_saved
 }
 
+# Return 1 if the target plus current options does not have
+# slow unaligned access when using word size accesses.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_word_mode_no_slow_unalign { } {
+    global et_word_mode_no_slow_unalign_saved
+    global et_index
+
+    if [info exists et_word_mode_no_slow_unalign_saved($et_index)] {
+        verbose "check_effective_target_word_mode_no_slow_unalign: \
+                 using cached result" 2
+    } else {
+        set et_word_mode_no_slow_unalign_saved($et_index) 0
+        if { ([istarget x86_64-*-*]
+              || [istarget aarch64*-*-*])
+	     && [is-effective-target non_strict_align]
+           } {
+            set et_word_mode_no_slow_unalign_saved($et_index) 1
+        }
+    }
+    verbose "check_effective_target_word_mode_no_slow_unalign:\
+             returning $et_word_mode_no_slow_unalign_saved($et_index)" 2
+    return $et_word_mode_no_slow_unalign_saved($et_index)
+}
+
 # Return 1 if the target plus current options does not support a vector
 # alignment mechanism, 0 otherwise.
 #
Richard Biener Nov. 16, 2017, 11:57 a.m. UTC | #7
On Thu, 16 Nov 2017, Tamar Christina wrote:

> > > > 
> > > > I see.  But then the slow_unaligned_access implementation should use
> > > > non_strict_align as default somehow as SLOW_UNALIGNED_ACCESS is
> > > > defaulted to STRICT_ALIGN.
> > > > 
> > > > Given that SLOW_UNALIGNED_ACCESS has different values for different
> > > > modes it would also make sense to be more specific for the testcase in
> > > > question, like word_mode_slow_unaligned_access to tell this only applies to
> > > > word_mode?
> > > 
> > > Ah, that's fair enough. I've updated the patch and the new changelog is:
> > 
> > Did you attach the old patch? I don't see strict_align being tested in
> > the word_mode_no_slow_unalign test.
> > 
> 
> Sorry! I misunderstood your previous email. I've added the check accordingly.

+        if { ([istarget x86_64-*-*]
+              || [istarget aarch64*-*-*])
+            && [is-effective-target non_strict_align]
+           } {
+            set et_word_mode_no_slow_unalign_saved($et_index) 1
+        }

I'd have made it

  if { ([is-effective-target non_strict_align]
        && ! ( [istarget ...] || ....))

thus default it to 1 for non-strict-align targets.

> But this also raises a question, some targets have defined SLOW_UNALIGNED_ACCESS
> in a way that uses only internal state to determine the value where STRICT_ALIGNMENT
> is essentially ignored. e.g. PowerPC and riscv.
> 
> The code generation *might* change for them but the tests won't run. I see no way to
> make the test accurate (as in, runs in all cases where the codegen changed)
> unless I expose SLOW_UNALIGNED_ACCESS as a define so I can test for it.
> 
> Would this be the way to go?

I don't think so.  SLOW_UNALIGNED_ACCESS is per mode and specific to
a certain alignment.

Richard.

> Thanks,
> Tamar
> 
> > Richard.
> > 
> > > 
> > > gcc/
> > > 2017-11-15  Tamar Christina  <tamar.christina@arm.com>
> > > 
> > > 	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > > 	with fast unaligned access.
> > > 	* doc/sourcebuild.texi (word_mode_no_slow_unalign): New.
> > > 	
> > > gcc/testsuite/
> > > 2017-11-15  Tamar Christina  <tamar.christina@arm.com>
> > > 
> > > 	* gcc.dg/struct-simple.c: New.
> > > 	* lib/target-supports.exp
> > > 	(check_effective_target_word_mode_no_slow_unalign): New.
> > > 
> > > Ok for trunk?
> > > 
> > > Thanks,
> > > Tamar
> > > 
> > > > 
> > > > Thanks,
> > > > Richard.
> > > > 
> > > > > Thanks,
> > > > > Tamar
> > > > > >
> > > > > > Otherwise the expr.c change looks ok.
> > > > > >
> > > > > > Thanks,
> > > > > > Richard.
> > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > >
> > > > > > > gcc/
> > > > > > > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> > > > > > >
> > > > > > > 	* expr.c (copy_blkmode_to_reg): Fix bitsize for targets
> > > > > > > 	with fast unaligned access.
> > > > > > > 	* doc/sourcebuild.texi (no_slow_unalign): New.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > > 2017-11-14  Tamar Christina  <tamar.christina@arm.com>
> > > > > > >
> > > > > > > 	* gcc.dg/struct-simple.c: New.
> > > > > > > 	* lib/target-supports.exp
> > > > > > > 	(check_effective_target_no_slow_unalign): New.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de>
> > > > > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> > > > > > Norton, HRB 21284 (AG Nuernberg)
> > > > >
> > > > >
> > > > 
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> > > > HRB 21284 (AG Nuernberg)
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 
>
Tamar Christina Nov. 16, 2017, 3:27 p.m. UTC | #8
Hi Richard,

> 
> I'd have made it
> 
>   if { ([is-effective-target non_strict_align]
>         && ! ( [istarget ...] || ....))
> 
> thus default it to 1 for non-strict-align targets.
> 

Fair, I've switched it to a blacklist and excluded the only one I know
should not work. Most of the rest will get blocked by non_strict_align, and for the
few others I'll adjust the testcase accordingly if there are any issues.

> > But this also raises a question, some targets have defined SLOW_UNALIGNED_ACCESS
> > in a way that uses only internal state to determine the value where STRICT_ALIGNMENT
> > is essentially ignored. e.g. PowerPC and riscv.
> > 
> > The code generation *might* change for them but the tests won't run. I see no way to
> > make the test accurate (as in, runs in all cases where the codegen changed)
> > unless I expose SLOW_UNALIGNED_ACCESS as a define so I can test for it.
> > 
> > Would this be the way to go?
> 
> I don't think so.  SLOW_UNALIGNED_ACCESS is per mode and specific to
> a certain alignment.
> 

Ah, right! That slipped my mind for a bit.

Ok for trunk?

Thanks for the review,
Tamar
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1646d0a99911aa7b2e66762e5907fbb0454ed00d..3b200964462a82ebbe68bbe798cc91ed27337034 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2178,8 +2178,12 @@ Target supports @code{wchar_t} that is compatible with @code{char32_t}.
 
 @item comdat_group
 Target uses comdat groups.
+
+@item word_mode_no_slow_unalign
+Target does not have slow unaligned access when doing word size accesses.
 @end table
 
+
 @subsubsection Local to tests in @code{gcc.target/i386}
 
 @table @code
diff --git a/gcc/expr.c b/gcc/expr.c
index 2f8432d92ccac17c0a548faf4a16eff0656cef1b..afcea8fef58155d0a899991c10cd485ba8af888d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
 
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
+  bitsize = BITS_PER_WORD;
+  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
+    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
   /* Copy the structure BITSIZE bits at a time.  */
   for (bitpos = 0, xbitpos = padding_correction;
diff --git a/gcc/testsuite/gcc.dg/struct-simple.c b/gcc/testsuite/gcc.dg/struct-simple.c
new file mode 100644
index 0000000000000000000000000000000000000000..17b956022e4efb37044c7a74cc8baa9fb779221a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/struct-simple.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target word_mode_no_slow_unalign } */
+/* { dg-additional-options "-fdump-rtl-final" } */
+
+/* Copyright 1996, 1999, 2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.
+
+   Please email any bugs, comments, and/or additions to this file to:
+   bug-gdb@prep.ai.mit.edu  */
+
+#include <stdio.h>
+
+struct struct3 { char a, b, c; };
+struct struct3 foo3 = { 'A', 'B', 'C'},  L3;
+
+struct struct3  fun3()
+{
+  return foo3;
+}
+
+#ifdef PROTOTYPES
+void Fun3(struct struct3 foo3)
+#else
+void Fun3(foo3)
+     struct struct3 foo3;
+#endif
+{
+  L3 = foo3;
+}
+
+int main()
+{
+  struct struct3 x = fun3();
+
+  printf("a:%c, b:%c, c:%c\n", x.a, x.b, x.c);
+}
+
+/* { dg-final { scan-rtl-dump-not {zero_extract:.+\[\s*foo3\s*\]} "final" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b6f9e51c4817cf8235c8e33b14e2763308eb482a..03413c323d00e88872879a741ab3c015e052311d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6037,6 +6037,31 @@ proc check_effective_target_unaligned_stack { } {
     return $et_unaligned_stack_saved
 }
 
+# Return 1 if the target plus current options does not have
+# slow unaligned access when using word size accesses.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_word_mode_no_slow_unalign { } {
+    global et_word_mode_no_slow_unalign_saved
+    global et_index
+
+    if [info exists et_word_mode_no_slow_unalign_saved($et_index)] {
+        verbose "check_effective_target_word_mode_no_slow_unalign: \
+                 using cached result" 2
+    } else {
+        set et_word_mode_no_slow_unalign_saved($et_index) 0
+        if { [is-effective-target non_strict_align]
+	     && !([istarget arm*-*-*])
+           } {
+            set et_word_mode_no_slow_unalign_saved($et_index) 1
+        }
+    }
+    verbose "check_effective_target_word_mode_no_slow_unalign:\
+             returning $et_word_mode_no_slow_unalign_saved($et_index)" 2
+    return $et_word_mode_no_slow_unalign_saved($et_index)
+}
+
 # Return 1 if the target plus current options does not support a vector
 # alignment mechanism, 0 otherwise.
 #
Richard Biener Nov. 17, 2017, 9:58 a.m. UTC | #9
On Thu, 16 Nov 2017, Tamar Christina wrote:

> Hi Richard,
> 
> > 
> > I'd have made it
> > 
> >   if { ([is-effective-target non_strict_align]
> >         && ! ( [istarget ...] || ....))
> > 
> > thus default it to 1 for non-strict-align targets.
> > 
> 
> Fair, I've switched it to a black list and have excluded the only one I know
> should not work. Most of the rest will get blocked by non_strict_align and for the
> few others I'll adjust the testcase accordingly if there are any issues.
> 
> > > But this also raises a question, some targets have defined SLOW_UNALIGNED_ACCESS
> > > in a way that uses only internal state to determine the value where STRICT_ALIGNMENT
> > > is essentially ignored. e.g. PowerPC and riscv.
> > > 
> > > The code generation *might* change for them but the tests won't run. I see now way to
> > > make the test accurate (as in, runs in all cases where the codegen changed)
> > > unless I expose SLOW_UNALIGNED_ACCESS as a define so I can test for it.
> > > 
> > > Would this be the way to go?
> > 
> > I don't think so.  SLOW_UNALIGNED_ACCESS is per mode and specific to
> > a certain alignment.
> > 
> 
> Ah, right! that slipped my mind for a bit.
> 
> Ok for trunk?

Ok.

Richard.
Alan Modra March 29, 2018, 2:35 p.m. UTC | #10
On Thu, Nov 16, 2017 at 03:27:01PM +0000, Tamar Christina wrote:
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
>  
>    n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
>    dst_words = XALLOCAVEC (rtx, n_regs);
> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> +  bitsize = BITS_PER_WORD;
> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
> +    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>  
>    /* Copy the structure BITSIZE bits at a time.  */
>    for (bitpos = 0, xbitpos = padding_correction;

I believe this patch is wrong.  Please revert.  See the PR84762 testcase.

There are two problems.  Firstly, if padding_correction is non-zero,
then xbitpos % BITS_PER_WORD is non-zero and in

      store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
		       0, 0, word_mode,
		       extract_bit_field (src_word, bitsize,
					  bitpos % BITS_PER_WORD, 1,
					  NULL_RTX, word_mode, word_mode,
					  false, NULL),
		       false);

the stored bit-field exceeds the destination register size.  You could
fix that by making bitsize the gcd of bitsize and padding_correction.

However, that doesn't fix the second problem, which is that the
extracted bit-field can exceed the source size.  That will result in
rubbish being read into a register.
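
A concrete instance of the first problem, with illustrative numbers assuming
a 32-bit word:

#include <assert.h>
#include <stdio.h>

int main (void)
{
  unsigned bits_per_word = 32;       /* BITS_PER_WORD, illustrative  */
  unsigned padding_correction = 8;   /* e.g. one byte of padding  */
  unsigned bitsize = 32;             /* BITS_PER_WORD, as the patch sets it  */

  /* First loop iteration: xbitpos starts at padding_correction, so the
     store_bit_field above writes BITSIZE bits starting at bit 8.  */
  unsigned xbitpos = padding_correction;
  unsigned end = xbitpos % bits_per_word + bitsize;

  /* Bits [8, 40) do not fit in a 32-bit destination register.  */
  printf ("store touches bits [%u, %u) of a %u-bit word\n",
          xbitpos % bits_per_word, end, bits_per_word);
  assert (end > bits_per_word);
  return 0;
}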
Peter Bergner March 30, 2018, 3:35 p.m. UTC | #11
On 3/29/18 9:35 AM, Alan Modra wrote:
> On Thu, Nov 16, 2017 at 03:27:01PM +0000, Tamar Christina wrote:
>> --- a/gcc/expr.c
>> +++ b/gcc/expr.c
>> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
>>  
>>    n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
>>    dst_words = XALLOCAVEC (rtx, n_regs);
>> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>> +  bitsize = BITS_PER_WORD;
>> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
>> +    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>>  
>>    /* Copy the structure BITSIZE bits at a time.  */
>>    for (bitpos = 0, xbitpos = padding_correction;
> 
> I believe this patch is wrong.  Please revert.  See the PR84762 testcase.

I'm seeing this patch cause an error too on a different test case:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85123#c2

I've closed PR85123 as a dup of PR84762, even though the test case
in PR84762 is failing for -m32 while the test case in PR85123 is
failing for -m64, since they're both caused by the same bug.

Peter
Peter Bergner March 30, 2018, 4:16 p.m. UTC | #12
On 3/29/18 9:35 AM, Alan Modra wrote:
> On Thu, Nov 16, 2017 at 03:27:01PM +0000, Tamar Christina wrote:
>> --- a/gcc/expr.c
>> +++ b/gcc/expr.c
>> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
>>  
>>    n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
>>    dst_words = XALLOCAVEC (rtx, n_regs);
>> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>> +  bitsize = BITS_PER_WORD;
>> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
>> +    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
>>  
>>    /* Copy the structure BITSIZE bits at a time.  */
>>    for (bitpos = 0, xbitpos = padding_correction;
> 
> I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
> 
> There are two problems.  Firstly, if padding_correction is non-zero,
> then xbitpos % BITS_PER_WORD is non-zero and in
> 
>       store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> 		       0, 0, word_mode,
> 		       extract_bit_field (src_word, bitsize,
> 					  bitpos % BITS_PER_WORD, 1,
> 					  NULL_RTX, word_mode, word_mode,
> 					  false, NULL),
> 		       false);
> 
> the stored bit-field exceeds the destination register size.  You could
> fix that by making bitsize the gcd of bitsize and padding_correction.
> 
> However, that doesn't fix the second problem which is that the
> extracted bit-field can exceed the source size.  That will result in
> rubbish being read into a register.

FYI, I received an automated response saying Tamar is away on vacation
with no return date specified.  That means he won't be able to revert
the patch.  What do we do now?

Peter
Richard Biener April 3, 2018, 11:01 a.m. UTC | #13
On Fri, 30 Mar 2018, Peter Bergner wrote:

> On 3/29/18 9:35 AM, Alan Modra wrote:
> > On Thu, Nov 16, 2017 at 03:27:01PM +0000, Tamar Christina wrote:
> >> --- a/gcc/expr.c
> >> +++ b/gcc/expr.c
> >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
> >>  
> >>    n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> >>    dst_words = XALLOCAVEC (rtx, n_regs);
> >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> >> +  bitsize = BITS_PER_WORD;
> >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
> >> +    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> >>  
> >>    /* Copy the structure BITSIZE bits at a time.  */
> >>    for (bitpos = 0, xbitpos = padding_correction;
> > 
> > I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
> > 
> > There are two problems.  Firstly, if padding_correction is non-zero,
> > then xbitpos % BITS_PER_WORD is non-zero and in
> > 
> >       store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > 		       0, 0, word_mode,
> > 		       extract_bit_field (src_word, bitsize,
> > 					  bitpos % BITS_PER_WORD, 1,
> > 					  NULL_RTX, word_mode, word_mode,
> > 					  false, NULL),
> > 		       false);
> > 
> > the stored bit-field exceeds the destination register size.  You could
> > fix that by making bitsize the gcd of bitsize and padding_correction.
> > 
> > However, that doesn't fix the second problem which is that the
> > extracted bit-field can exceed the source size.  That will result in
> > rubbish being read into a register.
> 
> FYI, I received an automated response saying Tamar is away on vacation
> with no return date specified.  That means he won't be able to revert
> the patch.  What do we do now?

The code before the change already looks fishy to me.

  x = expand_normal (src);

so what do we expect this to expand to in general?  Fortunately
it seems there are exactly two callers so hopefully a
gcc_assert (MEM_P (x)) would work?

The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).

In case x is a MEM we should use MEM_ALIGN and if not then we
shouldn't do anything but just use word_mode here.
Alan Modra April 3, 2018, 12:25 p.m. UTC | #14
On Tue, Apr 03, 2018 at 01:01:23PM +0200, Richard Biener wrote:
> On Fri, 30 Mar 2018, Peter Bergner wrote:
> 
> > On 3/29/18 9:35 AM, Alan Modra wrote:
> > > On Thu, Nov 16, 2017 at 03:27:01PM +0000, Tamar Christina wrote:
> > >> --- a/gcc/expr.c
> > >> +++ b/gcc/expr.c
> > >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
> > >>  
> > >>    n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> > >>    dst_words = XALLOCAVEC (rtx, n_regs);
> > >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > >> +  bitsize = BITS_PER_WORD;
> > >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
> > >> +    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > >>  
> > >>    /* Copy the structure BITSIZE bits at a time.  */
> > >>    for (bitpos = 0, xbitpos = padding_correction;
> > > 
> > > I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
> > > 
> > > There are two problems.  Firstly, if padding_correction is non-zero,
> > > then xbitpos % BITS_PER_WORD is non-zero and in
> > > 
> > >       store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > > 		       0, 0, word_mode,
> > > 		       extract_bit_field (src_word, bitsize,
> > > 					  bitpos % BITS_PER_WORD, 1,
> > > 					  NULL_RTX, word_mode, word_mode,
> > > 					  false, NULL),
> > > 		       false);
> > > 
> > > the stored bit-field exceeds the destination register size.  You could
> > > fix that by making bitsize the gcd of bitsize and padding_correction.
> > > 
> > > However, that doesn't fix the second problem which is that the
> > > extracted bit-field can exceed the source size.  That will result in
> > > rubbish being read into a register.
> > 
> > FYI, I received an automated response saying Tamar is away on vacation
> > with no return date specified.  That means he won't be able to revert
> > the patch.  What do we do now?
> 
> The code before the change already looks fishy to me.

Yes, it is fishy so far as the code in the loop relies on alignment
determining a small enough bitsize.  Adding

  if (bytes % UNITS_PER_WORD != 0)
    bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);

after bitsize is calculated from alignment would make the code
correct, I believe.  But I think that will fail the testcase Tamar
added.
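
Worked through for the 3-byte testcase struct on an illustrative 64-bit
target, that guard drops bitsize back to 8, i.e. byte-sized copies again,
which is why the testcase would fail:

#include <stdio.h>

static unsigned
gcd (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

int main (void)
{
  unsigned units_per_word = 8;   /* UNITS_PER_WORD on a 64-bit target  */
  unsigned bits_per_unit = 8;    /* BITS_PER_UNIT  */
  unsigned bytes = 3;            /* sizeof (struct struct3)  */
  unsigned bitsize = 64;         /* BITS_PER_WORD, as the patch sets it  */

  if (bytes % units_per_word != 0)
    bitsize = gcd (bitsize, (bytes % units_per_word) * bits_per_unit);

  printf ("bitsize = %u\n", bitsize);   /* gcd (64, 24) == 8  */
  return 0;
}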

>   x = expand_normal (src);
> 
> so what do we expect this to expand to in general?  Fortunately
> it seems there are exactly two callers so hopefully a
> gcc_assert (MEM_P (x)) would work?
> 
> The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).
> 
> In case x is a MEM we should use MEM_ALIGN and if not then we
> shouldn't do anything but just use word_mode here.

No!  You can't use a bitsize of BITS_PER_WORD here.  See the code I
quoted above.
Richard Biener April 3, 2018, 12:30 p.m. UTC | #15
On Tue, 3 Apr 2018, Alan Modra wrote:

> On Tue, Apr 03, 2018 at 01:01:23PM +0200, Richard Biener wrote:
> > On Fri, 30 Mar 2018, Peter Bergner wrote:
> > 
> > > On 3/29/18 9:35 AM, Alan Modra wrote:
> > > > On Thu, Nov 16, 2017 at 03:27:01PM +0000, Tamar Christina wrote:
> > > >> --- a/gcc/expr.c
> > > >> +++ b/gcc/expr.c
> > > >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
> > > >>  
> > > >>    n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> > > >>    dst_words = XALLOCAVEC (rtx, n_regs);
> > > >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > > >> +  bitsize = BITS_PER_WORD;
> > > >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
> > > >> +    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > > >>  
> > > >>    /* Copy the structure BITSIZE bits at a time.  */
> > > >>    for (bitpos = 0, xbitpos = padding_correction;
> > > > 
> > > > I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
> > > > 
> > > > There are two problems.  Firstly, if padding_correction is non-zero,
> > > > then xbitpos % BITS_PER_WORD is non-zero and in
> > > > 
> > > >       store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > > > 		       0, 0, word_mode,
> > > > 		       extract_bit_field (src_word, bitsize,
> > > > 					  bitpos % BITS_PER_WORD, 1,
> > > > 					  NULL_RTX, word_mode, word_mode,
> > > > 					  false, NULL),
> > > > 		       false);
> > > > 
> > > > the stored bit-field exceeds the destination register size.  You could
> > > > fix that by making bitsize the gcd of bitsize and padding_correction.
> > > > 
> > > > However, that doesn't fix the second problem which is that the
> > > > extracted bit-field can exceed the source size.  That will result in
> > > > rubbish being read into a register.
> > > 
> > > FYI, I received an automated response saying Tamar is away on vacation
> > > with no return date specified.  That means he won't be able to revert
> > > the patch.  What do we do now?
> > 
> > The code before the change already looks fishy to me.
> 
> Yes, it is fishy so far as the code in the loop relies on alignment
> determining a small enough bitsize.  Adding
> 
>   if (bytes % UNITS_PER_WORD != 0)
>     bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);
> 
> after bitsize is calculated from alignment would make the code
> correct, I believe.  But I think that will fail the testcase Tamar
> added.
> 
> >   x = expand_normal (src);
> > 
> > so what do we expect this to expand to in general?  Fortunately
> > it seems there are exactly two callers so hopefully a
> > gcc_assert (MEM_P (x)) would work?
> > 
> > The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).
> > 
> > In case x is a MEM we should use MEM_ALIGN and if not then we
> > shouldn't do anything but just use word_mode here.
> 
> No!  You can't use a bitsize of BITS_PER_WORD here.  See the code I
> quoted above.

Ah, so it should possibly use TYPE_SIZE instead of TYPE_ALIGN, otherwise
it will handle the over-aligned case (align to 8 bytes, size 4 bytes)
incorrectly as well?

Richard.
Tamar Christina April 3, 2018, 12:37 p.m. UTC | #16
Hi all,

> I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
>

I'm happy to revert it given the regression and the GCC 8 release being imminent,
but looking at the code again it seems to me that the original code may also not
be fully correct. In particular, I'm wondering what happens if you overalign the struct.

> There are two problems.  Firstly, if padding_correction is non-zero,
> then xbitpos % BITS_PER_WORD is non-zero and in
>
>      store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
>                       0, 0, word_mode,
>                       extract_bit_field (src_word, bitsize,
>                                          bitpos % BITS_PER_WORD, 1,
>                                          NULL_RTX, word_mode, word_mode,
>                                          false, NULL),
>                       false);
>
> the stored bit-field exceeds the destination register size.  You could
> fix that by making bitsize the gcd of bitsize and padding_correction.
>
> However, that doesn't fix the second problem which is that the
> extracted bit-field can exceed the source size.  That will result in
> rubbish being read into a register.

Yes, this is because I misunderstood how extract_bit_field works: I had thought
that word_mode was the size of the load and that bitsize and bitpos were just
used to determine the mask and shift. But it seems to do the load based on bitsize,
and word_mode is just the mode you want the result in?

In which case, wouldn't it work to adjust bitsize in each loop iteration to
the largest size no greater than the number of bits left to copy, alignment
permitting?

Regards,
Tamar
Alan Modra April 3, 2018, 1:16 p.m. UTC | #17
On Tue, Apr 03, 2018 at 02:30:23PM +0200, Richard Biener wrote:
> On Tue, 3 Apr 2018, Alan Modra wrote:
> 
> > On Tue, Apr 03, 2018 at 01:01:23PM +0200, Richard Biener wrote:
> > > On Fri, 30 Mar 2018, Peter Bergner wrote:
> > > 
> > > > On 3/29/18 9:35 AM, Alan Modra wrote:
> > > > > On Thu, Nov 16, 2017 at 03:27:01PM +0000, Tamar Christina wrote:
> > > > >> --- a/gcc/expr.c
> > > > >> +++ b/gcc/expr.c
> > > > >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
> > > > >>  
> > > > >>    n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> > > > >>    dst_words = XALLOCAVEC (rtx, n_regs);
> > > > >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > > > >> +  bitsize = BITS_PER_WORD;
> > > > >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
> > > > >> +    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > > > >>  
> > > > >>    /* Copy the structure BITSIZE bits at a time.  */
> > > > >>    for (bitpos = 0, xbitpos = padding_correction;
> > > > > 
> > > > > I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
> > > > > 
> > > > > There are two problems.  Firstly, if padding_correction is non-zero,
> > > > > then xbitpos % BITS_PER_WORD is non-zero and in
> > > > > 
> > > > >       store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > > > > 		       0, 0, word_mode,
> > > > > 		       extract_bit_field (src_word, bitsize,
> > > > > 					  bitpos % BITS_PER_WORD, 1,
> > > > > 					  NULL_RTX, word_mode, word_mode,
> > > > > 					  false, NULL),
> > > > > 		       false);
> > > > > 
> > > > > the stored bit-field exceeds the destination register size.  You could
> > > > > fix that by making bitsize the gcd of bitsize and padding_correction.
> > > > > 
> > > > > However, that doesn't fix the second problem which is that the
> > > > > extracted bit-field can exceed the source size.  That will result in
> > > > > rubbish being read into a register.
> > > > 
> > > > FYI, I received an automated response saying Tamar is away on vacation
> > > > with no return date specified.  That means he won't be able to revert
> > > > the patch.  What do we do now?
> > > 
> > > The code before the change already looks fishy to me.
> > 
> > Yes, it is fishy so far as the code in the loop relies on alignment
> > determining a small enough bitsize.  Adding
> > 
> >   if (bytes % UNITS_PER_WORD != 0)
> >     bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);
> > 
> > after bitsize is calculated from alignment would make the code
> > correct, I believe.  But I think that will fail the testcase Tamar
> > added.
> > 
> > >   x = expand_normal (src);
> > > 
> > > so what do we expect this to expand to in general?  Fortunately
> > > it seems there are exactly two callers so hopefully a
> > > gcc_assert (MEM_P (x)) would work?
> > > 
> > > The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).
> > > 
> > > In case x is a MEM we should use MEM_ALIGN and if not then we
> > > shouldn't do anything but just use word_mode here.
> > 
> > No!  You can't use a bitsize of BITS_PER_WORD here.  See the code I
> > quoted above.
> 
> Ah, so it should possibly use TYPE_SIZE instead of TYPE_ALIGN

Not that either.  We already have a byte count.

> otherwise
> it will handle the over-aligned case (align to 8 bytes, size 4 bytes)
> incorrectly as well?

Yes, that case is mishandled too, even before Tamar's patch.  Peter's
pr85123 testcase with vfloat1 aligned(8) is an example.
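
A minimal sketch of a type with that shape (my guess at the pattern, not the
actual pr85123 testcase):

  /* sizeof is 4 bytes but the alignment is 8, so an alignment-derived
     bitsize of 64 would read rubbish past the end of the value.  */
  typedef float vfloat1 __attribute__ ((vector_size (4), aligned (8)));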

If we want correct code, then

  if (bytes % UNITS_PER_WORD != 0)
    bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);

will fix the algorithm inside the copy_blkmode_to_reg loop.  Otherwise
the loop itself needs changes.
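
For what it's worth, here is a standalone illustration of how that clamp
plays out for a 3-byte struct (my sketch, with BITS_PER_WORD and friends
hard-coded for a 64-bit target):

  #include <stdio.h>

  static unsigned int
  gcd (unsigned int a, unsigned int b)
  {
    while (b != 0)
      {
        unsigned int t = a % b;
        a = b;
        b = t;
      }
    return a;
  }

  int
  main (void)
  {
    const unsigned int bits_per_word = 64;
    const unsigned int units_per_word = 8;
    const unsigned int bits_per_unit = 8;
    unsigned int bytes = 3;                /* a 3-byte struct       */
    unsigned int bitsize = bits_per_word;  /* fast unaligned access */

    if (bytes % units_per_word != 0)
      bitsize = gcd (bitsize, (bytes % units_per_word) * bits_per_unit);

    printf ("bitsize = %u\n", bitsize);    /* prints 8: byte copies */
    return 0;
  }

gcd (64, 24) is 8, so for this case the clamp degenerates to byte copies:
correct, but no faster than the old alignment-derived bitsize.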
Tamar Christina April 4, 2018, 9:06 a.m. UTC | #18
> -----Original Message-----
> From: Alan Modra <amodra@gmail.com>
> Sent: Tuesday, April 3, 2018 14:16
> To: Richard Biener <rguenther@suse.de>
> Cc: Peter Bergner <bergner@vnet.ibm.com>; Tamar Christina
> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> law@redhat.com; ian@airs.com
> Subject: Re: [PATCH][GCC][mid-end] Allow larger copies when target
> supports unaligned access [Patch (1/2)]
> 
> On Tue, Apr 03, 2018 at 02:30:23PM +0200, Richard Biener wrote:
> > On Tue, 3 Apr 2018, Alan Modra wrote:
> >
> > > On Tue, Apr 03, 2018 at 01:01:23PM +0200, Richard Biener wrote:
> > > > On Fri, 30 Mar 2018, Peter Bergner wrote:
> > > >
> > > > > On 3/29/18 9:35 AM, Alan Modra wrote:
> > > > > > On Thu, Nov 16, 2017 at 03:27:01PM +0000, Tamar Christina wrote:
> > > > > >> --- a/gcc/expr.c
> > > > > >> +++ b/gcc/expr.c
> > > > > >> @@ -2769,7 +2769,9 @@ copy_blkmode_to_reg (machine_mode mode, tree src)
> > > > > >>
> > > > > >>    n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> > > > > >>    dst_words = XALLOCAVEC (rtx, n_regs);
> > > > > >> -  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > > > > >> +  bitsize = BITS_PER_WORD;
> > > > > >> +  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
> > > > > >> +    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
> > > > > >>
> > > > > >>    /* Copy the structure BITSIZE bits at a time.  */
> > > > > >>    for (bitpos = 0, xbitpos = padding_correction;
> > > > > >
> > > > > > I believe this patch is wrong.  Please revert.  See the PR84762 testcase.
> > > > > >
> > > > > > There are two problems.  Firstly, if padding_correction is
> > > > > > non-zero, then xbitpos % BITS_PER_WORD is non-zero and in
> > > > > >
> > > > > >       store_bit_field (dst_word, bitsize, xbitpos % BITS_PER_WORD,
> > > > > > 		       0, 0, word_mode,
> > > > > > 		       extract_bit_field (src_word, bitsize,
> > > > > > 					  bitpos % BITS_PER_WORD, 1,
> > > > > > 					  NULL_RTX, word_mode, word_mode,
> > > > > > 					  false, NULL),
> > > > > > 		       false);
> > > > > >
> > > > > > the stored bit-field exceeds the destination register size.
> > > > > > You could fix that by making bitsize the gcd of bitsize and padding_correction.
> > > > > >
> > > > > > However, that doesn't fix the second problem which is that the
> > > > > > extracted bit-field can exceed the source size.  That will
> > > > > > result in rubbish being read into a register.
> > > > >
> > > > > FYI, I received an automated response saying Tamar is away on
> > > > > vacation with no return date specified.  That means he won't be
> > > > > able to revert the patch.  What do we do now?
> > > >
> > > > The code before the change already looks fishy to me.
> > >
> > > Yes, it is fishy so far as the code in the loop relies on alignment
> > > determining a small enough bitsize.  Adding
> > >
> > >   if (bytes % UNITS_PER_WORD != 0)
> > >     bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);
> > >
> > > after bitsize is calculated from alignment would make the code
> > > correct, I believe.  But I think that will fail the testcase Tamar
> > > added.
> > >
> > > >   x = expand_normal (src);
> > > >
> > > > so what do we expect this to expand to in general?  Fortunately it
> > > > seems there are exactly two callers so hopefully a gcc_assert
> > > > (MEM_P (x)) would work?
> > > >
> > > > The fishy part is looking at TYPE_ALIGN (TREE_TYPE (src)).
> > > >
> > > > In case x is a MEM we should use MEM_ALIGN and if not then we
> > > > shouldn't do anything but just use word_mode here.
> > >
> > > No!  You can't use a bitsize of BITS_PER_WORD here.  See the code I
> > > quoted above.
> >
> > Ah, so it should possibly use TYPE_SIZE instead of TYPE_ALIGN
> 
> Not that either.  We already have a byte count..
> 
> > otherwise
> > it will handle the over-aligned case (align to 8 bytes, size 4 bytes)
> > incorrectly as well?
> 
> Yes, that case is mishandled too, even before Tamar's patch.  Peter's
> pr85123 testcase with vfloat1 aligned(8) is an example.
> 
> If we want correct code, then
> 
>   if (bytes % UNITS_PER_WORD != 0)
>     bitsize = gcd (bitsize, (bytes % UNITS_PER_WORD) * BITS_PER_UNIT);
> 

If bitsize is still initially calculated based on alignment, then unless you
over-align the struct it's still just going to do 8-bit copies as it did
before, which isn't an improvement at all.

Now that I know how the loads are done, I have a patch that should be both
correct and generate better code in most cases. It calculates bitsize inside
the loop and does the copying in the largest mode possible that is no larger
than the number of bits still to be copied. This avoids the issue of reading
too much, honors padding and alignment, and still generates better code in
most cases.
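
Roughly, as a standalone sketch (my illustration only; the real patch works
in terms of machine modes and still has to respect slow_unaligned_access and
padding_correction):

  #include <stdio.h>

  int
  main (void)
  {
    const unsigned int total_bits = 3 * 8;  /* a 3-byte struct */
    const unsigned int word_bits = 64;      /* BITS_PER_WORD   */
    unsigned int pos = 0;

    while (pos < total_bits)
      {
        /* Largest power-of-two chunk that does not overrun the bits
           still to be copied.  */
        unsigned int chunk = word_bits;
        while (chunk > total_bits - pos)
          chunk /= 2;
        printf ("copy %u bits at bit offset %u\n", chunk, pos);
        pos += chunk;
      }
    return 0;
  }

For the 3-byte case this gives a 16-bit copy followed by an 8-bit copy, i.e.
a halfword plus a byte rather than three single bytes.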

I'm running the regression tests and should have a final version soon.

> will fix the algorithm inside the copy_blkmode_to_reg loop.  Otherwise the
> loop itself needs changes.
> 
> --
> Alan Modra
> Australia Development Lab, IBM
Peter Bergner April 4, 2018, 12:43 p.m. UTC | #19
On 4/4/18 4:06 AM, Tamar Christina wrote:
> Now that I know how the loads are done, I have a patch that should be both correct and generate better code in most cases.
> It calculates bitsize inside the loop and does the copying in the largest mode possible that is no larger than the number
> of bits still to be copied. This avoids the issue of reading too much, honors padding and alignment, and still generates
> better code in most cases.
> 
> I'm running the regression tests and should have a final version soon.

If you give me your patch when it's ready, I'll do bootstrap and regression
testing on powerpc*-linux and verify it fixes the issues we hit.

Peter
Jeff Law April 9, 2018, 10:47 p.m. UTC | #20
On 04/04/2018 06:43 AM, Peter Bergner wrote:
> On 4/4/18 4:06 AM, Tamar Christina wrote:
>> Now that I know how the loads are done, I have a patch that should be both correct and generate better code in most cases.
>> It calculates bitsize inside the loop and does the copying in the largest mode possible that is no larger than the number
>> of bits still to be copied. This avoids the issue of reading too much, honors padding and alignment, and still generates
>> better code in most cases.
>>
>> I'm running the regression tests and should have a final version soon.
> 
> If you give me your patch when it's ready, I'll do bootstrap and regression
> testing on powerpc*-linux and verify it fixes the issues we hit.
Similarly, I've got a Jenkins instance here where I can get a bootstrap
and regression test on the usual targets like aarch64, armv7, i686,
powerpc (32, 64, 64le), s390x, sparc64, x86_64.  But more interestingly,
it'll also do a bootstrap test on alpha, hppa, m68k, sh4 and other
oddballs like aarch64-be.

A patch for the tip of the trunk is all I need.

It doesn't run the testsuite, but the ability to bootstrap on the
lesser-used targets gives a level of code-generator validation that is helpful.

Takes about 24hrs to cycle through everything...

jeff
diff mbox series

Patch

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 1646d0a99911aa7b2e66762e5907fbb0454ed00d..1df7a82fbd516b9bf07908bb800e441110b28ca4 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2178,8 +2178,12 @@  Target supports @code{wchar_t} that is compatible with @code{char32_t}.
 
 @item comdat_group
 Target uses comdat groups.
+
+@item no_slow_unalign
+Target does not have slow unaligned access.
 @end table
 
+
 @subsubsection Local to tests in @code{gcc.target/i386}
 
 @table @code
diff --git a/gcc/expr.c b/gcc/expr.c
index 2f8432d92ccac17c0a548faf4a16eff0656cef1b..afcea8fef58155d0a899991c10cd485ba8af888d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -2769,7 +2769,9 @@  copy_blkmode_to_reg (machine_mode mode, tree src)
 
   n_regs = (bytes + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
   dst_words = XALLOCAVEC (rtx, n_regs);
-  bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
+  bitsize = BITS_PER_WORD;
+  if (targetm.slow_unaligned_access (word_mode, TYPE_ALIGN (TREE_TYPE (src))))
+    bitsize = MIN (TYPE_ALIGN (TREE_TYPE (src)), BITS_PER_WORD);
 
   /* Copy the structure BITSIZE bits at a time.  */
   for (bitpos = 0, xbitpos = padding_correction;
diff --git a/gcc/testsuite/gcc.dg/struct-simple.c b/gcc/testsuite/gcc.dg/struct-simple.c
new file mode 100644
index 0000000000000000000000000000000000000000..9f218851d897421b217b0926d29845b6192982fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/struct-simple.c
@@ -0,0 +1,52 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target no_slow_unalign } */
+/* { dg-additional-options "-fdump-rtl-final" } */
+
+/* Copyright 1996, 1999, 2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.
+
+   Please email any bugs, comments, and/or additions to this file to:
+   bug-gdb@prep.ai.mit.edu  */
+
+#include <stdio.h>
+
+struct struct3 { char a, b, c; };
+struct struct3 foo3 = { 'A', 'B', 'C'},  L3;
+
+struct struct3  fun3()
+{
+  return foo3;
+}
+
+#ifdef PROTOTYPES
+void Fun3(struct struct3 foo3)
+#else
+void Fun3(foo3)
+     struct struct3 foo3;
+#endif
+{
+  L3 = foo3;
+}
+
+int main()
+{
+  struct struct3 x = fun3();
+
+  printf("a:%c, b:%c, c:%c\n", x.a, x.b, x.c);
+}
+
+/* { dg-final { scan-rtl-dump-not {zero_extract:.+\[\s*foo3\s*\]} "final" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b6f9e51c4817cf8235c8e33b14e2763308eb482a..690d8960002a3c38462bbe5524b419c205ba4da9 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6037,6 +6037,30 @@  proc check_effective_target_unaligned_stack { } {
     return $et_unaligned_stack_saved
 }
 
+# Return 1 if the target plus current options does not have
+# slow unaligned access.
+#
+# This won't change for different subtargets so cache the result.
+
+proc check_effective_target_no_slow_unalign { } {
+    global et_no_slow_unalign_saved
+    global et_index
+
+    if [info exists et_no_slow_unalign_saved($et_index)] {
+        verbose "check_effective_target_no_slow_unalign: using cached result" 2
+    } else {
+        set et_no_slow_unalign_saved($et_index) 0
+        if { [istarget x86_64-*-*]
+             || [istarget aarch64*-*-*]
+           } {
+            set et_no_slow_unalign_saved($et_index) 1
+        }
+    }
+    verbose "check_effective_target_no_slow_unalign:\
+             returning $et_no_slow_unalign_saved($et_index)" 2
+    return $et_no_slow_unalign_saved($et_index)
+}
+
 # Return 1 if the target plus current options does not support a vector
 # alignment mechanism, 0 otherwise.
 #