Message ID | 20170413174129.h6mavi5jweqpzw3k@virgil.suse.cz |
---|---|
State | New |
Headers | show |
On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote: >Hi, > >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote: >> On Wed, 12 Apr 2017, Martin Jambor wrote: >> >> > Hi, >> > >> > the patch below is an attempt to deal with PR 80293 as >non-invasively >> > as possible. Basically, it switches off total SRA scalarization of >> > any local aggregates which contains an array of elements that have >one >> > byte (or less). >> > >> > The logic behind this is that accessing such arrays element-wise >> > usually results in poor code and that such char arrays are often >used >> > for non-statically-typed content anyway, and we do not want to copy >> > that byte per byte. >> > >> > Alan, do you think this could impact your constant pool >scalarization >> > too severely? >> >> Hmm, isn't one of the issues that we have >> >> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) >> { >> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) >> <= max_scalarization_size) >> { >> create_total_scalarization_access (var); >> >> which limits the size of scalarizable vars but not the number >> of accesses we create for total scalarization? > >Well, yes, but at least I understood from your comment #4 in the bug >that you did not want to limit the number of replacements for gcc 7? > >> >> Is scalarizable_type_p only used in contexts where we have no hint >> of the actual accesses? > >There are should_scalarize_away_bitmap and >cannot_scalarize_away_bitmap hints. > >Total scalarization is a hack process where we chop up small >aggregates according to their types - as opposed to actual accesses, >meaning it is an alien process to the rest of SRA - knowing that they >will go completely away afterwards (that is ensured by >cannot_scalarize_away_bitmap) and that this will facilitate copy >propagation (this is indicated by should_scalarize_away_bitmap, which >has a bit set if there is a non-scalar assignment _from_ (a part of) >the aggregate). OK, but for the copy x = y where x would go away it still depends on uses of x what pieces we actually want? Or is total scalarization only done for x = y; z = x;? Thus no further accesses to x? >> That is, for the constant pool case we >> usually have >> >> x = .LC0; >> .. = x[2]; >> >> so we have a "hint" that accesses on x are those we'd want to >> optimize to accesses to .LC0. > >You don't need total scalarization for this, just the existence of >read from x[2] would be sufficient but thanks for pointing me to the >right direction. For constant pool decl scalarization, it is not >important to introduce artificial accesses for x but for .LC0. >Therefore, I have adapted the patch to allow char arrays for const >decls only and verified that it scalarizes a const-pool array of chars >on Aarch64. The (otherwise yet untested) result is below. > >What do you think? Why special case char arrays? I'd simply disallow total scalarization of non-const arrays completely. >Martin > > >2017-04-13 Martin Jambor <mjambor@suse.cz> > > * tree-sra.c (scalarizable_type_p): New parameter const_decl, make > char arrays not totally scalarizable if it is false. > (analyze_all_variable_accesses): Pass correct value in the new > parameter. > >testsuite/ > * g++.dg/tree-ssa/pr80293.C: New test. >--- >gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 >+++++++++++++++++++++++++++++++++ > gcc/tree-sra.c | 21 ++++++++++----- > 2 files changed, 60 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C > >diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C >b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C >new file mode 100644 >index 00000000000..7faf35ae983 >--- /dev/null >+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C >@@ -0,0 +1,45 @@ >+// { dg-do compile } >+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */ >+ >+#include <array> >+ >+// Return a copy of the underlying memory of an arbitrary value. >+template < >+ typename T, >+ typename = typename >std::enable_if<std::is_trivially_copyable<T>::value>::type >+> >+auto getMem( >+ T const & value >+) -> std::array<char, sizeof(T)> { >+ auto ret = std::array<char, sizeof(T)>{}; >+ __builtin_memcpy(ret.data(), &value, sizeof(T)); >+ return ret; >+} >+ >+template < >+ typename T, >+ typename = typename >std::enable_if<std::is_trivially_copyable<T>::value>::type >+> >+auto fromMem( >+ std::array<char, sizeof(T)> const & buf >+) -> T { >+ auto ret = T{}; >+ __builtin_memcpy(&ret, buf.data(), sizeof(T)); >+ return ret; >+} >+ >+double foo1(std::uint64_t arg) { >+ return fromMem<double>(getMem(arg)); >+} >+ >+double foo2(std::uint64_t arg) { >+ return *reinterpret_cast<double*>(&arg); >+} >+ >+double foo3(std::uint64_t arg) { >+ double ret; >+ __builtin_memcpy(&ret, &arg, sizeof(arg)); >+ return ret; >+} >+ >+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } } >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c >index 02453d3ed9a..ab06be66131 100644 >--- a/gcc/tree-sra.c >+++ b/gcc/tree-sra.c >@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool >write) > >/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or >fixed-length >ARRAY_TYPE with fields that are either of gimple register types >(excluding >- bit-fields) or (recursively) scalarizable types. */ >+ bit-fields) or (recursively) scalarizable types. CONST_DECL must >be true if >+ we are considering a decl from constant pool. If it is false, char >arrays >+ will be refused. */ > > static bool >-scalarizable_type_p (tree type) >+scalarizable_type_p (tree type, bool const_decl) > { > gcc_assert (!is_gimple_reg_type (type)); > if (type_contains_placeholder_p (type)) >@@ -970,7 +972,7 @@ scalarizable_type_p (tree type) > return false; > > if (!is_gimple_reg_type (ft) >- && !scalarizable_type_p (ft)) >+ && !scalarizable_type_p (ft, const_decl)) > return false; > } > >@@ -978,10 +980,16 @@ scalarizable_type_p (tree type) > > case ARRAY_TYPE: > { >+ HOST_WIDE_INT min_elem_size; >+ if (const_decl) >+ min_elem_size = 0; >+ else >+ min_elem_size = BITS_PER_UNIT; >+ > if (TYPE_DOMAIN (type) == NULL_TREE > || !tree_fits_shwi_p (TYPE_SIZE (type)) > || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type))) >- || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0) >+ || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size) > || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) > return false; > if (tree_to_shwi (TYPE_SIZE (type)) == 0 >@@ -995,7 +1003,7 @@ scalarizable_type_p (tree type) > > tree elem = TREE_TYPE (type); > if (!is_gimple_reg_type (elem) >- && !scalarizable_type_p (elem)) >+ && !scalarizable_type_p (elem, const_decl)) > return false; > return true; > } >@@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void) > { > tree var = candidate (i); > >- if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) >+ if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), >+ constant_decl_p (var))) > { > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > <= max_scalarization_size)
Hi, On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote: > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote: > >Hi, > > > >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote: > >> On Wed, 12 Apr 2017, Martin Jambor wrote: > >> > >> > Hi, > >> > > >> > the patch below is an attempt to deal with PR 80293 as > >non-invasively > >> > as possible. Basically, it switches off total SRA scalarization of > >> > any local aggregates which contains an array of elements that have > >one > >> > byte (or less). > >> > > >> > The logic behind this is that accessing such arrays element-wise > >> > usually results in poor code and that such char arrays are often > >used > >> > for non-statically-typed content anyway, and we do not want to copy > >> > that byte per byte. > >> > > >> > Alan, do you think this could impact your constant pool > >scalarization > >> > too severely? > >> > >> Hmm, isn't one of the issues that we have > >> > >> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) > >> { > >> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > >> <= max_scalarization_size) > >> { > >> create_total_scalarization_access (var); > >> > >> which limits the size of scalarizable vars but not the number > >> of accesses we create for total scalarization? > > > >Well, yes, but at least I understood from your comment #4 in the bug > >that you did not want to limit the number of replacements for gcc 7? > > > >> > >> Is scalarizable_type_p only used in contexts where we have no hint > >> of the actual accesses? > > > >There are should_scalarize_away_bitmap and > >cannot_scalarize_away_bitmap hints. > > > >Total scalarization is a hack process where we chop up small > >aggregates according to their types - as opposed to actual accesses, > >meaning it is an alien process to the rest of SRA - knowing that they > >will go completely away afterwards (that is ensured by > >cannot_scalarize_away_bitmap) and that this will facilitate copy > >propagation (this is indicated by should_scalarize_away_bitmap, which > >has a bit set if there is a non-scalar assignment _from_ (a part of) > >the aggregate). > > OK, but for the copy x = y where x would go away it still depends on uses of x what pieces we actually want? Or is total scalarization only done for x = y; z = x;? > Thus no further accesses to x? Total scalarization adds artificial accesses only to y, but (in both cases of total and "natural" scalarization) for all aggregate assignments between SRA candidates, SRA attempts to recreate all accesses covering RHS to LHS. Transitively. So the artificial accesses created for y will then get created for x and z even if they would not be candidates for total scalarization. So e.g. if z cannot go away because it is passed to a function, it will be loaded piece-wise from y. > > >> That is, for the constant pool case we > >> usually have > >> > >> x = .LC0; > >> .. = x[2]; > >> > >> so we have a "hint" that accesses on x are those we'd want to > >> optimize to accesses to .LC0. > > > >You don't need total scalarization for this, just the existence of > >read from x[2] would be sufficient but thanks for pointing me to the > >right direction. For constant pool decl scalarization, it is not > >important to introduce artificial accesses for x but for .LC0. > >Therefore, I have adapted the patch to allow char arrays for const > >decls only and verified that it scalarizes a const-pool array of chars > >on Aarch64. The (otherwise yet untested) result is below. > > > >What do you think? > > Why special case char arrays? I'd simply disallow total scalarization of non-const arrays completely. Well, currently we will get element-wise copy propagation for things like "int rgb[3];" (possibly embeded in a small struct). If I remove it, someone will complain. Maybe the correct limit is SI mode size or BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though. I just wanted to be conservative at this point in the release cycle. Martin > > >Martin > > > > > >2017-04-13 Martin Jambor <mjambor@suse.cz> > > > > * tree-sra.c (scalarizable_type_p): New parameter const_decl, make > > char arrays not totally scalarizable if it is false. > > (analyze_all_variable_accesses): Pass correct value in the new > > parameter. > > > >testsuite/ > > * g++.dg/tree-ssa/pr80293.C: New test. > >--- > >gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 > >+++++++++++++++++++++++++++++++++ > > gcc/tree-sra.c | 21 ++++++++++----- > > 2 files changed, 60 insertions(+), 6 deletions(-) > > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C > > > >diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C > >b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C > >new file mode 100644 > >index 00000000000..7faf35ae983 > >--- /dev/null > >+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C > >@@ -0,0 +1,45 @@ > >+// { dg-do compile } > >+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */ > >+ > >+#include <array> > >+ > >+// Return a copy of the underlying memory of an arbitrary value. > >+template < > >+ typename T, > >+ typename = typename > >std::enable_if<std::is_trivially_copyable<T>::value>::type > >+> > >+auto getMem( > >+ T const & value > >+) -> std::array<char, sizeof(T)> { > >+ auto ret = std::array<char, sizeof(T)>{}; > >+ __builtin_memcpy(ret.data(), &value, sizeof(T)); > >+ return ret; > >+} > >+ > >+template < > >+ typename T, > >+ typename = typename > >std::enable_if<std::is_trivially_copyable<T>::value>::type > >+> > >+auto fromMem( > >+ std::array<char, sizeof(T)> const & buf > >+) -> T { > >+ auto ret = T{}; > >+ __builtin_memcpy(&ret, buf.data(), sizeof(T)); > >+ return ret; > >+} > >+ > >+double foo1(std::uint64_t arg) { > >+ return fromMem<double>(getMem(arg)); > >+} > >+ > >+double foo2(std::uint64_t arg) { > >+ return *reinterpret_cast<double*>(&arg); > >+} > >+ > >+double foo3(std::uint64_t arg) { > >+ double ret; > >+ __builtin_memcpy(&ret, &arg, sizeof(arg)); > >+ return ret; > >+} > >+ > >+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } } > >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > >index 02453d3ed9a..ab06be66131 100644 > >--- a/gcc/tree-sra.c > >+++ b/gcc/tree-sra.c > >@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool > >write) > > > >/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or > >fixed-length > >ARRAY_TYPE with fields that are either of gimple register types > >(excluding > >- bit-fields) or (recursively) scalarizable types. */ > >+ bit-fields) or (recursively) scalarizable types. CONST_DECL must > >be true if > >+ we are considering a decl from constant pool. If it is false, char > >arrays > >+ will be refused. */ > > > > static bool > >-scalarizable_type_p (tree type) > >+scalarizable_type_p (tree type, bool const_decl) > > { > > gcc_assert (!is_gimple_reg_type (type)); > > if (type_contains_placeholder_p (type)) > >@@ -970,7 +972,7 @@ scalarizable_type_p (tree type) > > return false; > > > > if (!is_gimple_reg_type (ft) > >- && !scalarizable_type_p (ft)) > >+ && !scalarizable_type_p (ft, const_decl)) > > return false; > > } > > > >@@ -978,10 +980,16 @@ scalarizable_type_p (tree type) > > > > case ARRAY_TYPE: > > { > >+ HOST_WIDE_INT min_elem_size; > >+ if (const_decl) > >+ min_elem_size = 0; > >+ else > >+ min_elem_size = BITS_PER_UNIT; > >+ > > if (TYPE_DOMAIN (type) == NULL_TREE > > || !tree_fits_shwi_p (TYPE_SIZE (type)) > > || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type))) > >- || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0) > >+ || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size) > > || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) > > return false; > > if (tree_to_shwi (TYPE_SIZE (type)) == 0 > >@@ -995,7 +1003,7 @@ scalarizable_type_p (tree type) > > > > tree elem = TREE_TYPE (type); > > if (!is_gimple_reg_type (elem) > >- && !scalarizable_type_p (elem)) > >+ && !scalarizable_type_p (elem, const_decl)) > > return false; > > return true; > > } > >@@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void) > > { > > tree var = candidate (i); > > > >- if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) > >+ if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), > >+ constant_decl_p (var))) > > { > > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > > <= max_scalarization_size) >
On Mon, 17 Apr 2017, Martin Jambor wrote: > Hi, > > On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote: > > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote: > > >Hi, > > > > > >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote: > > >> On Wed, 12 Apr 2017, Martin Jambor wrote: > > >> > > >> > Hi, > > >> > > > >> > the patch below is an attempt to deal with PR 80293 as > > >non-invasively > > >> > as possible. Basically, it switches off total SRA scalarization of > > >> > any local aggregates which contains an array of elements that have > > >one > > >> > byte (or less). > > >> > > > >> > The logic behind this is that accessing such arrays element-wise > > >> > usually results in poor code and that such char arrays are often > > >used > > >> > for non-statically-typed content anyway, and we do not want to copy > > >> > that byte per byte. > > >> > > > >> > Alan, do you think this could impact your constant pool > > >scalarization > > >> > too severely? > > >> > > >> Hmm, isn't one of the issues that we have > > >> > > >> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) > > >> { > > >> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > > >> <= max_scalarization_size) > > >> { > > >> create_total_scalarization_access (var); > > >> > > >> which limits the size of scalarizable vars but not the number > > >> of accesses we create for total scalarization? > > > > > >Well, yes, but at least I understood from your comment #4 in the bug > > >that you did not want to limit the number of replacements for gcc 7? > > > > > >> > > >> Is scalarizable_type_p only used in contexts where we have no hint > > >> of the actual accesses? > > > > > >There are should_scalarize_away_bitmap and > > >cannot_scalarize_away_bitmap hints. > > > > > >Total scalarization is a hack process where we chop up small > > >aggregates according to their types - as opposed to actual accesses, > > >meaning it is an alien process to the rest of SRA - knowing that they > > >will go completely away afterwards (that is ensured by > > >cannot_scalarize_away_bitmap) and that this will facilitate copy > > >propagation (this is indicated by should_scalarize_away_bitmap, which > > >has a bit set if there is a non-scalar assignment _from_ (a part of) > > >the aggregate). > > > > OK, but for the copy x = y where x would go away it still depends on uses of x what pieces we actually want? Or is total scalarization only done for x = y; z = x;? > > Thus no further accesses to x? > > Total scalarization adds artificial accesses only to y, but > (in both cases of total and "natural" scalarization) for all aggregate > assignments between SRA candidates, SRA attempts to recreate all > accesses covering RHS to LHS. Transitively. So the artificial > accesses created for y will then get created for x and z even if they > would not be candidates for total scalarization. So e.g. if z cannot > go away because it is passed to a function, it will be loaded > piece-wise from y. So what I was trying to figure out is whether there is anything that fores us to totally scalarize using "natural" elements rather than using larger accesses? For x = y; z = x; the best pieces to chop x to depends on the write accesses to y and the read accesses from z (we eventually want to easily match them up for CSE). I see we first create total scalarization accesses and only then doing propagate_all_subaccesses. > > > > >> That is, for the constant pool case we > > >> usually have > > >> > > >> x = .LC0; > > >> .. = x[2]; > > >> > > >> so we have a "hint" that accesses on x are those we'd want to > > >> optimize to accesses to .LC0. > > > > > >You don't need total scalarization for this, just the existence of > > >read from x[2] would be sufficient but thanks for pointing me to the > > >right direction. For constant pool decl scalarization, it is not > > >important to introduce artificial accesses for x but for .LC0. > > >Therefore, I have adapted the patch to allow char arrays for const > > >decls only and verified that it scalarizes a const-pool array of chars > > >on Aarch64. The (otherwise yet untested) result is below. > > > > > >What do you think? > > > > Why special case char arrays? I'd simply disallow total scalarization of non-const arrays completely. > > Well, currently we will get element-wise copy propagation for things > like "int rgb[3];" (possibly embeded in a small struct). If I remove > it, someone will complain. Maybe the correct limit is SI mode size or > BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though. > I just wanted to be conservative at this point in the release cycle. Sure but this total scalarization was added to be able to defer ctor "lowering" at gimplification time to SRA. So it would be ok (IMHO) to limit it to constant initializers. But agreed for to be conservative at this point. I'd say we do this once stage1 opens on trunk and then backport for 7.2. Maybe there's input from Alan as well then. Thanks, Richard. > Martin > > > > > >Martin > > > > > > > > >2017-04-13 Martin Jambor <mjambor@suse.cz> > > > > > > * tree-sra.c (scalarizable_type_p): New parameter const_decl, make > > > char arrays not totally scalarizable if it is false. > > > (analyze_all_variable_accesses): Pass correct value in the new > > > parameter. > > > > > >testsuite/ > > > * g++.dg/tree-ssa/pr80293.C: New test. > > >--- > > >gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 > > >+++++++++++++++++++++++++++++++++ > > > gcc/tree-sra.c | 21 ++++++++++----- > > > 2 files changed, 60 insertions(+), 6 deletions(-) > > > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C > > > > > >diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C > > >b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C > > >new file mode 100644 > > >index 00000000000..7faf35ae983 > > >--- /dev/null > > >+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C > > >@@ -0,0 +1,45 @@ > > >+// { dg-do compile } > > >+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */ > > >+ > > >+#include <array> > > >+ > > >+// Return a copy of the underlying memory of an arbitrary value. > > >+template < > > >+ typename T, > > >+ typename = typename > > >std::enable_if<std::is_trivially_copyable<T>::value>::type > > >+> > > >+auto getMem( > > >+ T const & value > > >+) -> std::array<char, sizeof(T)> { > > >+ auto ret = std::array<char, sizeof(T)>{}; > > >+ __builtin_memcpy(ret.data(), &value, sizeof(T)); > > >+ return ret; > > >+} > > >+ > > >+template < > > >+ typename T, > > >+ typename = typename > > >std::enable_if<std::is_trivially_copyable<T>::value>::type > > >+> > > >+auto fromMem( > > >+ std::array<char, sizeof(T)> const & buf > > >+) -> T { > > >+ auto ret = T{}; > > >+ __builtin_memcpy(&ret, buf.data(), sizeof(T)); > > >+ return ret; > > >+} > > >+ > > >+double foo1(std::uint64_t arg) { > > >+ return fromMem<double>(getMem(arg)); > > >+} > > >+ > > >+double foo2(std::uint64_t arg) { > > >+ return *reinterpret_cast<double*>(&arg); > > >+} > > >+ > > >+double foo3(std::uint64_t arg) { > > >+ double ret; > > >+ __builtin_memcpy(&ret, &arg, sizeof(arg)); > > >+ return ret; > > >+} > > >+ > > >+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } } > > >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > > >index 02453d3ed9a..ab06be66131 100644 > > >--- a/gcc/tree-sra.c > > >+++ b/gcc/tree-sra.c > > >@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool > > >write) > > > > > >/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or > > >fixed-length > > >ARRAY_TYPE with fields that are either of gimple register types > > >(excluding > > >- bit-fields) or (recursively) scalarizable types. */ > > >+ bit-fields) or (recursively) scalarizable types. CONST_DECL must > > >be true if > > >+ we are considering a decl from constant pool. If it is false, char > > >arrays > > >+ will be refused. */ > > > > > > static bool > > >-scalarizable_type_p (tree type) > > >+scalarizable_type_p (tree type, bool const_decl) > > > { > > > gcc_assert (!is_gimple_reg_type (type)); > > > if (type_contains_placeholder_p (type)) > > >@@ -970,7 +972,7 @@ scalarizable_type_p (tree type) > > > return false; > > > > > > if (!is_gimple_reg_type (ft) > > >- && !scalarizable_type_p (ft)) > > >+ && !scalarizable_type_p (ft, const_decl)) > > > return false; > > > } > > > > > >@@ -978,10 +980,16 @@ scalarizable_type_p (tree type) > > > > > > case ARRAY_TYPE: > > > { > > >+ HOST_WIDE_INT min_elem_size; > > >+ if (const_decl) > > >+ min_elem_size = 0; > > >+ else > > >+ min_elem_size = BITS_PER_UNIT; > > >+ > > > if (TYPE_DOMAIN (type) == NULL_TREE > > > || !tree_fits_shwi_p (TYPE_SIZE (type)) > > > || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type))) > > >- || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0) > > >+ || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size) > > > || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) > > > return false; > > > if (tree_to_shwi (TYPE_SIZE (type)) == 0 > > >@@ -995,7 +1003,7 @@ scalarizable_type_p (tree type) > > > > > > tree elem = TREE_TYPE (type); > > > if (!is_gimple_reg_type (elem) > > >- && !scalarizable_type_p (elem)) > > >+ && !scalarizable_type_p (elem, const_decl)) > > > return false; > > > return true; > > > } > > >@@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void) > > > { > > > tree var = candidate (i); > > > > > >- if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) > > >+ if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), > > >+ constant_decl_p (var))) > > > { > > > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > > > <= max_scalarization_size) > > > >
Hi, On Mon, Apr 17, 2017 at 09:35:18PM +0200, Martin Jambor wrote: > On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote: > > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote: ... > > >Therefore, I have adapted the patch to allow char arrays for const > > >decls only and verified that it scalarizes a const-pool array of chars > > >on Aarch64. The (otherwise yet untested) result is below. > > > > > >What do you think? > > > > Why special case char arrays? I'd simply disallow total scalarization of non-const arrays completely. > > Well, currently we will get element-wise copy propagation for things > like "int rgb[3];" (possibly embeded in a small struct). If I remove > it, someone will complain. Maybe the correct limit is SI mode size or > BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though. > I just wanted to be conservative at this point in the release cycle. > To gather some data for this, I have looked at SRA statistics gathered for SPEC 2006 benchmarks. Complete tables are below, let me comment on the diffs, though. The most interesting column is probably the third one (the second one with numbers). This is the difference between my patch that special-cases char arrays and disallowing any non-const-pool array total scalarization: | Benchmark - pass | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts | |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------| -| 403.gcc - esra | 138 | 12 | 458 | 1274 | 20 | -| 403.gcc - sra | 42 | 11 | 141 | 457 | 13 | +| 403.gcc - esra | 132 | 4 | 370 | 1274 | 13 | +| 403.gcc - sra | 43 | 8 | 140 | 459 | 12 | -| 447.dealII - esra | 12899 | 7826 | 16347 | 29046 | 1840 | -| 447.dealII - sra | 1083 | 567 | 2298 | 5479 | 392 | +| 447.dealII - esra | 12897 | 7790 | 16329 | 29046 | 1838 | +| 447.dealII - sra | 1079 | 553 | 2262 | 5479 | 388 | -| 450.soplex - sra | 42 | 2 | 87 | 263 | 2 | +| 450.soplex - sra | 42 | 0 | 81 | 263 | 2 | -| 453.povray - esra | 265 | 10 | 812 | 3422 | 12 | -| 453.povray - sra | 76 | 6 | 217 | 786 | 57 | +| 453.povray - esra | 264 | 8 | 806 | 3422 | 11 | +| 453.povray - sra | 76 | 5 | 215 | 786 | 57 | -| 465.tonto - esra | 4029 | 5 | 8269 | 21351 | 5 | +| 465.tonto - esra | 4024 | 0 | 8233 | 21351 | 0 | For reference, this is the difference between (2 week old) trunk and my proposed patch: | Benchmark - pass | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts | |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------| -| 401.bzip2 - sra | 3 | 2 | 22 | 62 | 0 | +| 401.bzip2 - sra | 3 | 0 | 22 | 62 | 0 | -| 403.gcc - esra | 138 | 13 | 458 | 1274 | 20 | -| 403.gcc - sra | 42 | 12 | 141 | 457 | 13 | +| 403.gcc - esra | 138 | 12 | 458 | 1274 | 20 | +| 403.gcc - sra | 42 | 11 | 141 | 457 | 13 | -| 447.dealII - esra | 12899 | 7830 | 16387 | 29046 | 1840 | +| 447.dealII - esra | 12899 | 7826 | 16347 | 29046 | 1840 | -| 453.povray - esra | 266 | 11 | 822 | 3422 | 13 | +| 453.povray - esra | 265 | 10 | 812 | 3422 | 12 | -| 454.calculix - sra | 17 | 2 | 81 | 305 | 5 | +| 454.calculix - sra | 15 | 0 | 71 | 305 | 0 | -| 465.tonto - sra | 98 | 36 | 4652 | 374 | 37 | +| 465.tonto - sra | 62 | 0 | 166 | 374 | 0 | -| 483.xalancbmk - esra | 17844 | 10648 | 19824 | 39699 | 1404 | +| 483.xalancbmk - esra | 17844 | 10642 | 19764 | 39699 | 1404 | The following are the non-char non-const-pool arrays we totally scalarize now in individual benchmarks with some notes: GCC - struct output_state in diagnostic.c contains "int diagnostic_count[6]". I briefly inspected of a few affected functions, in all we did perform copy propagation and got rid of a useless aggregate intermediary. But that copy propagation was done integer-wise which is only 32bit. - ereal_negate in real.c has local "short unsigned int[6]" this actually looks harmful and that the size limit should not allow shorts either. ereal_ldexp looks like having the same thing (but is much longer). - struct realvaluetype in in build_real_from_int_cst in has int[3] in it, resulting in a copy propagation with PHI nodes. dealII - struct TableIndices is basically encapsulated int[2]. Total scalarization facilitates constant propagation of zeros quite a few times. I have not checked the optimized dump to see if ti matters though. - struct _ValueType is int[8[ followed by a single char field. Scalarization also only facilitates propagation of a zero initializer... using 32bit stores. struct CellData is the same thing. soplex - UnitVector of four pointers. Total scalarization does not change what normal SRA does. povray - struct BCYL_INT has two double[2] arrays. Facilitates quite a lot of zero initializer propagation of the first array and propagation of undefinedness of the second array. - struct BBOX is similar in structure but total scalarization is probably unnecessary. - TempPixelD.26340 contains float[5] and seems to facilitate nice copy propagation. tonto - struct array1_integer has a one element array of three long integers called stride, lbound and ubound. Total scalarization facilitates simple copy propagation of the form local = *p1; *p2 = local; - ditto for struct array2_real which has descriptors of two dimensions. - only 5 cases overall but in ipa-sra so they might be inlined to lots of places. So, I still think we should keep total scalarization of arrays with WORD-sized elements and perhaps of int-sized elements too. Martin All data: No patch: | Benchmark - pass | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts | |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------| | 400.perlbench - esra | 7 | 4 | 54 | 104 | 4 | | 400.perlbench - sra | 0 | 0 | 0 | 0 | 0 | | 401.bzip2 - esra | 3 | 0 | 9 | 66 | 0 | | 401.bzip2 - sra | 3 | 2 | 22 | 62 | 0 | | 403.gcc - esra | 138 | 13 | 458 | 1274 | 20 | | 403.gcc - sra | 42 | 12 | 141 | 457 | 13 | | 410.bwaves - esra | 0 | 0 | 0 | 0 | 0 | | 410.bwaves - sra | 0 | 0 | 0 | 0 | 0 | | 416.gamess - esra | 48 | 0 | 246 | 1250 | 0 | | 416.gamess - sra | 46 | 0 | 498 | 2208 | 0 | | 429.mcf - esra | 0 | 0 | 0 | 0 | 0 | | 429.mcf - sra | 0 | 0 | 0 | 0 | 0 | | 433.milc - esra | 18 | 3 | 33 | 103 | 4 | | 433.milc - sra | 6 | 0 | 18 | 87 | 0 | | 434.zeusmp - esra | 3 | 0 | 7 | 87 | 0 | | 434.zeusmp - sra | 0 | 0 | 0 | 0 | 0 | | 435.gromacs - esra | 150 | 22 | 378 | 1048 | 23 | | 435.gromacs - sra | 48 | 0 | 88 | 458 | 0 | | 436.cactusADM - esra | 58 | 30 | 117 | 538 | 30 | | 436.cactusADM - sra | 0 | 0 | 0 | 0 | 0 | | 437.leslie3d - esra | 91 | 0 | 412 | 2370 | 0 | | 437.leslie3d - sra | 1 | 0 | 1 | 5 | 0 | | 444.namd - esra | 175 | 11 | 182 | 676 | 11 | | 444.namd - sra | 4 | 4 | 12 | 0 | 4 | | 445.gobmk - esra | 5 | 1 | 10 | 33 | 1 | | 445.gobmk - sra | 1 | 0 | 3 | 13 | 0 | | 447.dealII - esra | 12899 | 7830 | 16387 | 29046 | 1840 | | 447.dealII - sra | 1083 | 567 | 2298 | 5479 | 392 | | 450.soplex - esra | 64 | 34 | 68 | 128 | 6 | | 450.soplex - sra | 42 | 2 | 87 | 263 | 2 | | 453.povray - esra | 266 | 11 | 822 | 3422 | 13 | | 453.povray - sra | 76 | 6 | 217 | 786 | 57 | | 454.calculix - esra | 19 | 0 | 126 | 836 | 0 | | 454.calculix - sra | 17 | 2 | 81 | 305 | 5 | | 456.hmmer - esra | 0 | 0 | 0 | 0 | 0 | | 456.hmmer - sra | 4 | 0 | 5 | 13 | 0 | | 458.sjeng - esra | 7 | 6 | 38 | 3 | 12 | | 458.sjeng - sra | 3 | 0 | 15 | 33 | 2 | | 459.GemsFDTD - esra | 217 | 1 | 484 | 1805 | 1 | | 459.GemsFDTD - sra | 0 | 0 | 0 | 0 | 0 | | 462.libquantum - esra | 17 | 6 | 44 | 126 | 6 | | 462.libquantum - sra | 6 | 0 | 6 | 11 | 0 | | 464.h264ref - esra | 16 | 0 | 92 | 393 | 0 | | 464.h264ref - sra | 6 | 0 | 12 | 84 | 0 | | 465.tonto - esra | 4029 | 5 | 8269 | 21351 | 5 | | 465.tonto - sra | 98 | 36 | 4652 | 374 | 37 | | 470.lbm - esra | 0 | 0 | 0 | 0 | 0 | | 470.lbm - sra | 0 | 0 | 0 | 0 | 0 | | 471.omnetpp - esra | 208 | 152 | 232 | 560 | 24 | | 471.omnetpp - sra | 34 | 17 | 49 | 200 | 12 | | 473.astar - esra | 4 | 3 | 12 | 25 | 3 | | 473.astar - sra | 24 | 19 | 68 | 118 | 19 | | 481.wrf - esra | 2113 | 11 | 2560 | 5778 | 0 | | 481.wrf - sra | 16 | 1 | 40 | 57 | 1 | | 482.sphinx3 - esra | 0 | 0 | 0 | 0 | 0 | | 482.sphinx3 - sra | 0 | 0 | 0 | 0 | 0 | | 483.xalancbmk - esra | 17844 | 10648 | 19824 | 39699 | 1404 | | 483.xalancbmk - sra | 862 | 426 | 1634 | 3814 | 249 | With my patch: | Benchmark - pass | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts | |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------| | 400.perlbench - esra | 7 | 4 | 54 | 104 | 4 | | 400.perlbench - sra | 0 | 0 | 0 | 0 | 0 | | 401.bzip2 - esra | 3 | 0 | 9 | 66 | 0 | | 401.bzip2 - sra | 3 | 0 | 22 | 62 | 0 | | 403.gcc - esra | 138 | 12 | 458 | 1274 | 20 | | 403.gcc - sra | 42 | 11 | 141 | 457 | 13 | | 410.bwaves - esra | 0 | 0 | 0 | 0 | 0 | | 410.bwaves - sra | 0 | 0 | 0 | 0 | 0 | | 416.gamess - esra | 48 | 0 | 246 | 1250 | 0 | | 416.gamess - sra | 46 | 0 | 498 | 2208 | 0 | | 429.mcf - esra | 0 | 0 | 0 | 0 | 0 | | 429.mcf - sra | 0 | 0 | 0 | 0 | 0 | | 433.milc - esra | 18 | 3 | 33 | 103 | 4 | | 433.milc - sra | 6 | 0 | 18 | 87 | 0 | | 434.zeusmp - esra | 3 | 0 | 7 | 87 | 0 | | 434.zeusmp - sra | 0 | 0 | 0 | 0 | 0 | | 435.gromacs - esra | 150 | 22 | 378 | 1048 | 23 | | 435.gromacs - sra | 48 | 0 | 88 | 458 | 0 | | 436.cactusADM - esra | 58 | 30 | 117 | 538 | 30 | | 436.cactusADM - sra | 0 | 0 | 0 | 0 | 0 | | 437.leslie3d - esra | 91 | 0 | 412 | 2370 | 0 | | 437.leslie3d - sra | 1 | 0 | 1 | 5 | 0 | | 444.namd - esra | 175 | 11 | 182 | 676 | 11 | | 444.namd - sra | 4 | 4 | 12 | 0 | 4 | | 445.gobmk - esra | 5 | 1 | 10 | 33 | 1 | | 445.gobmk - sra | 1 | 0 | 3 | 13 | 0 | | 447.dealII - esra | 12899 | 7826 | 16347 | 29046 | 1840 | | 447.dealII - sra | 1083 | 567 | 2298 | 5479 | 392 | | 450.soplex - esra | 64 | 34 | 68 | 128 | 6 | | 450.soplex - sra | 42 | 2 | 87 | 263 | 2 | | 453.povray - esra | 265 | 10 | 812 | 3422 | 12 | | 453.povray - sra | 76 | 6 | 217 | 786 | 57 | | 454.calculix - esra | 19 | 0 | 126 | 836 | 0 | | 454.calculix - sra | 15 | 0 | 71 | 305 | 0 | | 456.hmmer - esra | 0 | 0 | 0 | 0 | 0 | | 456.hmmer - sra | 4 | 0 | 5 | 13 | 0 | | 458.sjeng - esra | 7 | 6 | 38 | 3 | 12 | | 458.sjeng - sra | 3 | 0 | 15 | 33 | 2 | | 459.GemsFDTD - esra | 217 | 1 | 484 | 1805 | 1 | | 459.GemsFDTD - sra | 0 | 0 | 0 | 0 | 0 | | 462.libquantum - esra | 17 | 6 | 44 | 126 | 6 | | 462.libquantum - sra | 6 | 0 | 6 | 11 | 0 | | 464.h264ref - esra | 16 | 0 | 92 | 393 | 0 | | 464.h264ref - sra | 6 | 0 | 12 | 84 | 0 | | 465.tonto - esra | 4029 | 5 | 8269 | 21351 | 5 | | 465.tonto - sra | 62 | 0 | 166 | 374 | 0 | | 470.lbm - esra | 0 | 0 | 0 | 0 | 0 | | 470.lbm - sra | 0 | 0 | 0 | 0 | 0 | | 471.omnetpp - esra | 208 | 152 | 232 | 560 | 24 | | 471.omnetpp - sra | 34 | 17 | 49 | 200 | 12 | | 473.astar - esra | 4 | 3 | 12 | 25 | 3 | | 473.astar - sra | 24 | 19 | 68 | 118 | 19 | | 481.wrf - esra | 2113 | 11 | 2560 | 5778 | 0 | | 481.wrf - sra | 16 | 1 | 40 | 57 | 1 | | 482.sphinx3 - esra | 0 | 0 | 0 | 0 | 0 | | 482.sphinx3 - sra | 0 | 0 | 0 | 0 | 0 | | 483.xalancbmk - esra | 17844 | 10642 | 19764 | 39699 | 1404 | | 483.xalancbmk - sra | 862 | 426 | 1634 | 3814 | 249 | Disallowing any non-constant-pool arrays: | Benchmark - pass | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts | |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------| | 400.perlbench - esra | 7 | 4 | 54 | 104 | 4 | | 400.perlbench - sra | 0 | 0 | 0 | 0 | 0 | | 401.bzip2 - esra | 3 | 0 | 9 | 66 | 0 | | 401.bzip2 - sra | 3 | 0 | 22 | 62 | 0 | | 403.gcc - esra | 132 | 4 | 370 | 1274 | 13 | | 403.gcc - sra | 43 | 8 | 140 | 459 | 12 | | 410.bwaves - esra | 0 | 0 | 0 | 0 | 0 | | 410.bwaves - sra | 0 | 0 | 0 | 0 | 0 | | 416.gamess - esra | 48 | 0 | 246 | 1250 | 0 | | 416.gamess - sra | 46 | 0 | 498 | 2208 | 0 | | 429.mcf - esra | 0 | 0 | 0 | 0 | 0 | | 429.mcf - sra | 0 | 0 | 0 | 0 | 0 | | 433.milc - esra | 18 | 3 | 33 | 103 | 4 | | 433.milc - sra | 6 | 0 | 18 | 87 | 0 | | 434.zeusmp - esra | 3 | 0 | 7 | 87 | 0 | | 434.zeusmp - sra | 0 | 0 | 0 | 0 | 0 | | 435.gromacs - esra | 150 | 22 | 378 | 1048 | 23 | | 435.gromacs - sra | 48 | 0 | 88 | 458 | 0 | | 436.cactusADM - esra | 58 | 30 | 117 | 538 | 30 | | 436.cactusADM - sra | 0 | 0 | 0 | 0 | 0 | | 437.leslie3d - esra | 91 | 0 | 412 | 2370 | 0 | | 437.leslie3d - sra | 1 | 0 | 1 | 5 | 0 | | 444.namd - esra | 175 | 11 | 182 | 676 | 11 | | 444.namd - sra | 4 | 4 | 12 | 0 | 4 | | 445.gobmk - esra | 5 | 1 | 10 | 33 | 1 | | 445.gobmk - sra | 1 | 0 | 3 | 13 | 0 | | 447.dealII - esra | 12897 | 7790 | 16329 | 29046 | 1838 | | 447.dealII - sra | 1079 | 553 | 2262 | 5479 | 388 | | 450.soplex - esra | 64 | 34 | 68 | 128 | 6 | | 450.soplex - sra | 42 | 0 | 81 | 263 | 2 | | 453.povray - esra | 264 | 8 | 806 | 3422 | 11 | | 453.povray - sra | 76 | 5 | 215 | 786 | 57 | | 454.calculix - esra | 19 | 0 | 126 | 836 | 0 | | 454.calculix - sra | 15 | 0 | 71 | 305 | 0 | | 456.hmmer - esra | 0 | 0 | 0 | 0 | 0 | | 456.hmmer - sra | 4 | 0 | 5 | 13 | 0 | | 458.sjeng - esra | 7 | 6 | 38 | 3 | 12 | | 458.sjeng - sra | 3 | 0 | 15 | 33 | 2 | | 459.GemsFDTD - esra | 217 | 1 | 484 | 1805 | 1 | | 459.GemsFDTD - sra | 0 | 0 | 0 | 0 | 0 | | 462.libquantum - esra | 17 | 6 | 44 | 126 | 6 | | 462.libquantum - sra | 6 | 0 | 6 | 11 | 0 | | 464.h264ref - esra | 16 | 0 | 92 | 393 | 0 | | 464.h264ref - sra | 6 | 0 | 12 | 84 | 0 | | 465.tonto - esra | 4024 | 0 | 8233 | 21351 | 0 | | 465.tonto - sra | 62 | 0 | 166 | 374 | 0 | | 470.lbm - esra | 0 | 0 | 0 | 0 | 0 | | 470.lbm - sra | 0 | 0 | 0 | 0 | 0 | | 471.omnetpp - esra | 208 | 152 | 232 | 560 | 24 | | 471.omnetpp - sra | 34 | 17 | 49 | 200 | 12 | | 473.astar - esra | 4 | 3 | 12 | 25 | 3 | | 473.astar - sra | 24 | 19 | 68 | 118 | 19 | | 481.wrf - esra | 2113 | 11 | 2560 | 5778 | 0 | | 481.wrf - sra | 16 | 1 | 40 | 57 | 1 | | 482.sphinx3 - esra | 0 | 0 | 0 | 0 | 0 | | 482.sphinx3 - sra | 0 | 0 | 0 | 0 | 0 | | 483.xalancbmk - esra | 17844 | 10642 | 19764 | 39699 | 1404 | | 483.xalancbmk - sra | 862 | 426 | 1634 | 3814 | 249 |
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C new file mode 100644 index 00000000000..7faf35ae983 --- /dev/null +++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C @@ -0,0 +1,45 @@ +// { dg-do compile } +// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */ + +#include <array> + +// Return a copy of the underlying memory of an arbitrary value. +template < + typename T, + typename = typename std::enable_if<std::is_trivially_copyable<T>::value>::type +> +auto getMem( + T const & value +) -> std::array<char, sizeof(T)> { + auto ret = std::array<char, sizeof(T)>{}; + __builtin_memcpy(ret.data(), &value, sizeof(T)); + return ret; +} + +template < + typename T, + typename = typename std::enable_if<std::is_trivially_copyable<T>::value>::type +> +auto fromMem( + std::array<char, sizeof(T)> const & buf +) -> T { + auto ret = T{}; + __builtin_memcpy(&ret, buf.data(), sizeof(T)); + return ret; +} + +double foo1(std::uint64_t arg) { + return fromMem<double>(getMem(arg)); +} + +double foo2(std::uint64_t arg) { + return *reinterpret_cast<double*>(&arg); +} + +double foo3(std::uint64_t arg) { + double ret; + __builtin_memcpy(&ret, &arg, sizeof(arg)); + return ret; +} + +// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } } diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 02453d3ed9a..ab06be66131 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool write) /* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length ARRAY_TYPE with fields that are either of gimple register types (excluding - bit-fields) or (recursively) scalarizable types. */ + bit-fields) or (recursively) scalarizable types. CONST_DECL must be true if + we are considering a decl from constant pool. If it is false, char arrays + will be refused. */ static bool -scalarizable_type_p (tree type) +scalarizable_type_p (tree type, bool const_decl) { gcc_assert (!is_gimple_reg_type (type)); if (type_contains_placeholder_p (type)) @@ -970,7 +972,7 @@ scalarizable_type_p (tree type) return false; if (!is_gimple_reg_type (ft) - && !scalarizable_type_p (ft)) + && !scalarizable_type_p (ft, const_decl)) return false; } @@ -978,10 +980,16 @@ scalarizable_type_p (tree type) case ARRAY_TYPE: { + HOST_WIDE_INT min_elem_size; + if (const_decl) + min_elem_size = 0; + else + min_elem_size = BITS_PER_UNIT; + if (TYPE_DOMAIN (type) == NULL_TREE || !tree_fits_shwi_p (TYPE_SIZE (type)) || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type))) - || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0) + || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size) || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) return false; if (tree_to_shwi (TYPE_SIZE (type)) == 0 @@ -995,7 +1003,7 @@ scalarizable_type_p (tree type) tree elem = TREE_TYPE (type); if (!is_gimple_reg_type (elem) - && !scalarizable_type_p (elem)) + && !scalarizable_type_p (elem, const_decl)) return false; return true; } @@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void) { tree var = candidate (i); - if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) + if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), + constant_decl_p (var))) { if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) <= max_scalarization_size)