Message ID | ri6wo9ca0p7.fsf@suse.cz |
---|---|
State | New |
Headers | show |
Series | [1/4] SRA: Add verification of accesses | expand |
On Tue, 28 Jan 2020, Martin Jambor wrote: > Hi, > > PR 92486 shows that DSE, when seeing a "normal" gimple aggregate > assignment coming from a C struct assignment and one representing a > folded memcpy, can kill the latter and keep in place only the former, > which does not copy padding - at least when SRA decides to totally > scalarize at least one of the aggregates (when not doing total > scalarization, SRA cares about padding). > > SRA would not totally scalarize an aggregate if it saw that it takes > part in a gimple assignment which is a folded memcpy (see how > type_changing_p is set in contains_vce_or_bfcref_p) but it doesn't > because of the DSE decisions. > > I was asked to modify SRA to take padding into account - and to copy > it around - when totally scalarizing, which is what the patch below > does. I am not very happy about this, I am afraid it will lead to > performance regressions, but this has all been discussed (see > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01185.html and > https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00218.html). > > I tried to alleviate the problem by not only inserting accesses for > padding but also by enlarging existing accesses whenever possible to > extend over padding - the extended access would get copied when in the > original IL an aggregate copy is replaced with SRA copies and a > BIT_FIELD_REF would be generated to replace a scalar access to a part > of the aggregate in the original IL. 
I have made it work in the sense > that the patch passed bootstrap and testing (see the git branch > refs/users/jamborm/heads/sra-later_total-bfr-20200127 or look at > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=shortlog;h=refs/users/jamborm/heads/sra-later_total-bfr-20200127 > if you are interested), but this approach meant that each such > extended replacement which was written to (so all of them) could > potentially become only partially assigned to and so had to be marked > as addressable and could not become a gimple register, meaning that > total scalarization would be creating addressable variables. Detecting > such cases is not easy, it would mean introducing yet another type of > write flag (written to exactly this access) and propagating that flag > across assignment sub-accesses. > > So I decided that was not the way to go and instead only extended > integer accesses, and that is what the patch below does. Like in the > previous attempt, whatever padding could not be covered by extending > an access would be covered by extra artificial accesses. As you can > see, it adds a little complexity to various places of the pass which > are already not trivial, but hopefully it is manageable. > > Bootstrapped and tested on x86_64-linux, I'll be curious about the > feedback. +static void +upgrade_integral_size_to_prec (struct access *access) +{ Missing function comment. I'm almost sure that get_reg_access_replacement gets things wrong on non-little-endian platforms. The function comment says something about bitfield refs but you simply create truncations. sra_type_for_size shouldn't use lang_hooks.types.type_for_size but build_nonstandard_integer_type. In total_scalarization_fill_padding avoid the loop figuring out the larger type and just up it to MIN (lastpos alignment, word-mode-size)? There's PR71460 and a more recent one on total scalarization creating FP accesses which is probably a bug we'd want to address first (and which might simplify this patch?). 
Overall the approach in the patch looks reasonable and code-generation even improves in some cases I tried. As said, I wonder how this integrates with a fix to avoid using FP types for total scalarization (or any access decomposition that was a non-FPmode before). Thanks, Richard. > Thanks, > > Martin > > > 2020-01-27 Martin Jambor <mjambor@suse.cz> > > PR tree-optimization/92486 > * tree-sra.c: Include langhooks.h > (struct access): New fields reg_size and reg_acc_type. > (dump_access): Print new fields. > (acc_size): New function. > (find_access_in_subtree): Use it, new parameter reg. > (get_var_base_offset_size_access): Pass true to > find_access_in_subtree. > (create_access_1): Initialize reg_size. > (create_artificial_child_access): Likewise. > (create_total_scalarization_access): Likewise. > (build_ref_for_model): Do not use model expr if reg_acc_size is > non-NULL. > (get_reg_access_replacement): New function. > (verify_sra_access_forest): Adjust verification for presence of > extended accesses covering padding. > (analyze_access_subtree): Undo extension over padding if total > scalarization failed, set grp_partial_lhs if we are going to introduce > a partial store to the new replacement, do not ignore holes when > totally scalarizing. > (sra_type_for_size): New function. > (total_scalarization_fill_padding): Likewise. > (total_should_skip_creating_access): Use it. > (totally_scalarize_subtree): Likewise. > (sra_modify_expr): Use get_reg_access_replacement instead of > get_access_replacement, adjust for reg_acc_type. > (sra_modify_assign): Likewise. > (load_assign_lhs_subreplacements): Pass false to > find_access_in_subtree. > > testsuite/ > * gcc.dg/tree-ssa/pr92486.c: New test. 
> --- > gcc/ChangeLog | 32 +++ > gcc/testsuite/ChangeLog | 5 + > gcc/testsuite/gcc.dg/tree-ssa/pr92486.c | 38 +++ > gcc/tree-sra.c | 368 +++++++++++++++++++++--- > 4 files changed, 396 insertions(+), 47 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92486.c > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index 3541c6638f9..34c60e6f2a3 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,35 @@ > +2020-01-26 Martin Jambor <mjambor@suse.cz> > + > + PR tree-optimization/92486 > + * tree-sra.c: Include langhooks.h > + (struct access): New fields reg_size and reg_acc_type. > + (dump_access): Print new fields. > + (acc_size): New function. > + (find_access_in_subtree): Use it, new parameter reg. > + (get_var_base_offset_size_access): Pass true to > + find_access_in_subtree. > + (create_access_1): Initialize reg_size. > + (create_artificial_child_access): Likewise. > + (create_total_scalarization_access): Likewise. > + (build_ref_for_model): Do not use model expr if reg_acc_size is > + non-NULL. > + (get_reg_access_replacement): New function. > + (verify_sra_access_forest): Adjust verification for presence of > + extended accesses covering padding. > + (analyze_access_subtree): Undo extension over padding if total > + scalarization failed, set grp_partial_lhs if we are going to introduce > + a partial store to the new replacement, do not ignore holes when > + totally scalarizing. > + (sra_type_for_size): New function. > + (total_scalarization_fill_padding): Likewise. > + (total_should_skip_creating_access): Use it. > + (totally_scalarize_subtree): Likewise. > + (sra_modify_expr): Use get_reg_access_replacement instead of > + get_access_replacement, adjust for reg_acc_type. > + (sra_modify_assign): Likewise. > + (load_assign_lhs_subreplacements): Pass false to > + find_access_in_subtree. 
> + > 2020-01-27 Martin Liska <mliska@suse.cz> > > PR driver/91220 > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog > index 22a37dd1ab2..7ccb5098224 100644 > --- a/gcc/testsuite/ChangeLog > +++ b/gcc/testsuite/ChangeLog > @@ -1,3 +1,8 @@ > +2020-01-24 Martin Jambor <mjambor@suse.cz> > + > + PR tree-optimization/92486 > + * gcc.dg/tree-ssa/pr92486.c: New test. > + > 2020-01-27 Martin Liska <mliska@suse.cz> > > PR target/93274 > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c > new file mode 100644 > index 00000000000..77e84241eff > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c > @@ -0,0 +1,38 @@ > +/* { dg-do run } */ > +/* { dg-options "-O1" } */ > + > +struct s { > + char c; > + int i; > +}; > + > +__attribute__((noipa)) > +void f(struct s *p, struct s *q) > +{ > + struct s w; > + > + __builtin_memset(&w, 0, sizeof(struct s)); > + w = *q; > + > + __builtin_memset(p, 0, sizeof(struct s)); > + *p = w; > +} > + > +int main() > +{ > + struct s x; > + __builtin_memset(&x, 1, sizeof(struct s)); > + > + struct s y; > + __builtin_memset(&y, 2, sizeof(struct s)); > + > + f(&y, &x); > + > + for (unsigned char *p = (unsigned char *)&y; > + p < (unsigned char *)&y + sizeof(struct s); > + p++) > + if (*p != 1) > + __builtin_abort (); > + > + return 0; > +} > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index ea8594db193..bda342ffdf9 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3. If not see > #include "dbgcnt.h" > #include "builtins.h" > #include "tree-sra.h" > +#include "langhooks.h" > > > /* Enumeration of all aggregate reductions we can do. */ > @@ -130,11 +131,16 @@ struct assign_link; > > struct access > { > - /* Values returned by `get_ref_base_and_extent' for each component reference > - If EXPR isn't a component reference just set `BASE = EXPR', `OFFSET = 0', > - `SIZE = TREE_SIZE (TREE_TYPE (expr))'. 
*/ > + /* Offset, size and base are values returned by `get_ref_base_and_extent' for > + each component reference. If EXPR isn't a component reference just set > + `BASE = EXPR', `OFFSET = 0', `SIZE = TREE_SIZE (TREE_TYPE (expr))'. */ > HOST_WIDE_INT offset; > HOST_WIDE_INT size; > + > + /* If reg_acc_type is non-NULL, this is the size of the actual register type > + the access represented before it was extended to cover padding. Otherwise > + it must be equal to size. */ > + HOST_WIDE_INT reg_size; > tree base; > > /* Expression. It is context dependent so do not use it to create new > @@ -144,6 +150,12 @@ struct access > /* Type. */ > tree type; > > + /* If non-NULL, this is the type that should be actually extracted or > + inserted from/to the replacement decl when replacing accesses to the > + individual field itself (as opposed to accesses created as part of > + replacing aggregate copies which should use TYPE). */ > + tree reg_acc_type; > + > /* The statement this access belongs to. */ > gimple *stmt; > > @@ -391,10 +403,17 @@ dump_access (FILE *f, struct access *access, bool grp) > print_generic_expr (f, access->base); > fprintf (f, "', offset = " HOST_WIDE_INT_PRINT_DEC, access->offset); > fprintf (f, ", size = " HOST_WIDE_INT_PRINT_DEC, access->size); > + if (access->reg_size != access->size) > + fprintf (f, ", reg_size = " HOST_WIDE_INT_PRINT_DEC, access->reg_size); > fprintf (f, ", expr = "); > print_generic_expr (f, access->expr); > fprintf (f, ", type = "); > print_generic_expr (f, access->type); > + if (access->reg_acc_type) > + { > + fprintf (f, ", reg_acc_type = "); > + print_generic_expr (f, access->reg_acc_type); > + } > fprintf (f, ", reverse = %d", access->reverse); > if (grp) > fprintf (f, ", grp_read = %d, grp_write = %d, grp_assignment_read = %d, " > @@ -481,18 +500,27 @@ get_base_access_vector (tree base) > return base_access_vec->get (base); > } > > +/* Return ACCESS's size or reg_size depending on REG. 
*/ > + > +static HOST_WIDE_INT > +acc_size (struct access *access, bool reg) > +{ > + return reg ? access->reg_size : access->size; > +} > + > /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted > in ACCESS. Return NULL if it cannot be found. */ > > static struct access * > find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, > - HOST_WIDE_INT size) > + HOST_WIDE_INT size, bool reg) > { > - while (access && (access->offset != offset || access->size != size)) > + while (access && (access->offset != offset > + || acc_size (access, reg) != size)) > { > struct access *child = access->first_child; > > - while (child && (child->offset + child->size <= offset)) > + while (child && (child->offset + acc_size (child, reg) <= offset)) > child = child->next_sibling; > access = child; > } > @@ -503,7 +531,7 @@ find_access_in_subtree (struct access *access, HOST_WIDE_INT offset, > if (access) > while (access->first_child > && access->first_child->offset == offset > - && access->first_child->size == size) > + && acc_size (access->first_child, reg) == size) > access = access->first_child; > > return access; > @@ -539,7 +567,7 @@ get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset, > if (!access) > return NULL; > > - return find_access_in_subtree (access, offset, size); > + return find_access_in_subtree (access, offset, size, true); > } > > /* Add LINK to the linked list of assign links of RACC. 
*/ > @@ -868,6 +896,7 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size) > access->base = base; > access->offset = offset; > access->size = size; > + access->reg_size = size; > > base_access_vec->get_or_insert (base).safe_push (access); > > @@ -1667,6 +1696,7 @@ build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset, > { > tree res; > if (model->grp_same_access_path > + && !model->reg_acc_type > && !TREE_THIS_VOLATILE (base) > && (TYPE_ADDR_SPACE (TREE_TYPE (base)) > == TYPE_ADDR_SPACE (TREE_TYPE (model->expr))) > @@ -2256,6 +2286,30 @@ get_access_replacement (struct access *access) > return access->replacement_decl; > } > > +/* Like above except in cases when ACCESS has non-NULL reg_acc_type, in which > + case also construct BIT_FIELD_REF into the replacement of the corresponding > + type (with location LOC). */ > + > +static tree > +get_reg_access_replacement (location_t loc, struct access *access, bool write, > + gassign **conversion) > +{ > + tree repl = get_access_replacement (access); > + if (!access->reg_acc_type) > + { > + *conversion = NULL; > + return repl; > + } > + > + tree tmp = make_ssa_name (access->reg_acc_type); > + if (write) > + *conversion = gimple_build_assign (repl, NOP_EXPR, tmp); > + else > + *conversion = gimple_build_assign (tmp, NOP_EXPR, repl); > + gimple_set_location (*conversion, loc); > + return tmp; > +} > + > > /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the > linked list along the way. 
Stop when *ACCESS is NULL or the access pointed > @@ -2339,7 +2393,12 @@ verify_sra_access_forest (struct access *root) > gcc_assert (offset == access->offset); > gcc_assert (access->grp_unscalarizable_region > || size == max_size); > - gcc_assert (max_size == access->size); > + if (access->reg_acc_type) > + gcc_assert (max_size == access->reg_size > + && max_size < access->size); > + else > + gcc_assert (max_size == access->size > + && max_size == access->reg_size); > gcc_assert (reverse == access->reverse); > > if (access->first_child) > @@ -2405,6 +2464,39 @@ expr_with_var_bounded_array_refs_p (tree expr) > return false; > } > > +static void > +upgrade_integral_size_to_prec (struct access *access) > +{ > + /* Always create access replacements that cover the whole access. > + For integral types this means the precision has to match. > + Avoid assumptions based on the integral type kind, too. */ > + if (INTEGRAL_TYPE_P (access->type) > + && (TREE_CODE (access->type) != INTEGER_TYPE > + || TYPE_PRECISION (access->type) != access->size) > + /* But leave bitfield accesses alone. 
*/ > + && (TREE_CODE (access->expr) != COMPONENT_REF > + || !DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1)))) > + { > + tree rt = access->type; > + gcc_assert ((access->offset % BITS_PER_UNIT) == 0 > + && (access->size % BITS_PER_UNIT) == 0); > + access->type = build_nonstandard_integer_type (access->size, > + TYPE_UNSIGNED (rt)); > + access->expr = build_ref_for_offset (UNKNOWN_LOCATION, access->base, > + access->offset, access->reverse, > + access->type, NULL, false); > + > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + fprintf (dump_file, "Changing the type of a replacement for "); > + print_generic_expr (dump_file, access->base); > + fprintf (dump_file, " offset: %u, size: %u ", > + (unsigned) access->offset, (unsigned) access->size); > + fprintf (dump_file, " to an integer.\n"); > + } > + } > +} > + > /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when > both seeming beneficial and when ALLOW_REPLACEMENTS allows it. If TOTALLY > is set, we are totally scalarizing the aggregate. Also set all sorts of > @@ -2488,6 +2580,15 @@ analyze_access_subtree (struct access *root, struct access *parent, > hole = true; > } > > + if (!totally && root->reg_acc_type) > + { > + /* If total scalarization did not eventually succeed, let's undo any > + futile attempts to cover padding. */ > + root->type = root->reg_acc_type; > + root->size = root->reg_size; > + root->reg_acc_type = NULL_TREE; > + } > + > if (allow_replacements && scalar && !root->first_child > && (totally || !root->grp_total_scalarization) > && (totally > @@ -2495,34 +2596,7 @@ analyze_access_subtree (struct access *root, struct access *parent, > || ((root->grp_scalar_read || root->grp_assignment_read) > && (root->grp_scalar_write || root->grp_assignment_write)))) > { > - /* Always create access replacements that cover the whole access. > - For integral types this means the precision has to match. > - Avoid assumptions based on the integral type kind, too. 
*/ > - if (INTEGRAL_TYPE_P (root->type) > - && (TREE_CODE (root->type) != INTEGER_TYPE > - || TYPE_PRECISION (root->type) != root->size) > - /* But leave bitfield accesses alone. */ > - && (TREE_CODE (root->expr) != COMPONENT_REF > - || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1)))) > - { > - tree rt = root->type; > - gcc_assert ((root->offset % BITS_PER_UNIT) == 0 > - && (root->size % BITS_PER_UNIT) == 0); > - root->type = build_nonstandard_integer_type (root->size, > - TYPE_UNSIGNED (rt)); > - root->expr = build_ref_for_offset (UNKNOWN_LOCATION, root->base, > - root->offset, root->reverse, > - root->type, NULL, false); > - > - if (dump_file && (dump_flags & TDF_DETAILS)) > - { > - fprintf (dump_file, "Changing the type of a replacement for "); > - print_generic_expr (dump_file, root->base); > - fprintf (dump_file, " offset: %u, size: %u ", > - (unsigned) root->offset, (unsigned) root->size); > - fprintf (dump_file, " to an integer.\n"); > - } > - } > + upgrade_integral_size_to_prec (root); > > root->grp_to_be_replaced = 1; > root->replacement_decl = create_access_replacement (root); > @@ -2554,7 +2628,7 @@ analyze_access_subtree (struct access *root, struct access *parent, > root->grp_total_scalarization = 0; > } > > - if (!hole || totally) > + if (!hole) > root->grp_covered = 1; > else if (root->grp_write || comes_initialized_p (root->base)) > root->grp_unscalarized_data = 1; /* not covered and written to */ > @@ -2636,6 +2710,8 @@ create_artificial_child_access (struct access *parent, struct access *model, > access->expr = expr; > access->offset = new_offset; > access->size = model->size; > + gcc_assert (model->size == model->reg_size); > + access->reg_size = model->reg_size; > access->type = model->type; > access->parent = parent; > access->grp_read = set_grp_read; > @@ -2966,6 +3042,7 @@ create_total_scalarization_access (struct access *parent, HOST_WIDE_INT pos, > access->base = parent->base; > access->offset = pos; > access->size = size; > + 
access->reg_size = size; > access->expr = expr; > access->type = type; > access->parent = parent; > @@ -3015,6 +3092,138 @@ create_total_access_and_reshape (struct access *parent, HOST_WIDE_INT pos, > return new_acc; > } > > +/* Given SIZE return the integer type to represent that many bits or NULL if > + there is no suitable one. */ > + > +static tree > +sra_type_for_size (HOST_WIDE_INT size, int unsignedp = 1) > +{ > + tree type = lang_hooks.types.type_for_size (size, unsignedp); > + scalar_int_mode mode; > + if (type > + && is_a <scalar_int_mode> (TYPE_MODE (type), &mode) > + && GET_MODE_BITSIZE (mode) == size) > + return type; > + else > + return NULL_TREE; > +} > + > +/* Create a series of accesses to cover padding from FROM to TO and chain them > + between LAST_PTR and NEXT_CHILD. Return the pointer to the last created > + one. */ > + > +static struct access * > +fill_padding_with_accesses (struct access *parent, HOST_WIDE_INT from, > + HOST_WIDE_INT to, struct access **last_ptr, > + struct access *next_child) > +{ > + struct access *access = NULL; > + > + gcc_assert (from < to); > + do > + { > + HOST_WIDE_INT diff = to - from; > + gcc_assert (diff >= BITS_PER_UNIT); > + HOST_WIDE_INT stsz = 1 << floor_log2 (diff); > + tree type; > + > + while (true) > + { > + type = sra_type_for_size (stsz); > + if (type) > + break; > + stsz /= 2; > + gcc_checking_assert (stsz >= BITS_PER_UNIT); > + } > + > + do { > + tree expr = build_ref_for_offset (UNKNOWN_LOCATION, parent->base, > + from, parent->reverse, type, NULL, > + false); > + access = create_total_scalarization_access (parent, from, stsz, type, > + expr, last_ptr, next_child); > + access->grp_no_warning = 1; > + from += stsz; > + } > + while (to - from >= stsz); > + gcc_assert (from <= to); > + } > + while (from < to); > + return access; > +} > + > +/* Check whether any padding between FROM and TO must be filled during > + total scalarization and if so, extend *LAST_SIBLING, create additional > + artificial 
accesses under PARENT or both to fill it. Set *LAST_SIBLING to > + the last created access and set its next_sibling to NEXT_CHILD. */ > + > +static void > +total_scalarization_fill_padding (struct access *parent, > + struct access **last_sibling, > + HOST_WIDE_INT from, HOST_WIDE_INT to, > + struct access *next_child) > +{ > + if (constant_decl_p (parent->base)) > + return; > + > + gcc_assert (from <= to); > + if (from == to) > + return; > + > + if (struct access *ls = *last_sibling) > + { > + /* First, perform any upgrades to full integers we would have done > + anyway. */ > + upgrade_integral_size_to_prec (ls); > + > + HOST_WIDE_INT lastsize = ls->size; > + if (INTEGRAL_TYPE_P (ls->type) > + && pow2_or_zerop (lastsize)) > + { > + /* The last field was a reasonably sized integer, let's try to > + enlarge as much as we can so that it is still naturally > + aligned. */ > + HOST_WIDE_INT lastpos = ls->offset; > + HOST_WIDE_INT blocksize = lastsize; > + tree type = NULL_TREE; > + int unsignedp = TYPE_UNSIGNED (ls->type); > + > + while (true) > + { > + HOST_WIDE_INT b2 = blocksize * 2; > + if (lastpos + b2 > to > + || (lastpos % b2) != 0) > + break; > + tree t2 = sra_type_for_size (b2, unsignedp); > + if (!t2) > + break; > + > + blocksize = b2; > + type = t2; > + } > + > + if (blocksize != lastsize) > + { > + ls->reg_acc_type = ls->type; > + ls->type = type; > + ls->size = blocksize; > + from = lastpos + blocksize; > + } > + > + if (from == to) > + return; > + } > + } > + > + struct access **last_ptr; > + if (*last_sibling) > + last_ptr = &(*last_sibling)->next_sibling; > + else > + last_ptr = &parent->first_child; > + *last_sibling = fill_padding_with_accesses (parent, from, to, last_ptr, > + next_child); > +} > + > static bool totally_scalarize_subtree (struct access *root); > > /* Return true if INNER is either the same type as OUTER or if it is the type > @@ -3073,10 +3282,17 @@ total_should_skip_creating_access (struct access *parent, > HOST_WIDE_INT size) > { > 
struct access *next_child; > + HOST_WIDE_INT covered; > if (!*last_seen_sibling) > - next_child = parent->first_child; > + { > + next_child = parent->first_child; > + covered = parent->offset; > + } > else > - next_child = (*last_seen_sibling)->next_sibling; > + { > + next_child = (*last_seen_sibling)->next_sibling; > + covered = (*last_seen_sibling)->offset + (*last_seen_sibling)->size; > + } > > /* First, traverse the chain of siblings until it points to an access with > offset at least equal to POS. Check all skipped accesses whether they > @@ -3085,10 +3301,16 @@ total_should_skip_creating_access (struct access *parent, > { > if (next_child->offset + next_child->size > pos) > return TOTAL_FLD_FAILED; > + total_scalarization_fill_padding (parent, last_seen_sibling, covered, > + next_child->offset, next_child); > + covered = next_child->offset + next_child->size; > *last_seen_sibling = next_child; > next_child = next_child->next_sibling; > } > > + total_scalarization_fill_padding (parent, last_seen_sibling, covered, pos, > + next_child); > + > /* Now check whether next_child has exactly the right POS and SIZE and if so, > whether it can represent what we need and can be totally scalarized > itself. */ > @@ -3273,6 +3495,34 @@ totally_scalarize_subtree (struct access *root) > } > > out: > + /* Even though there is nothing in the type, there can still be accesses here > + in the IL. 
*/ > + > + HOST_WIDE_INT covered; > + struct access *next_child; > + > + if (last_seen_sibling) > + { > + covered = last_seen_sibling->offset + last_seen_sibling->size; > + next_child = last_seen_sibling->next_sibling; > + } > + else > + { > + covered = root->offset; > + next_child = root->first_child; > + } > + > + while (next_child) > + { > + total_scalarization_fill_padding (root, &last_seen_sibling, covered, > + next_child->offset, next_child); > + covered = next_child->offset + next_child->size; > + last_seen_sibling = next_child; > + next_child = next_child->next_sibling; > + } > + > + total_scalarization_fill_padding (root, &last_seen_sibling, covered, > + root->offset + root->size, NULL); > return true; > } > > @@ -3655,7 +3905,6 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) > > if (access->grp_to_be_replaced) > { > - tree repl = get_access_replacement (access); > /* If we replace a non-register typed access simply use the original > access expression to extract the scalar component afterwards. > This happens if scalarizing a function return value or parameter > @@ -3666,10 +3915,14 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) > be accessed as a different type too, potentially creating a need for > type conversion (see PR42196) and when scalarized unions are involved > in assembler statements (see PR42398). */ > - if (!useless_type_conversion_p (type, access->type)) > + tree actual_type > + = access->reg_acc_type ? 
access->reg_acc_type : access->type; > + > + if (!useless_type_conversion_p (type, actual_type)) > { > tree ref; > > + tree repl = get_access_replacement (access); > ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false); > > if (write) > @@ -3696,7 +3949,21 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) > } > } > else > - *expr = repl; > + { > + gassign *conversion; > + tree repl = get_reg_access_replacement (loc, access, write, > + &conversion); > + *expr = repl; > + > + if (conversion) > + { > + if (write) > + gsi_insert_after (gsi, conversion, GSI_NEW_STMT); > + else > + gsi_insert_before (gsi, conversion, GSI_SAME_STMT); > + } > + } > + > sra_stats.exprs++; > } > else if (write && access->grp_to_be_debug_replaced) > @@ -3806,7 +4073,8 @@ load_assign_lhs_subreplacements (struct access *lacc, > gassign *stmt; > tree rhs; > > - racc = find_access_in_subtree (sad->top_racc, offset, lacc->size); > + racc = find_access_in_subtree (sad->top_racc, offset, lacc->size, > + false); > if (racc && racc->grp_to_be_replaced) > { > rhs = get_access_replacement (racc); > @@ -3857,7 +4125,7 @@ load_assign_lhs_subreplacements (struct access *lacc, > tree drhs; > struct access *racc = find_access_in_subtree (sad->top_racc, > offset, > - lacc->size); > + lacc->size, false); > > if (racc && racc->grp_to_be_replaced) > { > @@ -4010,8 +4278,11 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi) > loc = gimple_location (stmt); > if (lacc && lacc->grp_to_be_replaced) > { > - lhs = get_access_replacement (lacc); > + gassign *lhs_conversion = NULL; > + lhs = get_reg_access_replacement (loc, lacc, true, &lhs_conversion); > gimple_assign_set_lhs (stmt, lhs); > + if (lhs_conversion) > + gsi_insert_after (gsi, lhs_conversion, GSI_NEW_STMT); > modify_this_stmt = true; > if (lacc->grp_partial_lhs) > force_gimple_rhs = true; > @@ -4020,7 +4291,10 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi) > > if (racc && racc->grp_to_be_replaced) 
> { > - rhs = get_access_replacement (racc); > + gassign *rhs_conversion = NULL; > + rhs = get_reg_access_replacement (loc, racc, false, &rhs_conversion); > + if (rhs_conversion) > + gsi_insert_before (&orig_gsi, rhs_conversion, GSI_SAME_STMT); > modify_this_stmt = true; > if (racc->grp_partial_lhs) > force_gimple_rhs = true; >
*/ + +static tree +get_reg_access_replacement (location_t loc, struct access *access, bool write, + gassign **conversion) +{ + tree repl = get_access_replacement (access); + if (!access->reg_acc_type) + { + *conversion = NULL; + return repl; + } + + tree tmp = make_ssa_name (access->reg_acc_type); + if (write) + *conversion = gimple_build_assign (repl, NOP_EXPR, tmp); + else + *conversion = gimple_build_assign (tmp, NOP_EXPR, repl); + gimple_set_location (*conversion, loc); + return tmp; +} + /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the linked list along the way. Stop when *ACCESS is NULL or the access pointed @@ -2339,7 +2393,12 @@ verify_sra_access_forest (struct access *root) gcc_assert (offset == access->offset); gcc_assert (access->grp_unscalarizable_region || size == max_size); - gcc_assert (max_size == access->size); + if (access->reg_acc_type) + gcc_assert (max_size == access->reg_size + && max_size < access->size); + else + gcc_assert (max_size == access->size + && max_size == access->reg_size); gcc_assert (reverse == access->reverse); if (access->first_child) @@ -2405,6 +2464,39 @@ expr_with_var_bounded_array_refs_p (tree expr) return false; } +static void +upgrade_integral_size_to_prec (struct access *access) +{ + /* Always create access replacements that cover the whole access. + For integral types this means the precision has to match. + Avoid assumptions based on the integral type kind, too. */ + if (INTEGRAL_TYPE_P (access->type) + && (TREE_CODE (access->type) != INTEGER_TYPE + || TYPE_PRECISION (access->type) != access->size) + /* But leave bitfield accesses alone. 
*/ + && (TREE_CODE (access->expr) != COMPONENT_REF + || !DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1)))) + { + tree rt = access->type; + gcc_assert ((access->offset % BITS_PER_UNIT) == 0 + && (access->size % BITS_PER_UNIT) == 0); + access->type = build_nonstandard_integer_type (access->size, + TYPE_UNSIGNED (rt)); + access->expr = build_ref_for_offset (UNKNOWN_LOCATION, access->base, + access->offset, access->reverse, + access->type, NULL, false); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Changing the type of a replacement for "); + print_generic_expr (dump_file, access->base); + fprintf (dump_file, " offset: %u, size: %u ", + (unsigned) access->offset, (unsigned) access->size); + fprintf (dump_file, " to an integer.\n"); + } + } +} + /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when both seeming beneficial and when ALLOW_REPLACEMENTS allows it. If TOTALLY is set, we are totally scalarizing the aggregate. Also set all sorts of @@ -2488,6 +2580,15 @@ analyze_access_subtree (struct access *root, struct access *parent, hole = true; } + if (!totally && root->reg_acc_type) + { + /* If total scalarization did not eventually succeed, let's undo any + futile attempts to cover padding. */ + root->type = root->reg_acc_type; + root->size = root->reg_size; + root->reg_acc_type = NULL_TREE; + } + if (allow_replacements && scalar && !root->first_child && (totally || !root->grp_total_scalarization) && (totally @@ -2495,34 +2596,7 @@ analyze_access_subtree (struct access *root, struct access *parent, || ((root->grp_scalar_read || root->grp_assignment_read) && (root->grp_scalar_write || root->grp_assignment_write)))) { - /* Always create access replacements that cover the whole access. - For integral types this means the precision has to match. - Avoid assumptions based on the integral type kind, too. 
*/ - if (INTEGRAL_TYPE_P (root->type) - && (TREE_CODE (root->type) != INTEGER_TYPE - || TYPE_PRECISION (root->type) != root->size) - /* But leave bitfield accesses alone. */ - && (TREE_CODE (root->expr) != COMPONENT_REF - || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1)))) - { - tree rt = root->type; - gcc_assert ((root->offset % BITS_PER_UNIT) == 0 - && (root->size % BITS_PER_UNIT) == 0); - root->type = build_nonstandard_integer_type (root->size, - TYPE_UNSIGNED (rt)); - root->expr = build_ref_for_offset (UNKNOWN_LOCATION, root->base, - root->offset, root->reverse, - root->type, NULL, false); - - if (dump_file && (dump_flags & TDF_DETAILS)) - { - fprintf (dump_file, "Changing the type of a replacement for "); - print_generic_expr (dump_file, root->base); - fprintf (dump_file, " offset: %u, size: %u ", - (unsigned) root->offset, (unsigned) root->size); - fprintf (dump_file, " to an integer.\n"); - } - } + upgrade_integral_size_to_prec (root); root->grp_to_be_replaced = 1; root->replacement_decl = create_access_replacement (root); @@ -2554,7 +2628,7 @@ analyze_access_subtree (struct access *root, struct access *parent, root->grp_total_scalarization = 0; } - if (!hole || totally) + if (!hole) root->grp_covered = 1; else if (root->grp_write || comes_initialized_p (root->base)) root->grp_unscalarized_data = 1; /* not covered and written to */ @@ -2636,6 +2710,8 @@ create_artificial_child_access (struct access *parent, struct access *model, access->expr = expr; access->offset = new_offset; access->size = model->size; + gcc_assert (model->size == model->reg_size); + access->reg_size = model->reg_size; access->type = model->type; access->parent = parent; access->grp_read = set_grp_read; @@ -2966,6 +3042,7 @@ create_total_scalarization_access (struct access *parent, HOST_WIDE_INT pos, access->base = parent->base; access->offset = pos; access->size = size; + access->reg_size = size; access->expr = expr; access->type = type; access->parent = parent; @@ -3015,6 +3092,138 @@ 
create_total_access_and_reshape (struct access *parent, HOST_WIDE_INT pos, return new_acc; } +/* Given SIZE return the integer type to represent that many bits or NULL if + there is no suitable one. */ + +static tree +sra_type_for_size (HOST_WIDE_INT size, int unsignedp = 1) +{ + tree type = lang_hooks.types.type_for_size (size, unsignedp); + scalar_int_mode mode; + if (type + && is_a <scalar_int_mode> (TYPE_MODE (type), &mode) + && GET_MODE_BITSIZE (mode) == size) + return type; + else + return NULL_TREE; +} + +/* Create a series of accesses to cover padding from FROM to TO anch chain them + between LAST_PTR and NEXT_CHILD. Return the pointer to the last created + one. */ + +static struct access * +fill_padding_with_accesses (struct access *parent, HOST_WIDE_INT from, + HOST_WIDE_INT to, struct access **last_ptr, + struct access *next_child) +{ + struct access *access = NULL; + + gcc_assert (from < to); + do + { + HOST_WIDE_INT diff = to - from; + gcc_assert (diff >= BITS_PER_UNIT); + HOST_WIDE_INT stsz = 1 << floor_log2 (diff); + tree type; + + while (true) + { + type = sra_type_for_size (stsz); + if (type) + break; + stsz /= 2; + gcc_checking_assert (stsz >= BITS_PER_UNIT); + } + + do { + tree expr = build_ref_for_offset (UNKNOWN_LOCATION, parent->base, + from, parent->reverse, type, NULL, + false); + access = create_total_scalarization_access (parent, from, stsz, type, + expr, last_ptr, next_child); + access->grp_no_warning = 1; + from += stsz; + } + while (to - from >= stsz); + gcc_assert (from <= to); + } + while (from < to); + return access; +} + +/* Check whether any padding between FROM and TO must be filled during + total scalarization and if so, extend *LAST_SIBLING, create additional + artificial accesses under PARENT or both to fill it. Set *LAST_SIBLING to + the last created access and set its next_sibling to NEXT_CHILD. 
*/ + +static void +total_scalarization_fill_padding (struct access *parent, + struct access **last_sibling, + HOST_WIDE_INT from, HOST_WIDE_INT to, + struct access *next_child) +{ + if (constant_decl_p (parent->base)) + return; + + gcc_assert (from <= to); + if (from == to) + return; + + if (struct access *ls = *last_sibling) + { + /* First, perform any upgrades to full integers we would have done + anyway. */ + upgrade_integral_size_to_prec (ls); + + HOST_WIDE_INT lastsize = ls->size; + if (INTEGRAL_TYPE_P (ls->type) + && pow2_or_zerop (lastsize)) + { + /* The last field was a reaonably sized integer, let's try to + enlarge as much as we can so that it is still naturally + aligned. */ + HOST_WIDE_INT lastpos = ls->offset; + HOST_WIDE_INT blocksize = lastsize; + tree type = NULL_TREE; + int unsignedp = TYPE_UNSIGNED (ls->type); + + while (true) + { + HOST_WIDE_INT b2 = blocksize * 2; + if (lastpos + b2 > to + || (lastpos % b2) != 0) + break; + tree t2 = sra_type_for_size (b2, unsignedp); + if (!t2) + break; + + blocksize = b2; + type = t2; + } + + if (blocksize != lastsize) + { + ls->reg_acc_type = ls->type; + ls->type = type; + ls->size = blocksize; + from = lastpos + blocksize; + } + + if (from == to) + return; + } + } + + struct access **last_ptr; + if (*last_sibling) + last_ptr = &(*last_sibling)->next_sibling; + else + last_ptr = &parent->first_child; + *last_sibling = fill_padding_with_accesses (parent, from, to, last_ptr, + next_child); +} + static bool totally_scalarize_subtree (struct access *root); /* Return true if INNER is either the same type as OUTER or if it is the type @@ -3073,10 +3282,17 @@ total_should_skip_creating_access (struct access *parent, HOST_WIDE_INT size) { struct access *next_child; + HOST_WIDE_INT covered; if (!*last_seen_sibling) - next_child = parent->first_child; + { + next_child = parent->first_child; + covered = parent->offset; + } else - next_child = (*last_seen_sibling)->next_sibling; + { + next_child = 
(*last_seen_sibling)->next_sibling; + covered = (*last_seen_sibling)->offset + (*last_seen_sibling)->size; + } /* First, traverse the chain of siblings until it points to an access with offset at least equal to POS. Check all skipped accesses whether they @@ -3085,10 +3301,16 @@ total_should_skip_creating_access (struct access *parent, { if (next_child->offset + next_child->size > pos) return TOTAL_FLD_FAILED; + total_scalarization_fill_padding (parent, last_seen_sibling, covered, + next_child->offset, next_child); + covered = next_child->offset + next_child->size; *last_seen_sibling = next_child; next_child = next_child->next_sibling; } + total_scalarization_fill_padding (parent, last_seen_sibling, covered, pos, + next_child); + /* Now check whether next_child has exactly the right POS and SIZE and if so, whether it can represent what we need and can be totally scalarized itself. */ @@ -3273,6 +3495,34 @@ totally_scalarize_subtree (struct access *root) } out: + /* Even though there is nothing in the type, there can still be accesses here + in the IL. 
*/ + + HOST_WIDE_INT covered; + struct access *next_child; + + if (last_seen_sibling) + { + covered = last_seen_sibling->offset + last_seen_sibling->size; + next_child = last_seen_sibling->next_sibling; + } + else + { + covered = root->offset; + next_child = root->first_child; + } + + while (next_child) + { + total_scalarization_fill_padding (root, &last_seen_sibling, covered, + next_child->offset, next_child); + covered = next_child->offset + next_child->size; + last_seen_sibling = next_child; + next_child = next_child->next_sibling; + } + + total_scalarization_fill_padding (root, &last_seen_sibling, covered, + root->offset + root->size, NULL); return true; } @@ -3655,7 +3905,6 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) if (access->grp_to_be_replaced) { - tree repl = get_access_replacement (access); /* If we replace a non-register typed access simply use the original access expression to extract the scalar component afterwards. This happens if scalarizing a function return value or parameter @@ -3666,10 +3915,14 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) be accessed as a different type too, potentially creating a need for type conversion (see PR42196) and when scalarized unions are involved in assembler statements (see PR42398). */ - if (!useless_type_conversion_p (type, access->type)) + tree actual_type + = access->reg_acc_type ? 
access->reg_acc_type : access->type; + + if (!useless_type_conversion_p (type, actual_type)) { tree ref; + tree repl = get_access_replacement (access); ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false); if (write) @@ -3696,7 +3949,21 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) } } else - *expr = repl; + { + gassign *conversion; + tree repl = get_reg_access_replacement (loc, access, write, + &conversion); + *expr = repl; + + if (conversion) + { + if (write) + gsi_insert_after (gsi, conversion, GSI_NEW_STMT); + else + gsi_insert_before (gsi, conversion, GSI_SAME_STMT); + } + } + sra_stats.exprs++; } else if (write && access->grp_to_be_debug_replaced) @@ -3806,7 +4073,8 @@ load_assign_lhs_subreplacements (struct access *lacc, gassign *stmt; tree rhs; - racc = find_access_in_subtree (sad->top_racc, offset, lacc->size); + racc = find_access_in_subtree (sad->top_racc, offset, lacc->size, + false); if (racc && racc->grp_to_be_replaced) { rhs = get_access_replacement (racc); @@ -3857,7 +4125,7 @@ load_assign_lhs_subreplacements (struct access *lacc, tree drhs; struct access *racc = find_access_in_subtree (sad->top_racc, offset, - lacc->size); + lacc->size, false); if (racc && racc->grp_to_be_replaced) { @@ -4010,8 +4278,11 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi) loc = gimple_location (stmt); if (lacc && lacc->grp_to_be_replaced) { - lhs = get_access_replacement (lacc); + gassign *lhs_conversion = NULL; + lhs = get_reg_access_replacement (loc, lacc, true, &lhs_conversion); gimple_assign_set_lhs (stmt, lhs); + if (lhs_conversion) + gsi_insert_after (gsi, lhs_conversion, GSI_NEW_STMT); modify_this_stmt = true; if (lacc->grp_partial_lhs) force_gimple_rhs = true; @@ -4020,7 +4291,10 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi) if (racc && racc->grp_to_be_replaced) { - rhs = get_access_replacement (racc); + gassign *rhs_conversion = NULL; + rhs = get_reg_access_replacement (loc, racc, false, 
&rhs_conversion); + if (rhs_conversion) + gsi_insert_before (&orig_gsi, rhs_conversion, GSI_SAME_STMT); modify_this_stmt = true; if (racc->grp_partial_lhs) force_gimple_rhs = true;