Message ID | alpine.LNX.2.20.13.1709212003310.31862@monopod.intra.ispras.ru |
---|---|
State | New |
Headers | show |
Series | tree-sra: fix compare_access_positions qsort comparator | expand |
Alexander Monakov <amonakov@ispras.ru> writes: > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c > index 163b7a2d03b..4f9a8802aeb 100644 > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -1542,19 +1542,17 @@ compare_access_positions (const void *a, const void *b) > && TREE_CODE (f2->type) != COMPLEX_TYPE > && TREE_CODE (f2->type) != VECTOR_TYPE) > return -1; > + /* Put any non-integral type before any integral type. */ > + else if (INTEGRAL_TYPE_P (f1->type) > + && !INTEGRAL_TYPE_P (f2->type)) > + return 1; > + else if (!INTEGRAL_TYPE_P (f1->type) > + && INTEGRAL_TYPE_P (f2->type)) > + return -1; > /* Put the integral type with the bigger precision first. */ > else if (INTEGRAL_TYPE_P (f1->type) > && INTEGRAL_TYPE_P (f2->type)) > return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type); > - /* Put any integral type with non-full precision last. */ > - else if (INTEGRAL_TYPE_P (f1->type) > - && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type)) > - != TYPE_PRECISION (f1->type))) > - return 1; > - else if (INTEGRAL_TYPE_P (f2->type) > - && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type)) > - != TYPE_PRECISION (f2->type))) > - return -1; > /* Stabilize the sort. */ > return TYPE_UID (f1->type) - TYPE_UID (f2->type); > } LGTM FWIW, but isn't there also the problem that the TYPE_PRECISION test fails to stabilise the sort if you have two integral types with the same precision? Thanks, Richard
On Thu, 21 Sep 2017, Richard Sandiford wrote: > LGTM FWIW, but isn't there also the problem that the TYPE_PRECISION > test fails to stabilise the sort if you have two integral types with > the same precision? Yes, but that's a pre-existing issue, so I didn't change it in the patch. I think GCC broadly makes an assumption that libc qsort is deterministic, i.e. as long as comparators are deterministic, outcome of qsort doesn't change across compiler runs. In this case qsort behaviour affects IR on a testcase like this: long g() { struct s {union { unsigned long ul; long l;}; } s; asm("# %0" : "=X"(s.ul)); return s.l; } (sort order determines types of post-early-SRA temporaries), so I guess it's desirable to stabilize it. Is this patch OK as is, or do I need to resubmit a variant that uses TYPE_UID comparison in case TYPE_PRECISION is equal? Thanks. Alexander
On Thu, Sep 21, 2017 at 08:27:31PM +0300, Alexander Monakov wrote: > Hi, > > The compare_access_positions qsort comparator lacks transitivity, although > somewhat surprisingly this issue didn't manifest on 64-bit x86 bootstraps. > The first invalid comparison step is here (tree-sra.c:1545): > > /* Put the integral type with the bigger precision first. */ > else if (INTEGRAL_TYPE_P (f1->type) > && INTEGRAL_TYPE_P (f2->type)) > return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type); > > Imagine you have items A, B, C such that they compare equal according to > preceding comparison steps, A and C are integral and have precision 64 resp. > 32, B is non-integral. Then you have C < A according to this step, but > comparisons against B depend on TYPE_UID, so you can end up with A < B < C < A. > > A minimal fix would be to order all integral items before/after non-integral, > like preceding code already does for aggregate/vector/complex types. Thanks for spotting this. Nevertheless, I would prefer SRA to select integer types over non-integer ones (i.e. floats), because in the common scenario which is a union of a float and an int, the int is the type for which we can enable more subsequent optimizations, even if it means that we do not scalarize some exotic cases. So I'm currently testing the following (if we ever find that this is a problem in practice, we can fix it at the cost of making sort_and_splice_var_accesses more complicated but capable of undoing this decision). By the way, after I jettison IPA-SRA from tree-sra.c, I'll start reworking it in a way that does not have the qsort in it. Thanks again, Martin diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 163b7a2d03b..0f92033d0bb 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const void *b) && TREE_CODE (f2->type) != COMPLEX_TYPE && TREE_CODE (f2->type) != VECTOR_TYPE) return -1; - /* Put the integral type with the bigger precision first. */ + /* Put any integral type before any non-integral type. When splicing, we + make sure that those with insufficient precision and occupupying the + same space are not scalarized. */ else if (INTEGRAL_TYPE_P (f1->type) + && !INTEGRAL_TYPE_P (f2->type)) + return -1; + else if (!INTEGRAL_TYPE_P (f1->type) && INTEGRAL_TYPE_P (f2->type)) - return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type); - /* Put any integral type with non-full precision last. */ - else if (INTEGRAL_TYPE_P (f1->type) - && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type)) - != TYPE_PRECISION (f1->type))) return 1; - else if (INTEGRAL_TYPE_P (f2->type) - && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type)) - != TYPE_PRECISION (f2->type))) - return -1; + /* Put the integral type with the bigger precision first. */ + else if (INTEGRAL_TYPE_P (f1->type) + && INTEGRAL_TYPE_P (f2->type) + && (TYPE_PRECISION (f2->type) != TYPE_PRECISION (f1->type))) + return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type); /* Stabilize the sort. */ return TYPE_UID (f1->type) - TYPE_UID (f2->type); } @@ -2055,6 +2056,9 @@ sort_and_splice_var_accesses (tree var) bool grp_partial_lhs = access->grp_partial_lhs; bool first_scalar = is_gimple_reg_type (access->type); bool unscalarizable_region = access->grp_unscalarizable_region; + bool non_full_precision = (INTEGRAL_TYPE_P (access->type) + && (TREE_INT_CST_LOW (TYPE_SIZE (access->type)) + != TYPE_PRECISION (access->type))); if (first || access->offset >= high) { @@ -2102,6 +2106,21 @@ sort_and_splice_var_accesses (tree var) this combination of size and offset, the comparison function should have put the scalars first. */ gcc_assert (first_scalar || !is_gimple_reg_type (ac2->type)); + /* It also prefers integral types to non-integral. However, when the + precision of the selected type does not span the entire area and + should also be used for a non-integer (i.e. float), we must not + let that happen. */ + if (non_full_precision && !INTEGRAL_TYPE_P (ac2->type)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Cannot sclarize the following access " + "because insufficient precision integer type was " + "selected.\n "); + dump_access (dump_file, access, false); + } + unscalarizable_region = true; + } ac2->group_representative = access; j++; }
On Mon, 25 Sep 2017, Martin Jambor wrote: > --- a/gcc/tree-sra.c > +++ b/gcc/tree-sra.c > @@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const void *b) > && TREE_CODE (f2->type) != COMPLEX_TYPE > && TREE_CODE (f2->type) != VECTOR_TYPE) > return -1; > - /* Put the integral type with the bigger precision first. */ > + /* Put any integral type before any non-integral type. When splicing, we > + make sure that those with insufficient precision and occupupying the Typo (s/upup/up). > @@ -2102,6 +2106,21 @@ sort_and_splice_var_accesses (tree var) > this combination of size and offset, the comparison function > should have put the scalars first. */ > gcc_assert (first_scalar || !is_gimple_reg_type (ac2->type)); > + /* It also prefers integral types to non-integral. However, when the > + precision of the selected type does not span the entire area and > + should also be used for a non-integer (i.e. float), we must not > + let that happen. */ > + if (non_full_precision && !INTEGRAL_TYPE_P (ac2->type)) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + fprintf (dump_file, "Cannot sclarize the following access " Typo ('scalarize'). Thanks! If this is resolved, haifa-sched autoprefetch ranking will become the last remaining (among discovered so far) inconsistent qsort comparator in GCC. Alexander
Hi, On Mon, Sep 25, 2017 at 04:22:06PM +0300, Alexander Monakov wrote: > > Thanks! If this is resolved, haifa-sched autoprefetch ranking will become the > last remaining (among discovered so far) inconsistent qsort comparator in GCC. > So the following has passed bootstrap and testing on x86_64-linux (including Ada). I have failed to create a C/C++ testcase but when I was trying to come up with one I realized there is code in analyze_access_subtree that replaces integer types with smaller precision with full-precision ones for all cases but bit-fields. Therefore I adjusted the disqualification to only trigger for bit-fields (that happen to be of equal size as a non-integer register type), which I think is very unlikely to ever happen. Richi, is it OK for trunk? Thanks, Martin 2017-09-26 Martin Jambor <mjambor@suse.cz> * tree-sra.c (compare_access_positions): Put integral types first, stabilize sorting of integral types, remove conditions putting non-full-precision integers last. (sort_and_splice_var_accesses): Disable scalarization if a non-integert would be represented by a non-full-precision integer. --- gcc/tree-sra.c | 42 ++++++++++++++++++++++++++++++++---------- 1 file changed, 32 insertions(+), 10 deletions(-) diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 163b7a2d03b..f5675edc7f1 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const void *b) && TREE_CODE (f2->type) != COMPLEX_TYPE && TREE_CODE (f2->type) != VECTOR_TYPE) return -1; - /* Put the integral type with the bigger precision first. */ + /* Put any integral type before any non-integral type. When splicing, we + make sure that those with insufficient precision and occupying the + same space are not scalarized. */ else if (INTEGRAL_TYPE_P (f1->type) + && !INTEGRAL_TYPE_P (f2->type)) + return -1; + else if (!INTEGRAL_TYPE_P (f1->type) && INTEGRAL_TYPE_P (f2->type)) - return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type); - /* Put any integral type with non-full precision last. */ - else if (INTEGRAL_TYPE_P (f1->type) - && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type)) - != TYPE_PRECISION (f1->type))) return 1; - else if (INTEGRAL_TYPE_P (f2->type) - && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type)) - != TYPE_PRECISION (f2->type))) - return -1; + /* Put the integral type with the bigger precision first. */ + else if (INTEGRAL_TYPE_P (f1->type) + && INTEGRAL_TYPE_P (f2->type) + && (TYPE_PRECISION (f2->type) != TYPE_PRECISION (f1->type))) + return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type); /* Stabilize the sort. */ return TYPE_UID (f1->type) - TYPE_UID (f2->type); } @@ -2055,6 +2056,11 @@ sort_and_splice_var_accesses (tree var) bool grp_partial_lhs = access->grp_partial_lhs; bool first_scalar = is_gimple_reg_type (access->type); bool unscalarizable_region = access->grp_unscalarizable_region; + bool bf_non_full_precision + = (INTEGRAL_TYPE_P (access->type) + && TYPE_PRECISION (access->type) != access->size + && TREE_CODE (access->expr) == COMPONENT_REF + && DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1))); if (first || access->offset >= high) { @@ -2102,6 +2108,22 @@ sort_and_splice_var_accesses (tree var) this combination of size and offset, the comparison function should have put the scalars first. */ gcc_assert (first_scalar || !is_gimple_reg_type (ac2->type)); + /* It also prefers integral types to non-integral. However, when the + precision of the selected type does not span the entire area and + should also be used for a non-integer (i.e. float), we must not + let that happen. Normally analyze_access_subtree expands the type + to cover the entire area but for bit-fields it doesn't. */ + if (bf_non_full_precision && !INTEGRAL_TYPE_P (ac2->type)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Cannot scalarize the following access " + "because insufficient precision integer type was " + "selected.\n "); + dump_access (dump_file, access, false); + } + unscalarizable_region = true; + } ac2->group_representative = access; j++; }
On September 26, 2017 5:20:25 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote: >Hi, > >On Mon, Sep 25, 2017 at 04:22:06PM +0300, Alexander Monakov wrote: >> >> Thanks! If this is resolved, haifa-sched autoprefetch ranking will >become the >> last remaining (among discovered so far) inconsistent qsort >comparator in GCC. >> > >So the following has passed bootstrap and testing on x86_64-linux >(including Ada). I have failed to create a C/C++ testcase but when I >was trying to come up with one I realized there is code in >analyze_access_subtree that replaces integer types with smaller >precision with full-precision ones for all cases but bit-fields. >Therefore I adjusted the disqualification to only trigger for >bit-fields (that happen to be of equal size as a non-integer register >type), which I think is very unlikely to ever happen. > >Richi, is it OK for trunk? OK. Richard. >Thanks, > >Martin > > >2017-09-26 Martin Jambor <mjambor@suse.cz> > > * tree-sra.c (compare_access_positions): Put integral types first, > stabilize sorting of integral types, remove conditions putting > non-full-precision integers last. > (sort_and_splice_var_accesses): Disable scalarization if a > non-integert would be represented by a non-full-precision integer. >--- > gcc/tree-sra.c | 42 ++++++++++++++++++++++++++++++++---------- > 1 file changed, 32 insertions(+), 10 deletions(-) > >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c >index 163b7a2d03b..f5675edc7f1 100644 >--- a/gcc/tree-sra.c >+++ b/gcc/tree-sra.c >@@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const >void *b) > && TREE_CODE (f2->type) != COMPLEX_TYPE > && TREE_CODE (f2->type) != VECTOR_TYPE) > return -1; >- /* Put the integral type with the bigger precision first. */ >+ /* Put any integral type before any non-integral type. When >splicing, we >+ make sure that those with insufficient precision and occupying the >+ same space are not scalarized. */ > else if (INTEGRAL_TYPE_P (f1->type) >+ && !INTEGRAL_TYPE_P (f2->type)) >+ return -1; >+ else if (!INTEGRAL_TYPE_P (f1->type) > && INTEGRAL_TYPE_P (f2->type)) >- return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type); >- /* Put any integral type with non-full precision last. */ >- else if (INTEGRAL_TYPE_P (f1->type) >- && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type)) >- != TYPE_PRECISION (f1->type))) > return 1; >- else if (INTEGRAL_TYPE_P (f2->type) >- && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type)) >- != TYPE_PRECISION (f2->type))) >- return -1; >+ /* Put the integral type with the bigger precision first. */ >+ else if (INTEGRAL_TYPE_P (f1->type) >+ && INTEGRAL_TYPE_P (f2->type) >+ && (TYPE_PRECISION (f2->type) != TYPE_PRECISION (f1->type))) >+ return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type); > /* Stabilize the sort. */ > return TYPE_UID (f1->type) - TYPE_UID (f2->type); > } >@@ -2055,6 +2056,11 @@ sort_and_splice_var_accesses (tree var) > bool grp_partial_lhs = access->grp_partial_lhs; > bool first_scalar = is_gimple_reg_type (access->type); > bool unscalarizable_region = access->grp_unscalarizable_region; >+ bool bf_non_full_precision >+ = (INTEGRAL_TYPE_P (access->type) >+ && TYPE_PRECISION (access->type) != access->size >+ && TREE_CODE (access->expr) == COMPONENT_REF >+ && DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1))); > > if (first || access->offset >= high) > { >@@ -2102,6 +2108,22 @@ sort_and_splice_var_accesses (tree var) > this combination of size and offset, the comparison function > should have put the scalars first. */ > gcc_assert (first_scalar || !is_gimple_reg_type (ac2->type)); >+ /* It also prefers integral types to non-integral. However, when >the >+ precision of the selected type does not span the entire area and >+ should also be used for a non-integer (i.e. float), we must not >+ let that happen. Normally analyze_access_subtree expands the >type >+ to cover the entire area but for bit-fields it doesn't. */ >+ if (bf_non_full_precision && !INTEGRAL_TYPE_P (ac2->type)) >+ { >+ if (dump_file && (dump_flags & TDF_DETAILS)) >+ { >+ fprintf (dump_file, "Cannot scalarize the following access " >+ "because insufficient precision integer type was " >+ "selected.\n "); >+ dump_access (dump_file, access, false); >+ } >+ unscalarizable_region = true; >+ } > ac2->group_representative = access; > j++; > }
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 163b7a2d03b..4f9a8802aeb 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -1542,19 +1542,17 @@ compare_access_positions (const void *a, const void *b) && TREE_CODE (f2->type) != COMPLEX_TYPE && TREE_CODE (f2->type) != VECTOR_TYPE) return -1; + /* Put any non-integral type before any integral type. */ + else if (INTEGRAL_TYPE_P (f1->type) + && !INTEGRAL_TYPE_P (f2->type)) + return 1; + else if (!INTEGRAL_TYPE_P (f1->type) + && INTEGRAL_TYPE_P (f2->type)) + return -1; /* Put the integral type with the bigger precision first. */ else if (INTEGRAL_TYPE_P (f1->type) && INTEGRAL_TYPE_P (f2->type)) return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type); - /* Put any integral type with non-full precision last. */ - else if (INTEGRAL_TYPE_P (f1->type) - && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type)) - != TYPE_PRECISION (f1->type))) - return 1; - else if (INTEGRAL_TYPE_P (f2->type) - && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type)) - != TYPE_PRECISION (f2->type))) - return -1; /* Stabilize the sort. */ return TYPE_UID (f1->type) - TYPE_UID (f2->type); }