diff mbox series

tree-sra: fix compare_access_positions qsort comparator

Message ID alpine.LNX.2.20.13.1709212003310.31862@monopod.intra.ispras.ru
State New
Headers show
Series tree-sra: fix compare_access_positions qsort comparator | expand

Commit Message

Alexander Monakov Sept. 21, 2017, 5:27 p.m. UTC
Hi,

The compare_access_positions qsort comparator lacks transitivity, although
somewhat surprisingly this issue didn't manifest on 64-bit x86 bootstraps.
The first invalid comparison step is here (tree-sra.c:1545):

      /* Put the integral type with the bigger precision first.  */
      else if (INTEGRAL_TYPE_P (f1->type)
	       && INTEGRAL_TYPE_P (f2->type))
	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);

Imagine you have items A, B, C such that they compare equal according to
preceding comparison steps, A and C are integral and have precision 64 resp.
32, B is non-integral.  Then you have C < A according to this step, but
comparisons against B depend on TYPE_UID, so you can end up with A < B < C < A.

A minimal fix would be to order all integral items before/after non-integral,
like preceding code already does for aggregate/vector/complex types.

Bootstrapped/regtested on 32-bit x86.

Thanks.
Alexander

	* tree-sra (compare_access_positions): Order non-integral types before
	integral types.

Comments

Richard Sandiford Sept. 21, 2017, 8:31 p.m. UTC | #1
Alexander Monakov <amonakov@ispras.ru> writes:
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 163b7a2d03b..4f9a8802aeb 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1542,19 +1542,17 @@ compare_access_positions (const void *a, const void *b)
>  	       && TREE_CODE (f2->type) != COMPLEX_TYPE
>  	       && TREE_CODE (f2->type) != VECTOR_TYPE)
>  	return -1;
> +      /* Put any non-integral type before any integral type.  */
> +      else if (INTEGRAL_TYPE_P (f1->type)
> +	       && !INTEGRAL_TYPE_P (f2->type))
> +	return 1;
> +      else if (!INTEGRAL_TYPE_P (f1->type)
> +	       && INTEGRAL_TYPE_P (f2->type))
> +	return -1;
>        /* Put the integral type with the bigger precision first.  */
>        else if (INTEGRAL_TYPE_P (f1->type)
>  	       && INTEGRAL_TYPE_P (f2->type))
>  	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
> -      /* Put any integral type with non-full precision last.  */
> -      else if (INTEGRAL_TYPE_P (f1->type)
> -	       && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type))
> -		   != TYPE_PRECISION (f1->type)))
> -	return 1;
> -      else if (INTEGRAL_TYPE_P (f2->type)
> -	       && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type))
> -		   != TYPE_PRECISION (f2->type)))
> -	return -1;
>        /* Stabilize the sort.  */
>        return TYPE_UID (f1->type) - TYPE_UID (f2->type);
>      }

LGTM FWIW, but isn't there also the problem that the TYPE_PRECISION
test fails to stabilise the sort if you have two integral types with
the same precision?

Thanks,
Richard
Alexander Monakov Sept. 25, 2017, 9:23 a.m. UTC | #2
On Thu, 21 Sep 2017, Richard Sandiford wrote:
> LGTM FWIW, but isn't there also the problem that the TYPE_PRECISION
> test fails to stabilise the sort if you have two integral types with
> the same precision?

Yes, but that's a pre-existing issue, so I didn't change it in the patch.
I think GCC broadly makes an assumption that libc qsort is deterministic,
i.e. as long as comparators are deterministic, outcome of qsort doesn't
change across compiler runs.  In this case qsort behaviour affects IR on
a testcase like this:

long g()
{
  struct s {union { unsigned long ul; long l;}; } s;
  asm("# %0" : "=X"(s.ul));
  return s.l;
}

(sort order determines types of post-early-SRA temporaries), so I guess
it's desirable to stabilize it.

Is this patch OK as is, or do I need to resubmit a variant that uses
TYPE_UID comparison in case TYPE_PRECISION is equal?

Thanks.
Alexander
Martin Jambor Sept. 25, 2017, 12:36 p.m. UTC | #3
On Thu, Sep 21, 2017 at 08:27:31PM +0300, Alexander Monakov wrote:
> Hi,
> 
> The compare_access_positions qsort comparator lacks transitivity, although
> somewhat surprisingly this issue didn't manifest on 64-bit x86 bootstraps.
> The first invalid comparison step is here (tree-sra.c:1545):
> 
>       /* Put the integral type with the bigger precision first.  */
>       else if (INTEGRAL_TYPE_P (f1->type)
> 	       && INTEGRAL_TYPE_P (f2->type))
> 	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
> 
> Imagine you have items A, B, C such that they compare equal according to
> preceding comparison steps, A and C are integral and have precision 64 resp.
> 32, B is non-integral.  Then you have C < A according to this step, but
> comparisons against B depend on TYPE_UID, so you can end up with A < B < C < A.
> 
> A minimal fix would be to order all integral items before/after non-integral,
> like preceding code already does for aggregate/vector/complex types.

Thanks for spotting this.  Nevertheless, I would prefer SRA to select
integer types over non-integer ones (i.e. floats), because in the
common scenario which is a union of a float and an int, the int is the
type for which we can enable more subsequent optimizations, even if it
means that we do not scalarize some exotic cases.  So I'm currently
testing the following (if we ever find that this is a problem in
practice, we can fix it at the cost of making
sort_and_splice_var_accesses more complicated but capable of undoing
this decision).

By the way, after I jettison IPA-SRA from tree-sra.c, I'll start
reworking it in a way that does not have the qsort in it.

Thanks again,

Martin

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 163b7a2d03b..0f92033d0bb 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const void *b)
 	       && TREE_CODE (f2->type) != COMPLEX_TYPE
 	       && TREE_CODE (f2->type) != VECTOR_TYPE)
 	return -1;
-      /* Put the integral type with the bigger precision first.  */
+      /* Put any integral type before any non-integral type.  When splicing, we
+	 make sure that those with insufficient precision and occupupying the
+	 same space are not scalarized.  */
       else if (INTEGRAL_TYPE_P (f1->type)
+	       && !INTEGRAL_TYPE_P (f2->type))
+	return -1;
+      else if (!INTEGRAL_TYPE_P (f1->type)
 	       && INTEGRAL_TYPE_P (f2->type))
-	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
-      /* Put any integral type with non-full precision last.  */
-      else if (INTEGRAL_TYPE_P (f1->type)
-	       && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type))
-		   != TYPE_PRECISION (f1->type)))
 	return 1;
-      else if (INTEGRAL_TYPE_P (f2->type)
-	       && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type))
-		   != TYPE_PRECISION (f2->type)))
-	return -1;
+      /* Put the integral type with the bigger precision first.  */
+      else if (INTEGRAL_TYPE_P (f1->type)
+	       && INTEGRAL_TYPE_P (f2->type)
+	       && (TYPE_PRECISION (f2->type) != TYPE_PRECISION (f1->type)))
+	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
       /* Stabilize the sort.  */
       return TYPE_UID (f1->type) - TYPE_UID (f2->type);
     }
@@ -2055,6 +2056,9 @@ sort_and_splice_var_accesses (tree var)
       bool grp_partial_lhs = access->grp_partial_lhs;
       bool first_scalar = is_gimple_reg_type (access->type);
       bool unscalarizable_region = access->grp_unscalarizable_region;
+      bool non_full_precision = (INTEGRAL_TYPE_P (access->type)
+				 && (TREE_INT_CST_LOW (TYPE_SIZE (access->type))
+				     != TYPE_PRECISION (access->type)));
 
       if (first || access->offset >= high)
 	{
@@ -2102,6 +2106,21 @@ sort_and_splice_var_accesses (tree var)
 	     this combination of size and offset, the comparison function
 	     should have put the scalars first.  */
 	  gcc_assert (first_scalar || !is_gimple_reg_type (ac2->type));
+	  /* It also prefers integral types to non-integral.  However, when the
+	     precision of the selected type does not span the entire area and
+	     should also be used for a non-integer (i.e. float), we must not
+	     let that happen.  */
+	  if (non_full_precision && !INTEGRAL_TYPE_P (ac2->type))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, "Cannot sclarize the following access "
+			   "because insufficient precision integer type was "
+			   "selected.\n  ");
+		  dump_access (dump_file, access, false);
+		}
+	      unscalarizable_region = true;
+	    }
 	  ac2->group_representative = access;
 	  j++;
 	}
Alexander Monakov Sept. 25, 2017, 1:22 p.m. UTC | #4
On Mon, 25 Sep 2017, Martin Jambor wrote:
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const void *b)
>  	       && TREE_CODE (f2->type) != COMPLEX_TYPE
>  	       && TREE_CODE (f2->type) != VECTOR_TYPE)
>  	return -1;
> -      /* Put the integral type with the bigger precision first.  */
> +      /* Put any integral type before any non-integral type.  When splicing, we
> +	 make sure that those with insufficient precision and occupupying the

Typo (s/upup/up).

> @@ -2102,6 +2106,21 @@ sort_and_splice_var_accesses (tree var)
>  	     this combination of size and offset, the comparison function
>  	     should have put the scalars first.  */
>  	  gcc_assert (first_scalar || !is_gimple_reg_type (ac2->type));
> +	  /* It also prefers integral types to non-integral.  However, when the
> +	     precision of the selected type does not span the entire area and
> +	     should also be used for a non-integer (i.e. float), we must not
> +	     let that happen.  */
> +	  if (non_full_precision && !INTEGRAL_TYPE_P (ac2->type))
> +	    {
> +	      if (dump_file && (dump_flags & TDF_DETAILS))
> +		{
> +		  fprintf (dump_file, "Cannot sclarize the following access "

Typo ('scalarize').

Thanks!  If this is resolved, haifa-sched autoprefetch ranking will become the
last remaining (among discovered so far) inconsistent qsort comparator in GCC.

Alexander
Martin Jambor Sept. 26, 2017, 3:20 p.m. UTC | #5
Hi,

On Mon, Sep 25, 2017 at 04:22:06PM +0300, Alexander Monakov wrote:
> 
> Thanks!  If this is resolved, haifa-sched autoprefetch ranking will become the
> last remaining (among discovered so far) inconsistent qsort comparator in GCC.
> 

So the following has passed bootstrap and testing on x86_64-linux
(including Ada).  I have failed to create a C/C++ testcase but when I
was trying to come up with one I realized there is code in
analyze_access_subtree that replaces integer types with smaller
precision with full-precision ones for all cases but bit-fields.
Therefore I adjusted the disqualification to only trigger for
bit-fields (that happen to be of equal size as a non-integer register
type), which I think is very unlikely to ever happen.

Richi, is it OK for trunk?

Thanks,

Martin


2017-09-26  Martin Jambor  <mjambor@suse.cz>

	* tree-sra.c (compare_access_positions): Put integral types first,
	stabilize sorting of integral types, remove conditions putting
	non-full-precision integers last.
	(sort_and_splice_var_accesses): Disable scalarization if a
	non-integert would be represented by a non-full-precision integer.
---
 gcc/tree-sra.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 163b7a2d03b..f5675edc7f1 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const void *b)
 	       && TREE_CODE (f2->type) != COMPLEX_TYPE
 	       && TREE_CODE (f2->type) != VECTOR_TYPE)
 	return -1;
-      /* Put the integral type with the bigger precision first.  */
+      /* Put any integral type before any non-integral type.  When splicing, we
+	 make sure that those with insufficient precision and occupying the
+	 same space are not scalarized.  */
       else if (INTEGRAL_TYPE_P (f1->type)
+	       && !INTEGRAL_TYPE_P (f2->type))
+	return -1;
+      else if (!INTEGRAL_TYPE_P (f1->type)
 	       && INTEGRAL_TYPE_P (f2->type))
-	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
-      /* Put any integral type with non-full precision last.  */
-      else if (INTEGRAL_TYPE_P (f1->type)
-	       && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type))
-		   != TYPE_PRECISION (f1->type)))
 	return 1;
-      else if (INTEGRAL_TYPE_P (f2->type)
-	       && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type))
-		   != TYPE_PRECISION (f2->type)))
-	return -1;
+      /* Put the integral type with the bigger precision first.  */
+      else if (INTEGRAL_TYPE_P (f1->type)
+	       && INTEGRAL_TYPE_P (f2->type)
+	       && (TYPE_PRECISION (f2->type) != TYPE_PRECISION (f1->type)))
+	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
       /* Stabilize the sort.  */
       return TYPE_UID (f1->type) - TYPE_UID (f2->type);
     }
@@ -2055,6 +2056,11 @@ sort_and_splice_var_accesses (tree var)
       bool grp_partial_lhs = access->grp_partial_lhs;
       bool first_scalar = is_gimple_reg_type (access->type);
       bool unscalarizable_region = access->grp_unscalarizable_region;
+      bool bf_non_full_precision
+	= (INTEGRAL_TYPE_P (access->type)
+	   && TYPE_PRECISION (access->type) != access->size
+	   && TREE_CODE (access->expr) == COMPONENT_REF
+	   && DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1)));
 
       if (first || access->offset >= high)
 	{
@@ -2102,6 +2108,22 @@ sort_and_splice_var_accesses (tree var)
 	     this combination of size and offset, the comparison function
 	     should have put the scalars first.  */
 	  gcc_assert (first_scalar || !is_gimple_reg_type (ac2->type));
+	  /* It also prefers integral types to non-integral.  However, when the
+	     precision of the selected type does not span the entire area and
+	     should also be used for a non-integer (i.e. float), we must not
+	     let that happen.  Normally analyze_access_subtree expands the type
+	     to cover the entire area but for bit-fields it doesn't.  */
+	  if (bf_non_full_precision && !INTEGRAL_TYPE_P (ac2->type))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, "Cannot scalarize the following access "
+			   "because insufficient precision integer type was "
+			   "selected.\n  ");
+		  dump_access (dump_file, access, false);
+		}
+	      unscalarizable_region = true;
+	    }
 	  ac2->group_representative = access;
 	  j++;
 	}
Richard Biener Sept. 26, 2017, 3:33 p.m. UTC | #6
On September 26, 2017 5:20:25 PM GMT+02:00, Martin Jambor <mjambor@suse.cz> wrote:
>Hi,
>
>On Mon, Sep 25, 2017 at 04:22:06PM +0300, Alexander Monakov wrote:
>> 
>> Thanks!  If this is resolved, haifa-sched autoprefetch ranking will
>become the
>> last remaining (among discovered so far) inconsistent qsort
>comparator in GCC.
>> 
>
>So the following has passed bootstrap and testing on x86_64-linux
>(including Ada).  I have failed to create a C/C++ testcase but when I
>was trying to come up with one I realized there is code in
>analyze_access_subtree that replaces integer types with smaller
>precision with full-precision ones for all cases but bit-fields.
>Therefore I adjusted the disqualification to only trigger for
>bit-fields (that happen to be of equal size as a non-integer register
>type), which I think is very unlikely to ever happen.
>
>Richi, is it OK for trunk?

OK. 

Richard. 

>Thanks,
>
>Martin
>
>
>2017-09-26  Martin Jambor  <mjambor@suse.cz>
>
>	* tree-sra.c (compare_access_positions): Put integral types first,
>	stabilize sorting of integral types, remove conditions putting
>	non-full-precision integers last.
>	(sort_and_splice_var_accesses): Disable scalarization if a
>	non-integert would be represented by a non-full-precision integer.
>---
> gcc/tree-sra.c | 42 ++++++++++++++++++++++++++++++++----------
> 1 file changed, 32 insertions(+), 10 deletions(-)
>
>diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>index 163b7a2d03b..f5675edc7f1 100644
>--- a/gcc/tree-sra.c
>+++ b/gcc/tree-sra.c
>@@ -1542,19 +1542,20 @@ compare_access_positions (const void *a, const
>void *b)
> 	       && TREE_CODE (f2->type) != COMPLEX_TYPE
> 	       && TREE_CODE (f2->type) != VECTOR_TYPE)
> 	return -1;
>-      /* Put the integral type with the bigger precision first.  */
>+      /* Put any integral type before any non-integral type.  When
>splicing, we
>+	 make sure that those with insufficient precision and occupying the
>+	 same space are not scalarized.  */
>       else if (INTEGRAL_TYPE_P (f1->type)
>+	       && !INTEGRAL_TYPE_P (f2->type))
>+	return -1;
>+      else if (!INTEGRAL_TYPE_P (f1->type)
> 	       && INTEGRAL_TYPE_P (f2->type))
>-	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
>-      /* Put any integral type with non-full precision last.  */
>-      else if (INTEGRAL_TYPE_P (f1->type)
>-	       && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type))
>-		   != TYPE_PRECISION (f1->type)))
> 	return 1;
>-      else if (INTEGRAL_TYPE_P (f2->type)
>-	       && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type))
>-		   != TYPE_PRECISION (f2->type)))
>-	return -1;
>+      /* Put the integral type with the bigger precision first.  */
>+      else if (INTEGRAL_TYPE_P (f1->type)
>+	       && INTEGRAL_TYPE_P (f2->type)
>+	       && (TYPE_PRECISION (f2->type) != TYPE_PRECISION (f1->type)))
>+	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
>       /* Stabilize the sort.  */
>       return TYPE_UID (f1->type) - TYPE_UID (f2->type);
>     }
>@@ -2055,6 +2056,11 @@ sort_and_splice_var_accesses (tree var)
>       bool grp_partial_lhs = access->grp_partial_lhs;
>       bool first_scalar = is_gimple_reg_type (access->type);
>       bool unscalarizable_region = access->grp_unscalarizable_region;
>+      bool bf_non_full_precision
>+	= (INTEGRAL_TYPE_P (access->type)
>+	   && TYPE_PRECISION (access->type) != access->size
>+	   && TREE_CODE (access->expr) == COMPONENT_REF
>+	   && DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1)));
> 
>       if (first || access->offset >= high)
> 	{
>@@ -2102,6 +2108,22 @@ sort_and_splice_var_accesses (tree var)
> 	     this combination of size and offset, the comparison function
> 	     should have put the scalars first.  */
> 	  gcc_assert (first_scalar || !is_gimple_reg_type (ac2->type));
>+	  /* It also prefers integral types to non-integral.  However, when
>the
>+	     precision of the selected type does not span the entire area and
>+	     should also be used for a non-integer (i.e. float), we must not
>+	     let that happen.  Normally analyze_access_subtree expands the
>type
>+	     to cover the entire area but for bit-fields it doesn't.  */
>+	  if (bf_non_full_precision && !INTEGRAL_TYPE_P (ac2->type))
>+	    {
>+	      if (dump_file && (dump_flags & TDF_DETAILS))
>+		{
>+		  fprintf (dump_file, "Cannot scalarize the following access "
>+			   "because insufficient precision integer type was "
>+			   "selected.\n  ");
>+		  dump_access (dump_file, access, false);
>+		}
>+	      unscalarizable_region = true;
>+	    }
> 	  ac2->group_representative = access;
> 	  j++;
> 	}
diff mbox series

Patch

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 163b7a2d03b..4f9a8802aeb 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1542,19 +1542,17 @@  compare_access_positions (const void *a, const void *b)
 	       && TREE_CODE (f2->type) != COMPLEX_TYPE
 	       && TREE_CODE (f2->type) != VECTOR_TYPE)
 	return -1;
+      /* Put any non-integral type before any integral type.  */
+      else if (INTEGRAL_TYPE_P (f1->type)
+	       && !INTEGRAL_TYPE_P (f2->type))
+	return 1;
+      else if (!INTEGRAL_TYPE_P (f1->type)
+	       && INTEGRAL_TYPE_P (f2->type))
+	return -1;
       /* Put the integral type with the bigger precision first.  */
       else if (INTEGRAL_TYPE_P (f1->type)
 	       && INTEGRAL_TYPE_P (f2->type))
 	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
-      /* Put any integral type with non-full precision last.  */
-      else if (INTEGRAL_TYPE_P (f1->type)
-	       && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type))
-		   != TYPE_PRECISION (f1->type)))
-	return 1;
-      else if (INTEGRAL_TYPE_P (f2->type)
-	       && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type))
-		   != TYPE_PRECISION (f2->type)))
-	return -1;
       /* Stabilize the sort.  */
       return TYPE_UID (f1->type) - TYPE_UID (f2->type);
     }