diff mbox series

store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless necessary [PR117439]

Message ID Zysj1WBJrYz8WshJ@tucnak
State New
Headers show
Series store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless necessary [PR117439] | expand

Commit Message

Jakub Jelinek Nov. 6, 2024, 8:07 a.m. UTC
Hi!

encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which
it has to allocate a buffer and do various extra work like shifting the bits
etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
doesn't have corresponding integer mode.
The last case is explained later in the comments:
  /* The native_encode_expr machinery uses TYPE_MODE to determine how many
     bytes to write.  This means it can write more than
     ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example
     write 8 bytes for a bitlen of 40).  Skip the bytes that are not within
     bitlen and zero out the bits that are not relevant as well (that may
     contain a sign bit due to sign-extension).  */
Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR
or {CLOBBER}, which doesn't use native_encode_expr at all, just memset,
so that case doesn't need those fancy games unless bitlen or bitpos
aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is
possible).

The following patch makes us use the fast path even for empty_ctor_p
which occupy full bytes, we can just memset that in the provided buffer and
don't need to XALLOCAVEC another buffer.

This patch in itself fixes the testcase from the PR (which was about using
huge XALLLOCAVEC), but I want to do some other changes, to be posted in a
next patch.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-11-06  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/117439
	* gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For
	empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an
	integral mode.


	Jakub

Comments

Richard Biener Nov. 6, 2024, 8:46 a.m. UTC | #1
On Wed, 6 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which
> it has to allocate a buffer and do various extra work like shifting the bits
> etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
> doesn't have corresponding integer mode.
> The last case is explained later in the comments:
>   /* The native_encode_expr machinery uses TYPE_MODE to determine how many
>      bytes to write.  This means it can write more than
>      ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example
>      write 8 bytes for a bitlen of 40).  Skip the bytes that are not within
>      bitlen and zero out the bits that are not relevant as well (that may
>      contain a sign bit due to sign-extension).  */
> Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR
> or {CLOBBER}, which doesn't use native_encode_expr at all, just memset,
> so that case doesn't need those fancy games unless bitlen or bitpos
> aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is
> possible).
> 
> The following patch makes us use the fast path even for empty_ctor_p
> which occupy full bytes, we can just memset that in the provided buffer and
> don't need to XALLOCAVEC another buffer.
> 
> This patch in itself fixes the testcase from the PR (which was about using
> huge XALLLOCAVEC), but I want to do some other changes, to be posted in a
> next patch.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2024-11-06  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/117439
> 	* gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For
> 	empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an
> 	integral mode.
> 
> --- gcc/gimple-ssa-store-merging.cc.jj	2024-10-25 10:00:29.467767871 +0200
> +++ gcc/gimple-ssa-store-merging.cc	2024-11-04 18:40:14.667260621 +0100
> @@ -1934,14 +1934,15 @@ encode_tree_to_bitpos (tree expr, unsign
>  		       unsigned int total_bytes)
>  {
>    unsigned int first_byte = bitpos / BITS_PER_UNIT;
> -  bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
> -			|| (bitpos % BITS_PER_UNIT)
> -			|| !int_mode_for_size (bitlen, 0).exists ());
>    bool empty_ctor_p
>      = (TREE_CODE (expr) == CONSTRUCTOR
>         && CONSTRUCTOR_NELTS (expr) == 0
>         && TYPE_SIZE_UNIT (TREE_TYPE (expr))
> -		       && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
> +       && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
> +  bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
> +			|| (bitpos % BITS_PER_UNIT)
> +			|| (!int_mode_for_size (bitlen, 0).exists ()
> +			    && !empty_ctor_p));
>  
>    if (!sub_byte_op_p)
>      {
> 
> 	Jakub
> 
>
diff mbox series

Patch

--- gcc/gimple-ssa-store-merging.cc.jj	2024-10-25 10:00:29.467767871 +0200
+++ gcc/gimple-ssa-store-merging.cc	2024-11-04 18:40:14.667260621 +0100
@@ -1934,14 +1934,15 @@  encode_tree_to_bitpos (tree expr, unsign
 		       unsigned int total_bytes)
 {
   unsigned int first_byte = bitpos / BITS_PER_UNIT;
-  bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
-			|| (bitpos % BITS_PER_UNIT)
-			|| !int_mode_for_size (bitlen, 0).exists ());
   bool empty_ctor_p
     = (TREE_CODE (expr) == CONSTRUCTOR
        && CONSTRUCTOR_NELTS (expr) == 0
        && TYPE_SIZE_UNIT (TREE_TYPE (expr))
-		       && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
+       && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
+  bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
+			|| (bitpos % BITS_PER_UNIT)
+			|| (!int_mode_for_size (bitlen, 0).exists ()
+			    && !empty_ctor_p));
 
   if (!sub_byte_op_p)
     {