Message ID | Zysj1WBJrYz8WshJ@tucnak |
---|---|
State | New |
Headers | show |
Series | store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless necessary [PR117439] | expand |
On Wed, 6 Nov 2024, Jakub Jelinek wrote: > Hi! > > encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which > it has to allocate a buffer and do various extra work like shifting the bits > etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen > doesn't have corresponding integer mode. > The last case is explained later in the comments: > /* The native_encode_expr machinery uses TYPE_MODE to determine how many > bytes to write. This means it can write more than > ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example > write 8 bytes for a bitlen of 40). Skip the bytes that are not within > bitlen and zero out the bits that are not relevant as well (that may > contain a sign bit due to sign-extension). */ > Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR > or {CLOBBER}, which doesn't use native_encode_expr at all, just memset, > so that case doesn't need those fancy games unless bitlen or bitpos > aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is > possible). > > The following patch makes us use the fast path even for empty_ctor_p > which occupy full bytes, we can just memset that in the provided buffer and > don't need to XALLOCAVEC another buffer. > > This patch in itself fixes the testcase from the PR (which was about using > huge XALLLOCAVEC), but I want to do some other changes, to be posted in a > next patch. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Thanks, Richard. > 2024-11-06 Jakub Jelinek <jakub@redhat.com> > > PR tree-optimization/117439 > * gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For > empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an > integral mode. > > --- gcc/gimple-ssa-store-merging.cc.jj 2024-10-25 10:00:29.467767871 +0200 > +++ gcc/gimple-ssa-store-merging.cc 2024-11-04 18:40:14.667260621 +0100 > @@ -1934,14 +1934,15 @@ encode_tree_to_bitpos (tree expr, unsign > unsigned int total_bytes) > { > unsigned int first_byte = bitpos / BITS_PER_UNIT; > - bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT) > - || (bitpos % BITS_PER_UNIT) > - || !int_mode_for_size (bitlen, 0).exists ()); > bool empty_ctor_p > = (TREE_CODE (expr) == CONSTRUCTOR > && CONSTRUCTOR_NELTS (expr) == 0 > && TYPE_SIZE_UNIT (TREE_TYPE (expr)) > - && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr)))); > + && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr)))); > + bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT) > + || (bitpos % BITS_PER_UNIT) > + || (!int_mode_for_size (bitlen, 0).exists () > + && !empty_ctor_p)); > > if (!sub_byte_op_p) > { > > Jakub > >
--- gcc/gimple-ssa-store-merging.cc.jj 2024-10-25 10:00:29.467767871 +0200 +++ gcc/gimple-ssa-store-merging.cc 2024-11-04 18:40:14.667260621 +0100 @@ -1934,14 +1934,15 @@ encode_tree_to_bitpos (tree expr, unsign unsigned int total_bytes) { unsigned int first_byte = bitpos / BITS_PER_UNIT; - bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT) - || (bitpos % BITS_PER_UNIT) - || !int_mode_for_size (bitlen, 0).exists ()); bool empty_ctor_p = (TREE_CODE (expr) == CONSTRUCTOR && CONSTRUCTOR_NELTS (expr) == 0 && TYPE_SIZE_UNIT (TREE_TYPE (expr)) - && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr)))); + && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr)))); + bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT) + || (bitpos % BITS_PER_UNIT) + || (!int_mode_for_size (bitlen, 0).exists () + && !empty_ctor_p)); if (!sub_byte_op_p) {