Message ID | e52abe71-2364-32e7-d29a-dc05452a06f5@gmail.com |
---|---|
State | New |
Headers | show |
Series | expmed: Allow extract_bit_field via mem for low-precision modes. | expand |
[Sorry for any weird MUA issues, don't have access to my usual set-up.] > when looking at a riscv ICE in vect-live-6.c I noticed that we > assume that the variable part (coeffs[1] * x1) of the to-be-extracted > bit number in extract_bit_field_1 is a multiple of BITS_PER_UNIT. > > This means that bits_to_bytes_round_down and num_trailing_bits > cannot handle e.g. extracting from a "VNx4BI"-mode vector which has > 4-bit precision on riscv. But in the VLA case, doesn't it instead have precision 4+4X? The problem then is that we can't tell at compile time which byte that corresponds to. So... > This patch adds a special case for that situation and sets bytenum to > zero as well as bitnum to its proper value. It works for the riscv > case because in all other situations we can align to a byte boundary. > If x1 were 3 for some reason, however, the above assertion would still > fail. I don't think this can happen for riscv as we only ever double > the number of chunks for larger vector sizes but not sure about the > general case. > > If there's another, correct way to work around feel free to suggest. > > Bootstrap/testsuite on aarch64 and x86 is running but I would be > surprised if there were any changes as riscv is the only target that > uses modes with precision < 8. > > Regards > Robin > > gcc/ChangeLog: > > * expmed.cc (extract_bit_field_1): Handle bitnum with variable > part less than BITS_PER_UNIT. > --- > gcc/expmed.cc | 18 ++++++++++++++++-- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/gcc/expmed.cc b/gcc/expmed.cc > index e22e43c8505..1b0119f9cfc 100644 > --- a/gcc/expmed.cc > +++ b/gcc/expmed.cc > @@ -1858,8 +1858,22 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum, > but is useful for things like vector booleans. */ > if (MEM_P (op0) && !bitnum.is_constant ()) > { > - bytenum = bits_to_bytes_round_down (bitnum); > - bitnum = num_trailing_bits (bitnum); > + /* bits_to_bytes_round_down tries to align to a byte (BITS_PER_UNIT) > + boundary and asserts that bitnum.coeffs[1] % BITS_PER_UNIT == 0. > + For modes with precision < BITS_PER_UNIT this fails but we can > + still extract from the first byte. */ > + poly_uint16 prec = GET_MODE_PRECISION (outermode); > + if (prec.coeffs[1] < BITS_PER_UNIT && bitnum.coeffs[1] < BITS_PER_UNIT) > + { > + bytenum = 0; > + bitnum = bitnum.coeffs[0] & (BITS_PER_UNIT - 1); ...this doesn't look right. We can't drop bitnum.coeffs[1] when it's nonzero, because it says that for some runtime vector sizes, the bit position might be higher than bitnum.coeffs[0]. Also, it's not possible to access coeffs[1] unconditionally in target-independent code. Thanks, Richard > + } > + else > + { > + bytenum = bits_to_bytes_round_down (bitnum); > + bitnum = num_trailing_bits (bitnum); > + } > + > poly_uint64 bytesize = bits_to_bytes_round_up (bitnum + bitsize); > op0 = adjust_bitfield_address_size (op0, BLKmode, bytenum, bytesize); > op0_mode = opt_scalar_int_mode ();
> But in the VLA case, doesn't it instead have precision 4+4X? > The problem then is that we can't tell at compile time which > byte that corresponds to. So... Yes 4 + 4x. I keep getting confused with poly modes :) In this case we want to extract the bitnum [3 4] = 3 + 4x which would be in byte 0 for x = 0 or x = 1 and in byte 1 for x = 2, 3 and so on. Can't we still make that work somehow? As far as I can tell we're looking for the byte range to be accessed. It's not like we have a precision or bitnum of e.g. [3 17] where the access could be anywhere but still a pow2 fraction of BITS_PER_UNIT. I'm just having trouble writing that down. What about something like int factor = BITS_PER_UINT / prec.coeffs[0]; bytenum = force_align_down_and_div (bitnum, prec.coeffs[0]); bytenum *= factor; (or a similar thing done manually without helpers) guarded by the proper condition? Or do we need something more generic for the factor (i.e. prec.coeffs[0]) is not enough when we have a precision like [8 16]? Does that even exist?. Regards Robin
Robin Dapp <rdapp.gcc@gmail.com> writes: >> But in the VLA case, doesn't it instead have precision 4+4X? >> The problem then is that we can't tell at compile time which >> byte that corresponds to. So... > > Yes 4 + 4x. I keep getting confused with poly modes :) > In this case we want to extract the bitnum [3 4] = 3 + 4x which > would be in byte 0 for x = 0 or x = 1 and in byte 1 for x = 2, 3 and > so on. > > Can't we still make that work somehow? As far as I can tell we're looking > for the byte range to be accessed. It's not like we have a precision or > bitnum of e.g. [3 17] where the access could be anywhere but still a pow2 > fraction of BITS_PER_UNIT. > > I'm just having trouble writing that down. > > What about something like > > int factor = BITS_PER_UINT / prec.coeffs[0]; > bytenum = force_align_down_and_div (bitnum, prec.coeffs[0]); > bytenum *= factor; > > (or a similar thing done manually without helpers) guarded by the > proper condition? > Or do we need something more generic for the factor (i.e. prec.coeffs[0]) > is not enough when we have a precision like [8 16]? Does that even exist?. It's not just a question of which byte though. It's also a question of which bit. One option would be to code-generate for even X and for odd X, and select between them at runtime. But that doesn't scale well to 2+2X and 1+1X. Otherwise I think we need to treat the bit position as a variable, with bitpos % 8 and bitpos / 8 being calculated at runtime. Thanks, Richard
> It's not just a question of which byte though. It's also a question > of which bit. > > One option would be to code-generate for even X and for odd X, and select > between them at runtime. But that doesn't scale well to 2+2X and 1+1X. > > Otherwise I think we need to treat the bit position as a variable, > with bitpos % 8 and bitpos / 8 being calculated at runtime. Thanks. I worked around it with a backend vec_extract<VNxBI>QI expander so we don't run into that situation directly anymore. The problem is of course still latent and I'm going to look at it again after some other things on my plate. Regards Robin
Robin Dapp via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> It's not just a question of which byte though. It's also a question >> of which bit. >> >> One option would be to code-generate for even X and for odd X, and select >> between them at runtime. But that doesn't scale well to 2+2X and 1+1X. >> >> Otherwise I think we need to treat the bit position as a variable, >> with bitpos % 8 and bitpos / 8 being calculated at runtime. > > Thanks. I worked around it with a backend vec_extract<VNxBI>QI expander > so we don't run into that situation directly anymore. The problem is of > course still latent and I'm going to look at it again after some other things > on my plate. Yeah, sounds like a good workaround. If the target has an efficient way of coping with the VLAness then the optab will probably be better than whatever the generic code ends up being. Thanks, Richard
diff --git a/gcc/expmed.cc b/gcc/expmed.cc index e22e43c8505..1b0119f9cfc 100644 --- a/gcc/expmed.cc +++ b/gcc/expmed.cc @@ -1858,8 +1858,22 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum, but is useful for things like vector booleans. */ if (MEM_P (op0) && !bitnum.is_constant ()) { - bytenum = bits_to_bytes_round_down (bitnum); - bitnum = num_trailing_bits (bitnum); + /* bits_to_bytes_round_down tries to align to a byte (BITS_PER_UNIT) + boundary and asserts that bitnum.coeffs[1] % BITS_PER_UNIT == 0. + For modes with precision < BITS_PER_UNIT this fails but we can + still extract from the first byte. */ + poly_uint16 prec = GET_MODE_PRECISION (outermode); + if (prec.coeffs[1] < BITS_PER_UNIT && bitnum.coeffs[1] < BITS_PER_UNIT) + { + bytenum = 0; + bitnum = bitnum.coeffs[0] & (BITS_PER_UNIT - 1); + } + else + { + bytenum = bits_to_bytes_round_down (bitnum); + bitnum = num_trailing_bits (bitnum); + } + poly_uint64 bytesize = bits_to_bytes_round_up (bitnum + bitsize); op0 = adjust_bitfield_address_size (op0, BLKmode, bytenum, bytesize); op0_mode = opt_scalar_int_mode ();