Message ID | ZcKhiVtTYSB7vV9T@tucnak |
---|---|
State | New |
Headers | show |
Series | wide-int: Fix mul_internal overflow handling [PR113753] | expand |
On Tue, 6 Feb 2024, Jakub Jelinek wrote: > Hi! > > As the following testcases show, the overflow (needs_overflow) and high > handling in wi::mul_internal seem to only work correctly for either > small precisions (less than or equal to 32, that is handled by earlier > simpler code, not the full Knuth's multiplication algorithm) or for > precisions which are multiple of HOST_BITS_PER_WIDE_INT, so it happened > to work fine in most pre-_BitInt era types (and for large bitfields we > probably didn't have good coverage or were lucky and nothing was asking > if there was overflow or not; I think high multiplication is typically > used only when we have optab in corresponding mode). > E.g. on the gcc.dg/bitint-87.c testcase, there were overflow warnings > emitted only the the / 2wb * 3wb _BitInt(128) cases, but not in the > other precisions. > > I found 3 issues when prec > HOST_BITS_PER_HALF_WIDE_INT and > (prec % HOST_BITS_PER_WIDE_INT) != 0: > 1) the checking for overflow was simply checking the values of the > r array members from half_blocks_needed to 2 * half_blocks_needed - 1, > for UNSIGNED overflow checking if any of them is non-zero, for > SIGNED comparing them if any is different from top where top is computed > from the sign bit of the result (see below); similarly, the highpart > multiplication simply picks the half limbs at r + half_blocks_needed > offset; and furthermore, for SIGNED there is an adjustment if either > operand was negative which also just walks r half-limbs from > half_blocks_needed onwards; > this works great for precisions like 64 or 128, but for precisions like > 129, 159, 160 or 161 doesn't, it would need to walk the bits in the > half-limbs starting right above the most significant bit of the base > precision; that can be up to a whole half-limb and all but one bit from > the one below it earlier > 2) as the comment says, the multiplication is originally done as unsigned > multiplication, with adjustment of the high bits which subtracts the > other operand once: > if (wi::neg_p (op1)) > { > b = 0; > for (i = 0; i < half_blocks_needed; i++) > { > t = (unsigned HOST_WIDE_INT)r[i + half_blocks_needed] > - (unsigned HOST_WIDE_INT)v[i] - b; > r[i + half_blocks_needed] = t & HALF_INT_MASK; > b = t >> (HOST_BITS_PER_WIDE_INT - 1); > } > } > and similarly for the other one. Now, this also only works nicely if > a negative operand has just a single sign bit set in the given precision; > but we were unpacking the operands with wi_unpack (..., SIGNED);, so > say for the negative operand in 129-bit precision, that means the least > significant bit of u[half_blocks_needed - 2] (or v instead of u depending > on which argument it is) is the set sign bit, but then there are 31 > further copies of the sign bit in u[half_blocks_needed - 2] and > further 32 copies in u[half_blocks_needed - 1]; the above adjustment > for signed operands doesn't really do the right job in such cases, it > would need to subtract many more times the other operand > 3) the computation of top for SIGNED > top = r[(half_blocks_needed) - 1]; > top = SIGN_MASK (top << (HOST_BITS_PER_WIDE_INT / 2)); > top &= mask; > also uses the most significant bit which fits into prec of the result > only if prec is multiple of HOST_BITS_PER_WIDE_INT, otherwise we need > to look at a different bit and sometimes it can be also a bit in > r[half_blocks_needed - 2] > > For 1), while for UNSIGNED overflow it could be fairly easy to check > the bits above prec in r half-limbs for being non-zero, doing all the > shifts also in the SIGNED adjustment code in 2 further locations and finally > for the high handling (unless we want to assert one doesn't do the highpart > multiply for such precisions) would be quite ugly and hard to maintain, so > I instead chose (implemented in the second hunk) to shift the > beyond-precision bits up such that the expectations of the rest of the > code are met, that is the LSB of r[half_blocks_needed] after adjustment > is the bit immediately above the precision, etc. We don't need to care > about the bits it shifts out, because the multiplication will yield at most > 2 * prec bits. > > For 2), the patch changes the wi_unpack argument from SIGNED to UNSIGNED, > so that we get all zero bits above the precision. > > And finally for 3) it does shifts and perhaps picks lower r half-limb so > that it uses the actual MSB of the result within prec. > > Bootstrapped/regtested on x86_64-linux and i686-linux, and additionally > tested with > make check-gcc -k -j32 GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="GCC_TEST_RUN_EXPENSIVE=1 dg-torture.exp=*bitint*" > Ok for trunk? OK. Thanks, Richard. > 2024-02-06 Jakub Jelinek <jakub@redhat.com> > > PR tree-optimization/113753 > * wide-int.cc (wi::mul_internal): Unpack op1val and op2val with > UNSIGNED rather than SIGNED. If high or needs_overflow and prec is > not a multiple of HOST_BITS_PER_WIDE_INT, shift left bits above prec > so that they start with r[half_blocks_needed] lowest bit. Fix up > computation of top mask for SIGNED. > > * gcc.dg/torture/bitint-56.c: New test. > * gcc.dg/bitint-87.c: New test. > > --- gcc/wide-int.cc.jj 2024-02-06 08:43:15.069885709 +0100 > +++ gcc/wide-int.cc 2024-02-06 10:29:18.137373518 +0100 > @@ -1483,8 +1483,8 @@ wi::mul_internal (HOST_WIDE_INT *val, co > } > > /* We do unsigned mul and then correct it. */ > - wi_unpack (u, op1val, op1len, half_blocks_needed, prec, SIGNED); > - wi_unpack (v, op2val, op2len, half_blocks_needed, prec, SIGNED); > + wi_unpack (u, op1val, op1len, half_blocks_needed, prec, UNSIGNED); > + wi_unpack (v, op2val, op2len, half_blocks_needed, prec, UNSIGNED); > > /* The 2 is for a full mult. */ > memset (r, 0, half_blocks_needed * 2 > @@ -1503,6 +1503,28 @@ wi::mul_internal (HOST_WIDE_INT *val, co > r[j + half_blocks_needed] = k; > } > > + unsigned int shift; > + if ((high || needs_overflow) && (shift = prec % HOST_BITS_PER_WIDE_INT) != 0) > + { > + /* The high or needs_overflow code assumes that the high bits > + only appear from r[half_blocks_needed] up to > + r[half_blocks_needed * 2 - 1]. If prec is not a multiple > + of HOST_BITS_PER_WIDE_INT, shift the bits above prec up > + to make that code simple. */ > + if (shift == HOST_BITS_PER_HALF_WIDE_INT) > + memmove (&r[half_blocks_needed], &r[half_blocks_needed - 1], > + sizeof (r[0]) * half_blocks_needed); > + else > + { > + unsigned int skip = shift < HOST_BITS_PER_HALF_WIDE_INT; > + if (!skip) > + shift -= HOST_BITS_PER_HALF_WIDE_INT; > + for (i = 2 * half_blocks_needed - 1; i >= half_blocks_needed; i--) > + r[i] = ((r[i - skip] << (-shift % HOST_BITS_PER_HALF_WIDE_INT)) > + | (r[i - skip - 1] >> shift)); > + } > + } > + > /* We did unsigned math above. For signed we must adjust the > product (assuming we need to see that). */ > if (sgn == SIGNED && (high || needs_overflow)) > @@ -1543,8 +1565,12 @@ wi::mul_internal (HOST_WIDE_INT *val, co > top = 0; > else > { > - top = r[(half_blocks_needed) - 1]; > - top = SIGN_MASK (top << (HOST_BITS_PER_WIDE_INT / 2)); > + top = r[half_blocks_needed - 1 > + - ((-prec % HOST_BITS_PER_WIDE_INT) > + >= HOST_BITS_PER_HALF_WIDE_INT)]; > + top = SIGN_MASK (((unsigned HOST_WIDE_INT) top) > + << (HOST_BITS_PER_WIDE_INT / 2 > + + (-prec % HOST_BITS_PER_HALF_WIDE_INT))); > top &= mask; > } > > --- gcc/testsuite/gcc.dg/torture/bitint-56.c.jj 2024-02-06 08:53:45.584120553 +0100 > +++ gcc/testsuite/gcc.dg/torture/bitint-56.c 2024-02-06 08:53:45.584120553 +0100 > @@ -0,0 +1,25 @@ > +/* PR tree-optimization/113753 */ > +/* { dg-do run { target bitint } } */ > +/* { dg-options "-std=c23 -pedantic-errors" } */ > +/* { dg-skip-if "" { ! run_expensive_tests } { "*" } { "-O0" "-O2" } } */ > +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */ > + > +#if __BITINT_MAXWIDTH__ >= 129 > +unsigned _BitInt(128) > +foo (unsigned u, unsigned _BitInt(128) a, unsigned _BitInt(128) b) > +{ > + unsigned _BitInt(129) m = a % b; > + return u * m / u; > +} > +#endif > + > +int > +main () > +{ > +#if __BITINT_MAXWIDTH__ >= 129 > + if (foo (0xfa637c33, 0x37af7fe8b0000000000000000wb, > + 0xfffffffff0000000000000000wb) > + != 0x16f7e93f6d726b38b38d0b753wb) > + __builtin_abort (); > +#endif > +} > --- gcc/testsuite/gcc.dg/bitint-87.c.jj 2024-01-13 00:05:00.077372302 +0100 > +++ gcc/testsuite/gcc.dg/bitint-87.c 2024-02-06 10:41:33.403167442 +0100 > @@ -0,0 +1,24 @@ > +/* PR tree-optimization/113753 */ > +/* { dg-do compile { target bitint575 } } */ > +/* { dg-options "-std=gnu23" } */ > + > +_BitInt(161) a = 1461501637330902918203684832716283019655932542975wb / 2wb * 2wb; > +_BitInt(160) b = 730750818665451459101842416358141509827966271487wb / 2wb * 2wb; > +_BitInt(159) c = 365375409332725729550921208179070754913983135743wb / 2wb * 2wb; > +_BitInt(129) d = 340282366920938463463374607431768211455wb / 2wb * 2wb; > +_BitInt(128) e = 170141183460469231731687303715884105727wb / 2wb * 2wb; > +_BitInt(161) f = (-1461501637330902918203684832716283019655932542975wb - 1wb) / 2wb * 2wb; > +_BitInt(160) g = (-730750818665451459101842416358141509827966271487wb - 1wb) / 2wb * 2wb; > +_BitInt(159) h = (-365375409332725729550921208179070754913983135743wb - 1wb) / 2wb * 2wb; > +_BitInt(129) i = (-340282366920938463463374607431768211455wb - 1wb) / 2wb * 2wb; > +_BitInt(128) j = (-170141183460469231731687303715884105727wb - 1wb) / 2wb * 2wb; > +_BitInt(161) k = 1461501637330902918203684832716283019655932542975wb / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(161\\\)' results in" } */ > +_BitInt(160) l = 730750818665451459101842416358141509827966271487wb / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(160\\\)' results in" } */ > +_BitInt(159) m = 365375409332725729550921208179070754913983135743wb / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(159\\\)' results in" } */ > +_BitInt(129) n = 340282366920938463463374607431768211455wb / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(129\\\)' results in" } */ > +_BitInt(128) o = 170141183460469231731687303715884105727wb / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(128\\\)' results in" } */ > +_BitInt(161) p = (-1461501637330902918203684832716283019655932542975wb - 1wb) / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(161\\\)' results in" } */ > +_BitInt(160) q = (-730750818665451459101842416358141509827966271487wb - 1wb) / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(160\\\)' results in" } */ > +_BitInt(159) r = (-365375409332725729550921208179070754913983135743wb - 1wb) / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(159\\\)' results in" } */ > +_BitInt(129) s = (-340282366920938463463374607431768211455wb - 1wb) / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(129\\\)' results in" } */ > +_BitInt(128) t = (-170141183460469231731687303715884105727wb - 1wb) / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(128\\\)' results in" } */ > > Jakub > >
--- gcc/wide-int.cc.jj 2024-02-06 08:43:15.069885709 +0100 +++ gcc/wide-int.cc 2024-02-06 10:29:18.137373518 +0100 @@ -1483,8 +1483,8 @@ wi::mul_internal (HOST_WIDE_INT *val, co } /* We do unsigned mul and then correct it. */ - wi_unpack (u, op1val, op1len, half_blocks_needed, prec, SIGNED); - wi_unpack (v, op2val, op2len, half_blocks_needed, prec, SIGNED); + wi_unpack (u, op1val, op1len, half_blocks_needed, prec, UNSIGNED); + wi_unpack (v, op2val, op2len, half_blocks_needed, prec, UNSIGNED); /* The 2 is for a full mult. */ memset (r, 0, half_blocks_needed * 2 @@ -1503,6 +1503,28 @@ wi::mul_internal (HOST_WIDE_INT *val, co r[j + half_blocks_needed] = k; } + unsigned int shift; + if ((high || needs_overflow) && (shift = prec % HOST_BITS_PER_WIDE_INT) != 0) + { + /* The high or needs_overflow code assumes that the high bits + only appear from r[half_blocks_needed] up to + r[half_blocks_needed * 2 - 1]. If prec is not a multiple + of HOST_BITS_PER_WIDE_INT, shift the bits above prec up + to make that code simple. */ + if (shift == HOST_BITS_PER_HALF_WIDE_INT) + memmove (&r[half_blocks_needed], &r[half_blocks_needed - 1], + sizeof (r[0]) * half_blocks_needed); + else + { + unsigned int skip = shift < HOST_BITS_PER_HALF_WIDE_INT; + if (!skip) + shift -= HOST_BITS_PER_HALF_WIDE_INT; + for (i = 2 * half_blocks_needed - 1; i >= half_blocks_needed; i--) + r[i] = ((r[i - skip] << (-shift % HOST_BITS_PER_HALF_WIDE_INT)) + | (r[i - skip - 1] >> shift)); + } + } + /* We did unsigned math above. For signed we must adjust the product (assuming we need to see that). */ if (sgn == SIGNED && (high || needs_overflow)) @@ -1543,8 +1565,12 @@ wi::mul_internal (HOST_WIDE_INT *val, co top = 0; else { - top = r[(half_blocks_needed) - 1]; - top = SIGN_MASK (top << (HOST_BITS_PER_WIDE_INT / 2)); + top = r[half_blocks_needed - 1 + - ((-prec % HOST_BITS_PER_WIDE_INT) + >= HOST_BITS_PER_HALF_WIDE_INT)]; + top = SIGN_MASK (((unsigned HOST_WIDE_INT) top) + << (HOST_BITS_PER_WIDE_INT / 2 + + (-prec % HOST_BITS_PER_HALF_WIDE_INT))); top &= mask; } --- gcc/testsuite/gcc.dg/torture/bitint-56.c.jj 2024-02-06 08:53:45.584120553 +0100 +++ gcc/testsuite/gcc.dg/torture/bitint-56.c 2024-02-06 08:53:45.584120553 +0100 @@ -0,0 +1,25 @@ +/* PR tree-optimization/113753 */ +/* { dg-do run { target bitint } } */ +/* { dg-options "-std=c23 -pedantic-errors" } */ +/* { dg-skip-if "" { ! run_expensive_tests } { "*" } { "-O0" "-O2" } } */ +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */ + +#if __BITINT_MAXWIDTH__ >= 129 +unsigned _BitInt(128) +foo (unsigned u, unsigned _BitInt(128) a, unsigned _BitInt(128) b) +{ + unsigned _BitInt(129) m = a % b; + return u * m / u; +} +#endif + +int +main () +{ +#if __BITINT_MAXWIDTH__ >= 129 + if (foo (0xfa637c33, 0x37af7fe8b0000000000000000wb, + 0xfffffffff0000000000000000wb) + != 0x16f7e93f6d726b38b38d0b753wb) + __builtin_abort (); +#endif +} --- gcc/testsuite/gcc.dg/bitint-87.c.jj 2024-01-13 00:05:00.077372302 +0100 +++ gcc/testsuite/gcc.dg/bitint-87.c 2024-02-06 10:41:33.403167442 +0100 @@ -0,0 +1,24 @@ +/* PR tree-optimization/113753 */ +/* { dg-do compile { target bitint575 } } */ +/* { dg-options "-std=gnu23" } */ + +_BitInt(161) a = 1461501637330902918203684832716283019655932542975wb / 2wb * 2wb; +_BitInt(160) b = 730750818665451459101842416358141509827966271487wb / 2wb * 2wb; +_BitInt(159) c = 365375409332725729550921208179070754913983135743wb / 2wb * 2wb; +_BitInt(129) d = 340282366920938463463374607431768211455wb / 2wb * 2wb; +_BitInt(128) e = 170141183460469231731687303715884105727wb / 2wb * 2wb; +_BitInt(161) f = (-1461501637330902918203684832716283019655932542975wb - 1wb) / 2wb * 2wb; +_BitInt(160) g = (-730750818665451459101842416358141509827966271487wb - 1wb) / 2wb * 2wb; +_BitInt(159) h = (-365375409332725729550921208179070754913983135743wb - 1wb) / 2wb * 2wb; +_BitInt(129) i = (-340282366920938463463374607431768211455wb - 1wb) / 2wb * 2wb; +_BitInt(128) j = (-170141183460469231731687303715884105727wb - 1wb) / 2wb * 2wb; +_BitInt(161) k = 1461501637330902918203684832716283019655932542975wb / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(161\\\)' results in" } */ +_BitInt(160) l = 730750818665451459101842416358141509827966271487wb / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(160\\\)' results in" } */ +_BitInt(159) m = 365375409332725729550921208179070754913983135743wb / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(159\\\)' results in" } */ +_BitInt(129) n = 340282366920938463463374607431768211455wb / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(129\\\)' results in" } */ +_BitInt(128) o = 170141183460469231731687303715884105727wb / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(128\\\)' results in" } */ +_BitInt(161) p = (-1461501637330902918203684832716283019655932542975wb - 1wb) / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(161\\\)' results in" } */ +_BitInt(160) q = (-730750818665451459101842416358141509827966271487wb - 1wb) / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(160\\\)' results in" } */ +_BitInt(159) r = (-365375409332725729550921208179070754913983135743wb - 1wb) / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(159\\\)' results in" } */ +_BitInt(129) s = (-340282366920938463463374607431768211455wb - 1wb) / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(129\\\)' results in" } */ +_BitInt(128) t = (-170141183460469231731687303715884105727wb - 1wb) / 2wb * 3wb; /* { dg-warning "integer overflow in expression of type '_BitInt\\\(128\\\)' results in" } */