| Message ID | 289f93f5-fe3a-4e04-bd49-390ba7a84a94@arm.com |
|---|---|
| State | New |
| Series | [RFC] aarch64: Add support for __BitInt |
On Wed, Jan 10, 2024 at 11:06 AM Andre Vieira (lists)
<andre.simoesdiasvieira@arm.com> wrote:
>
> Hi,
>
> This patch is still work in progress, but posting to show failure with
> bitint-7 test where handle_stmt called from lower_mergeable_stmt ICEs
> because the idx (3) is out of range for the __BitInt(135) with a
> limb_prec of 64.

Looks like the same issue can happen on x86_64, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113315 .

Thanks,
Andrew

> I hacked gcc locally to work around this issue and still have one
> outstanding failure, so will look to resolve that failure before posting
> a new version.
>
> Kind Regards,
> Andre
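For reference, the failure mode described above can be pictured with a
reduced example along these lines (a hypothetical sketch, not the actual
bitint-7.c testcase): on aarch64 a __BitInt(135) occupies 256 bits of
storage, but with a limb_prec of 64 the lowering only has
CEIL (135, 64) == 3 valid limbs, so an attempt to access limb index 3
trips the assertion in limb_access_type.

  /* Hypothetical reduced reproducer; the real test likely differs.  */
  _BitInt(135)
  add (_BitInt(135) a, _BitInt(135) b)
  {
    return a + b;  /* lowered limb-by-limb by gimple-lower-bitint.cc */
  }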
On Wed, Jan 10, 2024 at 07:05:39PM +0000, Andre Vieira (lists) wrote:
> This patch is still work in progress, but posting to show failure with
> bitint-7 test where handle_stmt called from lower_mergeable_stmt ICEs
> because the idx (3) is out of range for the __BitInt(135) with a limb_prec
> of 64.

I can reproduce it, will debug it momentarily.

> I hacked gcc locally to work around this issue and still have one
> outstanding failure, so will look to resolve that failure before posting a
> new version.

I see 2 issues in your patch:

1) as written earlier, the bitint lowering pass doesn't yet support big
endian, so I think at least temporarily we shouldn't pretend it works until
it is fixed;
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626847.html
mentions the big-endian issues

2) for n in [65, 128] I believe the aarch64 ABI says that _BitInt should be
treated the same as __int128; so for those cases we want
limb_mode = abi_limb_mode = TImode; those cases are just lowered to TImode
arithmetic, there is no attempt to treat those as arrays of smaller limbs.
It is just the [129, 65535] cases where we want limb_mode DImode (for
efficiency and implementability) and abi_limb_mode TImode (for the ABI
reasons, alignment, sizing, ...), i.e. the cases where the ABI says it is
treated as an array of 128-bit limbs

--- gcc/config/aarch64/aarch64.cc.jj	2024-01-11 09:37:28.000000000 +0100
+++ gcc/config/aarch64/aarch64.cc	2024-01-11 09:41:03.440202903 +0100
@@ -28279,12 +28279,20 @@ aarch64_excess_precision (enum excess_pr
 bool
 aarch64_bitint_type_info (int n, struct bitint_info *info)
 {
+  /* For now, the middle-end bitint lowering doesn't handle big endian.  */
+  if (TARGET_BIG_END)
+    return false;
+
   if (n <= 8)
     info->limb_mode = QImode;
   else if (n <= 16)
     info->limb_mode = HImode;
   else if (n <= 32)
     info->limb_mode = SImode;
+  else if (n <= 64)
+    info->limb_mode = DImode;
+  else if (n <= 128)
+    info->limb_mode = TImode;
   else
     info->limb_mode = DImode;

	Jakub
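To make the two ABI buckets Jakub describes concrete, here is a small C23
sketch of the expected sizes and alignments on little-endian aarch64 (the
exact values are an assumption derived from this thread, not a normative
ABI quote):

  /* 65..128 bits: one TImode limb, same treatment as __int128.  */
  static_assert (sizeof (_BitInt(100)) == 16, "same size as __int128");
  static_assert (alignof (_BitInt(100)) == 16, "same alignment as __int128");

  /* 129+ bits: DImode limbs internally, but sized and aligned as an
     array of 128-bit ABI limbs, hence 256 bits of storage here.  */
  static_assert (sizeof (_BitInt(135)) == 32, "two 128-bit ABI limbs");
  static_assert (alignof (_BitInt(135)) == 16, "TImode alignment");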
On Thu, Jan 11, 2024 at 09:53:33AM +0100, Jakub Jelinek wrote:
> On Wed, Jan 10, 2024 at 07:05:39PM +0000, Andre Vieira (lists) wrote:
> > This patch is still work in progress, but posting to show failure with
> > bitint-7 test where handle_stmt called from lower_mergeable_stmt ICEs
> > because the idx (3) is out of range for the __BitInt(135) with a limb_prec
> > of 64.
>
> I can reproduce it, will debug it momentarily.

So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
BITINT_TYPEs to determine if it can be handled cheaply.
On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine:
if TYPE_SIZE is equal, it means it has the same number of limbs.
But on aarch64 TYPE_SIZE of, say, _BitInt(135) and _BitInt(193) is the same,
both are 256-bit storage, but because DImode is used as limb_mode, the
former actually needs just 3 limbs, while the latter needs 4 limbs.
And limb_access_type was asserting that we don't try to access the 4th limb
on types which actually have a precision which needs just 3 limbs.

The following patch (so far tested on x86_64 with all the bitint tests plus
on the bitint-7.c testcase in a cross to aarch64) should fix that.

Note, for the info.extended targets (currently none, but I think arm 32-bit
in the ABI is meant like that), we'll need to do something different,
because the upper bits aren't just padding and should be zero/sign extended,
so if we, say, have limb_mode SImode, abi_limb_mode DImode, we'll need to
treat _BitInt(135) not as 5 SImode limbs, but 6.  For !info.extended targets
I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.

2024-01-11  Jakub Jelinek  <jakub@redhat.com>

	* gimple-lower-bitint.cc (mergeable_op): Instead of comparing
	TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
	CEIL (TYPE_PRECISION (t), limb_prec).
	(bitint_large_huge::handle_cast): Likewise.

--- gcc/gimple-lower-bitint.cc.jj	2024-01-08 13:58:21.448176859 +0100
+++ gcc/gimple-lower-bitint.cc	2024-01-11 11:46:49.147779946 +0100
@@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
 	  && TREE_CODE (rhs_type) == BITINT_TYPE
 	  && bitint_precision_kind (lhs_type) >= bitint_prec_large
 	  && bitint_precision_kind (rhs_type) >= bitint_prec_large
-	  && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
+	  && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
+	      == CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
 	{
 	  if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
 	    return true;
@@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
 		 if m_upwards_2limb * limb_prec is equal to
 		 lhs precision that is not the case.  */
 	      || (!m_var_msb
-		  && tree_int_cst_equal (TYPE_SIZE (rhs_type),
-					 TYPE_SIZE (lhs_type))
+		  && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
+		      == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
 		  && (!m_upwards_2limb
 		      || (m_upwards_2limb * limb_prec
 			  < TYPE_PRECISION (lhs_type)))))

	Jakub
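A worked example of why the new check differs from the old one (a minimal
standalone sketch; the CEIL macro here mirrors the round-up division GCC
uses, and limb_prec is 64 on little-endian aarch64 per the thread):

  #include <stdio.h>

  /* Round-up integer division, as in GCC's CEIL macro.  */
  #define CEIL(x, y) (((x) + (y) - 1) / (y))

  int
  main (void)
  {
    /* TYPE_SIZE compares storage: _BitInt(135) and _BitInt(193) both
       take 256 bits, so the old check wrongly treated them as having
       the same number of limbs.  The precision-based count differs:  */
    printf ("_BitInt(135): %d limbs\n", CEIL (135, 64)); /* prints 3 */
    printf ("_BitInt(193): %d limbs\n", CEIL (193, 64)); /* prints 4 */
    return 0;
  }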
On Thu, Jan 11, 2024 at 12:12:59PM +0100, Jakub Jelinek wrote:
> So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
> BITINT_TYPEs to determine if it can be handled cheaply.
> On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine:
> if TYPE_SIZE is equal, it means it has the same number of limbs.
> But on aarch64 TYPE_SIZE of, say, _BitInt(135) and _BitInt(193) is the same,
> both are 256-bit storage, but because DImode is used as limb_mode, the
> former actually needs just 3 limbs, while the latter needs 4 limbs.
> And limb_access_type was asserting that we don't try to access the 4th limb
> on types which actually have a precision which needs just 3 limbs.
>
> The following patch (so far tested on x86_64 with all the bitint tests plus
> on the bitint-7.c testcase in a cross to aarch64) should fix that.
>
> Note, for the info.extended targets (currently none, but I think arm 32-bit
> in the ABI is meant like that), we'll need to do something different,
> because the upper bits aren't just padding and should be zero/sign extended,
> so if we, say, have limb_mode SImode, abi_limb_mode DImode, we'll need to
> treat _BitInt(135) not as 5 SImode limbs, but 6.  For !info.extended targets
> I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.
>
> 2024-01-11  Jakub Jelinek  <jakub@redhat.com>
>
> 	* gimple-lower-bitint.cc (mergeable_op): Instead of comparing
> 	TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
> 	CEIL (TYPE_PRECISION (t), limb_prec).
> 	(bitint_large_huge::handle_cast): Likewise.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

> --- gcc/gimple-lower-bitint.cc.jj	2024-01-08 13:58:21.448176859 +0100
> +++ gcc/gimple-lower-bitint.cc	2024-01-11 11:46:49.147779946 +0100
> @@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
>  	  && TREE_CODE (rhs_type) == BITINT_TYPE
>  	  && bitint_precision_kind (lhs_type) >= bitint_prec_large
>  	  && bitint_precision_kind (rhs_type) >= bitint_prec_large
> -	  && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
> +	  && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> +	      == CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
>  	{
>  	  if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
>  	    return true;
> @@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
>  		 if m_upwards_2limb * limb_prec is equal to
>  		 lhs precision that is not the case.  */
>  	      || (!m_var_msb
> -		  && tree_int_cst_equal (TYPE_SIZE (rhs_type),
> -					 TYPE_SIZE (lhs_type))
> +		  && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> +		      == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
>  		  && (!m_upwards_2limb
>  		      || (m_upwards_2limb * limb_prec
>  			  < TYPE_PRECISION (lhs_type)))))
>
> 	Jakub

	Jakub
On Fri, 12 Jan 2024, Jakub Jelinek wrote:

> On Thu, Jan 11, 2024 at 12:12:59PM +0100, Jakub Jelinek wrote:
> > So, the problem was that in 2 spots I was comparing TYPE_SIZE of large/huge
> > BITINT_TYPEs to determine if it can be handled cheaply.
> > On x86_64 with limb_mode == abi_limb_mode (both DImode) that works fine:
> > if TYPE_SIZE is equal, it means it has the same number of limbs.
> > But on aarch64 TYPE_SIZE of, say, _BitInt(135) and _BitInt(193) is the same,
> > both are 256-bit storage, but because DImode is used as limb_mode, the
> > former actually needs just 3 limbs, while the latter needs 4 limbs.
> > And limb_access_type was asserting that we don't try to access the 4th limb
> > on types which actually have a precision which needs just 3 limbs.
> >
> > The following patch (so far tested on x86_64 with all the bitint tests plus
> > on the bitint-7.c testcase in a cross to aarch64) should fix that.
> >
> > Note, for the info.extended targets (currently none, but I think arm 32-bit
> > in the ABI is meant like that), we'll need to do something different,
> > because the upper bits aren't just padding and should be zero/sign extended,
> > so if we, say, have limb_mode SImode, abi_limb_mode DImode, we'll need to
> > treat _BitInt(135) not as 5 SImode limbs, but 6.  For !info.extended targets
> > I think treating _BitInt(135) as 3 DImode limbs rather than 4 is fine.
> >
> > 2024-01-11  Jakub Jelinek  <jakub@redhat.com>
> >
> > 	* gimple-lower-bitint.cc (mergeable_op): Instead of comparing
> > 	TYPE_SIZE (t) of large/huge BITINT_TYPEs, compare
> > 	CEIL (TYPE_PRECISION (t), limb_prec).
> > 	(bitint_large_huge::handle_cast): Likewise.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK

> > --- gcc/gimple-lower-bitint.cc.jj	2024-01-08 13:58:21.448176859 +0100
> > +++ gcc/gimple-lower-bitint.cc	2024-01-11 11:46:49.147779946 +0100
> > @@ -231,7 +231,8 @@ mergeable_op (gimple *stmt)
> >  	  && TREE_CODE (rhs_type) == BITINT_TYPE
> >  	  && bitint_precision_kind (lhs_type) >= bitint_prec_large
> >  	  && bitint_precision_kind (rhs_type) >= bitint_prec_large
> > -	  && tree_int_cst_equal (TYPE_SIZE (lhs_type), TYPE_SIZE (rhs_type)))
> > +	  && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> > +	      == CEIL (TYPE_PRECISION (rhs_type), limb_prec)))
> >  	{
> >  	  if (TYPE_PRECISION (rhs_type) >= TYPE_PRECISION (lhs_type))
> >  	    return true;
> > @@ -1263,8 +1264,8 @@ bitint_large_huge::handle_cast (tree lhs
> >  		 if m_upwards_2limb * limb_prec is equal to
> >  		 lhs precision that is not the case.  */
> >  	      || (!m_var_msb
> > -		  && tree_int_cst_equal (TYPE_SIZE (rhs_type),
> > -					 TYPE_SIZE (lhs_type))
> > +		  && (CEIL (TYPE_PRECISION (lhs_type), limb_prec)
> > +		      == CEIL (TYPE_PRECISION (rhs_type), limb_prec))
> >  		  && (!m_upwards_2limb
> >  		      || (m_upwards_2limb * limb_prec
> >  			  < TYPE_PRECISION (lhs_type)))))
> >
> > 	Jakub
>
> 	Jakub
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a5a6b52730d6c5013346d128e89915883f1707ae..15fb0ece5256f25c2ca8bb5cb82fc61488d0393e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -6534,7 +6534,7 @@ aarch64_return_in_memory_1 (const_tree type)
   machine_mode ag_mode;
   int count;
 
-  if (!AGGREGATE_TYPE_P (type)
+  if (!(AGGREGATE_TYPE_P (type) || TREE_CODE (type) == BITINT_TYPE)
       && TREE_CODE (type) != COMPLEX_TYPE
       && TREE_CODE (type) != VECTOR_TYPE)
     /* Simple scalar types always returned in registers.  */
@@ -6618,6 +6618,10 @@ aarch64_function_arg_alignment (machine_mode mode, const_tree type,
 
   gcc_assert (TYPE_MODE (type) == mode);
 
+  if (TREE_CODE (type) == BITINT_TYPE
+      && int_size_in_bytes (type) > 16)
+    return GET_MODE_ALIGNMENT (TImode);
+
   if (!AGGREGATE_TYPE_P (type))
     {
       /* The ABI alignment is the natural alignment of the type, without
@@ -21773,6 +21777,11 @@ aarch64_composite_type_p (const_tree type,
   if (type && (AGGREGATE_TYPE_P (type) || TREE_CODE (type) == COMPLEX_TYPE))
     return true;
 
+  if (type
+      && TREE_CODE (type) == BITINT_TYPE
+      && int_size_in_bytes (type) > 16)
+    return true;
+
   if (mode == BLKmode
       || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
       || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
@@ -28265,6 +28274,29 @@ aarch64_excess_precision (enum excess_precision_type type)
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
 
+/* Implement TARGET_C_BITINT_TYPE_INFO.
+   Return true if _BitInt(N) is supported and fill its details into *INFO.  */
+bool
+aarch64_bitint_type_info (int n, struct bitint_info *info)
+{
+  if (n <= 8)
+    info->limb_mode = QImode;
+  else if (n <= 16)
+    info->limb_mode = HImode;
+  else if (n <= 32)
+    info->limb_mode = SImode;
+  else
+    info->limb_mode = DImode;
+
+  if (n > 128)
+    info->abi_limb_mode = TImode;
+  else
+    info->abi_limb_mode = info->limb_mode;
+  info->big_endian = TARGET_BIG_END;
+  info->extended = false;
+  return true;
+}
+
 /* Implement TARGET_SCHED_CAN_SPECULATE_INSN.  Return true if INSN can be
    scheduled for speculative execution.  Reject the long-running division
    and square-root instructions.  */
@@ -30374,6 +30406,9 @@ aarch64_run_selftests (void)
 #undef TARGET_C_EXCESS_PRECISION
 #define TARGET_C_EXCESS_PRECISION aarch64_excess_precision
 
+#undef TARGET_C_BITINT_TYPE_INFO
+#define TARGET_C_BITINT_TYPE_INFO aarch64_bitint_type_info
+
 #undef TARGET_EXPAND_BUILTIN
 #define TARGET_EXPAND_BUILTIN aarch64_expand_builtin
 
diff --git a/libgcc/config/aarch64/t-softfp b/libgcc/config/aarch64/t-softfp
index 2e32366f891361e2056c680b2e36edb1871c7670..4302ad52eb881825d0fb65b9ebd21031781781f5 100644
--- a/libgcc/config/aarch64/t-softfp
+++ b/libgcc/config/aarch64/t-softfp
@@ -4,7 +4,8 @@
 softfp_extensions := sftf dftf hftf bfsf
 softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
 softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
-		 floatdibf floatundibf floattibf floatuntibf
+		 floatdibf floatundibf floattibf floatuntibf \
+		 fixtfbitint floatbitinttf
 
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
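The two routines added to softfp_extras correspond to source-level
conversions between _BitInt and long double (TFmode, IEEE binary128 on
aarch64). A sketch of code that would pull them in; the __fixtfbitint and
__floatbitinttf symbol names follow libgcc's usual fix*/float* naming
convention, so treat the exact calls emitted as an assumption:

  /* long double -> _BitInt: backed by __fixtfbitint (assumed name).  */
  _BitInt(135)
  from_ld (long double d)
  {
    return (_BitInt(135)) d;
  }

  /* _BitInt -> long double: backed by __floatbitinttf (assumed name).  */
  long double
  to_ld (_BitInt(135) x)
  {
    return (long double) x;
  }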