Message ID | DBBPR83MB061317A13D70B84D8EA790EEF8642@DBBPR83MB0613.EURPRD83.prod.outlook.com
State      | New
Series     | None
On Thu, 12 Sep 2024, Evgeny Karpov wrote:

> The current binutils implementation does not support offsets up to 4GB in
> the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation; it is limited to 1MB.
> This is related to differences in ELF and COFF relocation records.

Yes, I agree.

But I would not consider this a limitation of the binutils
implementation; it is a limitation of the object file format. It cannot
be worked around by inventing your own custom relocations, but should
instead be worked around on the code generation side, to avoid needing
such large offsets.

This approach is one such workaround, and quite valid. Another is to
generate extra symbols so that anything can be addressed with a smaller
offset.

> To unblock the current patch series, the IMAGE_REL_ARM64_PAGEBASE_REL21
> relocation will remain unchanged, and the workaround below will be
> applied to bypass the 1MB offset limitation.

This looks very reasonable - I presume it makes sure that you only use
the other code form if the offset actually is larger than 1 MB.

For the case when the offset actually is larger than 1 MB, I guess this
also ends up generating some other instruction sequence than just an
"add x0, x0, #imm", as #imm is limited to <= 4096. From reading the
code, it looks like it generates something like "mov x16, #imm; add x0,
x0, x16"? That's probably quite reasonable.

I don't know how emit_move_insn behaves if the immediate is larger -
does it generate a sequence of mov/movk to materialize a larger
constant? The range of immediates you can encode in a single mov
instruction is pretty limited anyway.

// Martin
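(For reference on the last question: GCC's aarch64 backend does split
wider constants into 16-bit movz/movk chunks when no cheaper encoding
applies. The standalone C++ sketch below illustrates only the chunking
idea - it is not GCC source; the real logic lives around
aarch64_internal_mov_immediate and tries bitmask immediates and other
tricks first, and the scratch register x16 is just an example.)

  #include <cstdint>
  #include <cstdio>

  /* Print a movz/movk sequence that materializes IMM, one non-zero
     16-bit chunk at a time.  */
  static void
  emit_mov_movk (uint64_t imm)
  {
    bool first = true;
    for (int shift = 0; shift < 64; shift += 16)
      {
        unsigned chunk = (imm >> shift) & 0xffff;
        /* Skip zero chunks, but still emit one movz for IMM == 0.  */
        if (chunk == 0 && !(first && shift == 48))
          continue;
        printf ("%s\tx16, #0x%x, lsl #%d\n",
                first ? "movz" : "movk", chunk, shift);
        first = false;
      }
  }

For example, emit_mov_movk (0x180000) prints "movz x16, #0x18, lsl #16",
and a full 64-bit constant takes at most four instructions.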
Martin Storsjö <martin@martin.st> writes:
> On Thu, 12 Sep 2024, Evgeny Karpov wrote:
>
>> The current binutils implementation does not support offsets up to 4GB in
>> the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation; it is limited to 1MB.
>> This is related to differences in ELF and COFF relocation records.
>
> Yes, I agree.
>
> But I would not consider this a limitation of the binutils
> implementation; it is a limitation of the object file format. It cannot
> be worked around by inventing your own custom relocations, but should
> instead be worked around on the code generation side, to avoid needing
> such large offsets.
>
> This approach is one such workaround, and quite valid. Another is to
> generate extra symbols so that anything can be addressed with a smaller
> offset.

Maybe this is my ELF bias showing, but: generating extra X=Y+OFF
symbols isn't generally valid for ELF when Y is a global symbol, since
interposition rules, comdat, weak symbols, and various other reasons
could mean that the local definition of Y isn't the one that gets used.
Does COFF cope with that in some other way?  If not, I would have
expected that there would need to be a fallback path that didn't
involve defining extra symbols.

Thanks,
Richard
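(To make the interposition hazard concrete, consider a hypothetical
translation unit like the following - an assumed example, not from the
thread; the alias name table_mid is invented for illustration.)

  /* `table` is a weak definition, so the linker or, on ELF, the
     dynamic loader may substitute another definition.  If the
     toolchain shortened the >1MB addend by emitting an extra symbol
     such as table_mid = table + 0x180000, that alias would bind to
     *this* definition at assembly time and keep pointing into the
     wrong object whenever a different `table` wins.  */
  extern "C" __attribute__ ((weak)) char table[0x200000];

  char *
  slot ()
  {
    return &table[0x180000];
  }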
Evgeny Karpov <Evgeny.Karpov@microsoft.com> writes:
> The current binutils implementation does not support offsets up to 4GB in
> the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation; it is limited to 1MB.
> This is related to differences in ELF and COFF relocation records.
> There are ways to fix this. This work on the relocation change will be
> extracted into a separate binutils patch series and discussion.
>
> To unblock the current patch series, the IMAGE_REL_ARM64_PAGEBASE_REL21
> relocation will remain unchanged, and the workaround below will be
> applied to bypass the 1MB offset limitation.
>
> Regards,
> Evgeny
>
>
> The patch will be replaced by this change.

Seems like a reasonable workaround to me FWIW, but some comments on the
implementation below:

> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 03362a975c0..5f17936df1f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -2896,7 +2896,30 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
>      if (can_create_pseudo_p ())
>        tmp_reg = gen_reg_rtx (mode);
>
> -    emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
> +    do
> +      {
> +        if (TARGET_PECOFF)
> +          {
> +            poly_int64 offset;
> +            HOST_WIDE_INT const_offset;
> +            strip_offset (imm, &offset);
> +
> +            if (offset.is_constant (&const_offset)
> +                && abs(const_offset) >= 1 << 20)

abs_hwi (const_offset) (since const_offset has HOST_WIDE_INT type).

> +              {
> +                rtx const_int = imm;
> +                const_int = XEXP (const_int, 0);
> +                XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20));

CONST_INTs are shared objects, so we can't modify their value in-place.

It might be easier to pass base and const_offset from the caller
(aarch64_expand_mov_immediate).  We are then guaranteed that the offset
is constant and don't need to worry about the SVE case.  The new
SYM+OFF expression can be calculated using plus_constant.

I think it'd be worth asserting that the offset fits in 32 bits, since
if by some bug the offset is larger, we'd generate silent wrong code
(in the sense that the compiler would truncate the offset before the
assembler sees it).

> +
> +                emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx(imm)));
> +                emit_insn (gen_add_hioffset (tmp_reg, GEN_INT(const_offset)));

I think the normal addition patterns can handle this, if we pass the
result of the ~0xfffff calculation.  There should be no need for a
dedicated pattern.

> +                break;
> +              }
> +          }
> +
> +        emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
> +      } while(0);

I think it'd be clearer to duplicate the gen_add_losym and avoid the
do...while(0).

Thanks,
Richard

> +
>      emit_insn (gen_add_losym (dest, tmp_reg, imm));
>      return;
>    }
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 665a333903c..072110f93e7 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -7405,6 +7405,13 @@
>    DONE;
>  })
>
> +(define_insn "add_hioffset"
> +  [(match_operand 0 "register_operand")
> +   (match_operand 1 "const_int_operand")]
> +  ""
> +  "add %0, %0, (%1 & ~0xfffff) >> 12, lsl #12"
> +)
> +
>  (define_insn "add_losym_<mode>"
>    [(set (match_operand:P 0 "register_operand" "=r")
>          (lo_sum:P (match_operand:P 1 "register_operand" "r")
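(Folding those review comments together, the PE-COFF path might end up
looking roughly like this - a sketch only, under the assumptions that
base and const_offset are passed down from aarch64_expand_mov_immediate
as suggested, and that expand_simple_binop stands in for "the normal
addition patterns" so a high part that doesn't fit an add immediate is
materialized into a scratch register automatically. This is not the
committed patch.)

  /* Sketch: BASE is the bare SYMBOL_REF and CONST_OFFSET its constant
     offset, both supplied by the caller.  */
  if (TARGET_PECOFF && abs_hwi (const_offset) >= (HOST_WIDE_INT) 1 << 20)
    {
      /* The COFF addend is at most 32 bits; fail loudly rather than
         truncate silently.  */
      gcc_assert (IN_RANGE (const_offset,
                            -((HOST_WIDE_INT) 1 << 31),
                            ((HOST_WIDE_INT) 1 << 31) - 1));

      /* Keep only the low 20 bits of the offset on the relocated
         ADRP/ADD pair.  */
      HOST_WIDE_INT low = const_offset % ((HOST_WIDE_INT) 1 << 20);
      rtx sym = plus_constant (mode, base, low);

      emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (sym)));

      /* Add the remaining high part through the generic addition
         expander instead of a dedicated pattern.  */
      tmp_reg = expand_simple_binop (mode, PLUS, tmp_reg,
                                     gen_int_mode (const_offset - low, mode),
                                     tmp_reg, 0, OPTAB_LIB_WIDEN);

      emit_insn (gen_add_losym (dest, tmp_reg, sym));
      return;
    }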
On Thu, 19 Sep 2024, Richard Sandiford wrote:

> Martin Storsjö <martin@martin.st> writes:
>> On Thu, 12 Sep 2024, Evgeny Karpov wrote:
>>
>>> The current binutils implementation does not support offsets up to 4GB in
>>> the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation; it is limited to 1MB.
>>> This is related to differences in ELF and COFF relocation records.
>>
>> Yes, I agree.
>>
>> But I would not consider this a limitation of the binutils
>> implementation; it is a limitation of the object file format. It cannot
>> be worked around by inventing your own custom relocations, but should
>> instead be worked around on the code generation side, to avoid needing
>> such large offsets.
>>
>> This approach is one such workaround, and quite valid. Another is to
>> generate extra symbols so that anything can be addressed with a smaller
>> offset.
>
> Maybe this is my ELF bias showing, but: generating extra X=Y+OFF
> symbols isn't generally valid for ELF when Y is a global symbol, since
> interposition rules, comdat, weak symbols, and various other reasons
> could mean that the local definition of Y isn't the one that gets used.
> Does COFF cope with that in some other way?  If not, I would have
> expected that there would need to be a fallback path that didn't
> involve defining extra symbols.

That's indeed a fair point. COFF doesn't cope with that in other ways -
so defining such extra symbols to cope with the offsets, for global
symbols that can be interposed or swapped out at link time, would
indeed be wrong.

In practice, I think it's rare to reference such an interposable symbol
with an offset at all - even more so to reference it with an offset
over 1 MB. The practical cases where one mostly runs into the
limitation are when you have large sections and use temporary labels to
reference positions within them. As the temporary labels don't persist
into the object file, references against them end up as references
against the section base, plus an offset. And those symbols (the
section base) aren't global.

The workaround I did for this within LLVM,
https://github.com/llvm/llvm-project/commit/06d0d449d8555ae5f1ac33e8d4bb4ae40eb080d3,
deals specifically only with temporary symbols.

// Martin
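(For completeness, the kind of source that hits the over-1MB addend
case looks something like the following - an assumed example, not from
the thread. Here the offset comes straight from the user code; the
temporary-label case Martin describes reaches the same situation once
the assembler rewrites a .L label as section base plus offset.)

  /* Assumed example: a file-local object larger than 1 MB.  The
     compiler emits adrp/add against big+0x180000; since COFF stores
     the addend in the instruction itself, the 1 MB limit of
     IMAGE_REL_ARM64_PAGEBASE_REL21 is exceeded.  */
  static char big[0x200000];

  char *
  get_slot ()
  {
    return &big[0x180000];
  }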
Richard Sandiford <richard.sandiford@arm.com> writes:
> Evgeny Karpov <Evgeny.Karpov@microsoft.com> writes:
>> +              {
>> +                rtx const_int = imm;
>> +                const_int = XEXP (const_int, 0);
>> +                XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20));
>
> CONST_INTs are shared objects, so we can't modify their value in-place.

Gah, sorry, I misread.  The patch was only modifying the PLUS, which
should be valid.  My comment below still stands though.

> It might be easier to pass base and const_offset from the caller
> (aarch64_expand_mov_immediate).  We are then guaranteed that the
> offset is constant and don't need to worry about the SVE case.
> The new SYM+OFF expression can be calculated using plus_constant.
>
> I think it'd be worth asserting that the offset fits in 32 bits,
> since if by some bug the offset is larger, we'd generate silent
> wrong code (in the sense that the compiler would truncate the offset
> before the assembler sees it).
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 03362a975c0..5f17936df1f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2896,7 +2896,30 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
     if (can_create_pseudo_p ())
       tmp_reg = gen_reg_rtx (mode);

-    emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
+    do
+      {
+        if (TARGET_PECOFF)
+          {
+            poly_int64 offset;
+            HOST_WIDE_INT const_offset;
+            strip_offset (imm, &offset);
+
+            if (offset.is_constant (&const_offset)
+                && abs(const_offset) >= 1 << 20)
+              {
+                rtx const_int = imm;
+                const_int = XEXP (const_int, 0);
+                XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20));
+
+                emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx(imm)));
+                emit_insn (gen_add_hioffset (tmp_reg, GEN_INT(const_offset)));
+                break;
+              }
+          }
+
+        emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
+      } while(0);
+
     emit_insn (gen_add_losym (dest, tmp_reg, imm));
     return;
   }
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 665a333903c..072110f93e7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7405,6 +7405,13 @@
   DONE;
 })

+(define_insn "add_hioffset"
+  [(match_operand 0 "register_operand")
+   (match_operand 1 "const_int_operand")]
+  ""
+  "add %0, %0, (%1 & ~0xfffff) >> 12, lsl #12"
+)
+
 (define_insn "add_losym_<mode>"
   [(set (match_operand:P 0 "register_operand" "=r")
         (lo_sum:P (match_operand:P 1 "register_operand" "r")