Message ID: DBBPR83MB061338EAA3463FDC8AABEB9FF89E2@DBBPR83MB0613.EURPRD83.prod.outlook.com
State: New
Series: SMALL code model fixes, optimization fixes, LTO and minimal C++ enablement
On Fri, 6 Sep 2024, Evgeny Karpov wrote:

> aarch64.cc has been updated to prevent emitting "symbol + offset"
> for SYMBOL_SMALL_ABSOLUTE for the PECOFF target. "symbol + offset"
> cannot be used in relocations for aarch64-w64-mingw32 due to
> relocation requirements.
>
> Instead, it will adjust the address by an offset with the
> "add" instruction.
>
> This approach allows addressing 4GB, instead of 1MB as it was before.
> This issue has been fixed in the binutils patch series.
>
> https://sourceware.org/pipermail/binutils/2024-August/136481.html

Sorry, but no. You can't just redefine how relocations in your object
file format work, just because you feel like it.

Didn't you see my reply about how MSVC themselves also use this
relocation?

// Martin
Friday, September 6, 2024
Martin Storsjö <martin@martin.st> wrote:

> Sorry, but no.
>
> You can't just redefine how relocations in your object file format
> work, just because you feel like it.

This patch changes how a symbol with an offset is emitted.

It will change:

adrp x0, symbol + offset

to:

adrp x0, symbol
add  x0, x0, offset

It is not optimal; however, it is not critical, and both relocation
approaches work with it.

The concern regarding changing the relocation in binutils is clear and
will be discussed on the binutils mailing list.

> Didn't you see my reply about how MSVC themselves also use this
> relocation?

Yes, an extended answer will be prepared.

Regards,
Evgeny
On Fri, 6 Sep 2024, Evgeny Karpov wrote:

> Friday, September 6, 2024
> Martin Storsjö <martin@martin.st> wrote:
>
>> Sorry, but no.
>>
>> You can't just redefine how relocations in your object file format
>> work, just because you feel like it.
>
> This patch changes how a symbol with an offset is emitted.
>
> It will change:
>
> adrp x0, symbol + offset
>
> to:
>
> adrp x0, symbol
> add  x0, x0, offset

I presume, more precisely, it changes

adrp x0, symbol + offset
add  x0, x0, :lo12:symbol + offset

into

adrp x0, symbol
add  x0, x0, :lo12:symbol
add  x0, x0, offset

> It is not optimal; however, it is not critical, and both relocation
> approaches work with it.

Indeed, it's not optimal, and this other form is totally acceptable.
And using it to work around an issue somewhere is also quite ok.

But don't make claims that "'symbol + offset' cannot be used in
relocations for aarch64-w64-mingw32 due to relocation requirements",
because that is just misleading. Only in the case where the offset is
more than +/- 1 MB would you need to change the form of the generated
code.

And if the reason for this is the binutils patch that changes how the
relocation is handled, that reason will go away, because I believe that
patch is demonstrably wrong and should be dropped (or reworked).

// Martin
Thursday, September 12, 2024
Martin Storsjö <martin@martin.st> wrote:

> This looks very reasonable - I presume this will make sure that you
> only use the other code form if the offset actually is larger than
> 1 MB.
>
> For the case when the offset actually is larger than 1 MB, I guess
> this also ends up generating some other instruction sequence than
> just an "add x0, x0, #imm", as the #imm is limited to <= 4096. From
> reading the code, it looks like it generates something like
> "mov x16, #imm; add x0, x0, x16"? That's probably quite reasonable.

The generated code will stay unchanged for offsets less than 1 MB:

adrp x0, symbol + offset
add  x0, x0, :lo12:symbol + offset

When the offset is >= 1 MB:

adrp x0, symbol + offset % (1 << 20) // prevents relocation overflow in IMAGE_REL_ARM64_PAGEBASE_REL21
add  x0, x0, (offset & ~0xfffff) >> 12, lsl #12 // a workaround to support 4GB offset
add  x0, x0, :lo12:symbol + offset % (1 << 20)

Regards,
Evgeny
On Thu, 12 Sep 2024, Evgeny Karpov wrote:

> Thursday, September 12, 2024
> Martin Storsjö <martin@martin.st> wrote:
>
>> This looks very reasonable - I presume this will make sure that you
>> only use the other code form if the offset actually is larger than
>> 1 MB.
>>
>> For the case when the offset actually is larger than 1 MB, I guess
>> this also ends up generating some other instruction sequence than
>> just an "add x0, x0, #imm", as the #imm is limited to <= 4096. From
>> reading the code, it looks like it generates something like
>> "mov x16, #imm; add x0, x0, x16"? That's probably quite reasonable.
>
> The generated code will stay unchanged for offsets less than 1 MB:
>
> adrp x0, symbol + offset
> add  x0, x0, :lo12:symbol + offset
>
> When the offset is >= 1 MB:
>
> adrp x0, symbol + offset % (1 << 20) // prevents relocation overflow in IMAGE_REL_ARM64_PAGEBASE_REL21
> add  x0, x0, (offset & ~0xfffff) >> 12, lsl #12 // a workaround to support 4GB offset
> add  x0, x0, :lo12:symbol + offset % (1 << 20)

Ah, I see. Yeah, that works.

That won't get you up to a full 4 GB offset from your symbol, though; I
think that'll get you up to 16 MB offsets. In the "add x0, x0, #imm,
lsl #12" case, the immediate is a 12-bit immediate, shifted left by 12,
so you effectively have a 24-bit range there. But clearly this works a
bit further than 1 MB at least.

// Martin
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 03362a975c0..3a8ecdf562b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4887,6 +4887,19 @@ aarch64_split_add_offset (scalar_int_mode mode, rtx dest, rtx src,
 			  temp1, temp2, 0, false);
 }
 
+/* Load the address and apply the offset by using "add" instruction. */
+
+static void
+aarch64_load_symref_and_add_offset (scalar_int_mode mode, rtx dest, rtx src,
+				    poly_int64 offset)
+{
+  gcc_assert (can_create_pseudo_p ());
+  src = aarch64_force_temporary (mode, dest, src);
+  aarch64_add_offset (mode, dest, src, offset,
+		      NULL_RTX, NULL_RTX, 0, false);
+}
+
+
 /* Add DELTA to the stack pointer, marking the instructions frame-related.
    TEMP1 is available as a temporary if nonnull.  FORCE_ISA_MODE is as
    for aarch64_add_offset.  EMIT_MOVE_IMM is false if TEMP1 already
@@ -6054,10 +6067,8 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
     case SYMBOL_TINY_TLSIE:
       if (const_offset != 0)
 	{
-	  gcc_assert(can_create_pseudo_p ());
-	  base = aarch64_force_temporary (int_mode, dest, base);
-	  aarch64_add_offset (int_mode, dest, base, const_offset,
-			      NULL_RTX, NULL_RTX, 0, false);
+	  aarch64_load_symref_and_add_offset (int_mode, dest, base,
+					      const_offset);
 	  return;
 	}
       /* FALLTHRU */
@@ -6068,6 +6079,13 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
     case SYMBOL_TLSLE24:
     case SYMBOL_TLSLE32:
     case SYMBOL_TLSLE48:
+      if (TARGET_PECOFF && const_offset != 0)
+	{
+	  aarch64_load_symref_and_add_offset (int_mode, dest, base,
+					      const_offset);
+	  return;
+	}
+
       aarch64_load_symref_appropriately (dest, imm, sty);
       return;