
[v2,6/9] aarch64: Use symbols without offset to prevent relocation issues

Message ID DBBPR83MB061338EAA3463FDC8AABEB9FF89E2@DBBPR83MB0613.EURPRD83.prod.outlook.com
State New
Series SMALL code model fixes, optimization fixes, LTO and minimal C++ enablement

Commit Message

Evgeny Karpov Sept. 6, 2024, noon UTC
aarch64.cc has been updated to prevent emitting "symbol + offset"
for SYMBOL_SMALL_ABSOLUTE for the PECOFF target. "symbol + offset"
cannot be used in relocations for aarch64-w64-mingw32 due to
relocation requirements.

Instead, the compiler will adjust the address by the offset with an
"add" instruction.

This approach allows addressing a 4GB range, instead of 1MB as before.
This issue has been fixed in the binutils patch series:

https://sourceware.org/pipermail/binutils/2024-August/136481.html
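
For illustration (using placeholder names "symbol" and "offset"), the
change to the generated code is roughly:

before:
    adrp x0, symbol + offset
    add  x0, x0, :lo12:symbol + offset

after:
    adrp x0, symbol
    add  x0, x0, :lo12:symbol
    add  x0, x0, offset

(see the discussion below for the exact sequences)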

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_load_symref_and_add_offset):
	New.
	(aarch64_expand_mov_immediate): Use
	aarch64_load_symref_and_add_offset.
---
 gcc/config/aarch64/aarch64.cc | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

Comments

Martin Storsjö Sept. 6, 2024, 12:06 p.m. UTC | #1
On Fri, 6 Sep 2024, Evgeny Karpov wrote:

> aarch64.cc has been updated to prevent emitting "symbol + offset"
> for SYMBOL_SMALL_ABSOLUTE for the PECOFF target. "symbol + offset"
> cannot be used in relocations for aarch64-w64-mingw32 due to
> relocation requirements.
>
> Instead, it will adjust the address by an offset with the
> "add" instruction.
>
> This approach allows addressing 4GB, instead of 1MB as it was before.
> This issue has been fixed in the binutils patch series.
>
> https://sourceware.org/pipermail/binutils/2024-August/136481.html

Sorry, but no.

You can't just redefine how relocations in your object file format work, 
just because you feel like it.

Didn't you see my reply about how MSVC themselves also use this 
relocation?

// Martin
Evgeny Karpov Sept. 6, 2024, 2:57 p.m. UTC | #2
Friday, September 6, 2024
Martin Storsjö <martin@martin.st> wrote:

> Sorry, but no.
> 
> You can't just redefine how relocations in your object file format work,
> just because you feel like it.

This patch changes how a symbol with an offset is emitted.

It will change:
adrp x0, symbol + offset

to:
adrp  x0, symbol
add x0, x0, offset

It is not optimal; however, it is not critical, and both relocation approaches
work with it. The concern regarding changing the relocation handling in binutils
is clear and will be discussed on the binutils mailing list.

> Didn't you see my reply about how MSVC themselves also use this
> relocation?

Yes, an extended answer will be prepared.

Regards,
Evgeny
Martin Storsjö Sept. 6, 2024, 4:09 p.m. UTC | #3
On Fri, 6 Sep 2024, Evgeny Karpov wrote:

> Friday, September 6, 2024
> Martin Storsjö <martin@martin.st> wrote:
>
>> Sorry, but no.
>> 
>> You can't just redefine how relocations in your object file format work,
>> just because you feel like it.
>
> This patch changes how a symbol with an offset is emitted.
>
> It will change:
> adrp x0, symbol + offset
>
> to:
> adrp  x0, symbol
> add x0, x0, offset

I presume more precisely, it changes

     adrp x0, symbol + offset
     add x0, x0, :lo12:symbol + offset

into

     adrp x0, symbol
     add x0, x0, :lo12:symbol
     add x0, x0, offset

> It is not optimal; however, it is not critical, and both relocation approaches
> work with it.

Indeed, it's not optimal, and this other form is totally acceptable. And 
using it to work around an issue somewhere is also quite ok.

But don't make claims that "'symbol + offset' cannot be used in 
relocations for aarch64-w64-mingw32 due to relocation requirements." - 
because that is just misleading.

Only in the case where the offset is more than +/- 1 MB would you need to 
change the form of the generated code.

And if the reason for this is the binutils patch that changes how the 
relocation is handled, that reason will go away, because I believe that 
patch is demonstrably wrong and that patch should be dropped (or 
reworked).

// Martin
Evgeny Karpov Sept. 12, 2024, 9:36 p.m. UTC | #4
Thursday, September 12, 2024
Martin Storsjö <martin@martin.st> wrote:

> This looks very reasonable - I presume this will make sure that you only
> use the other code form if the offset actually is larger than 1 MB.
>
> For the case when the offset actually is larger than 1 MB, I guess this
> also ends up generating some other instruction sequence than just a "add
> x0, x0, #imm", as the #imm is limited to <= 4096. From reading the code,
> it looks like it generates something like "mov x16, #imm; add x0, x0,
> x16"? That's probably quite reasonable.

The generated code will stay unchanged for offsets less than 1MB:

adrp x0, symbol + offset
add x0, x0, :lo12:symbol + offset

When the offset is >= 1MB:

adrp x0, symbol + offset % (1 << 20) // it prevents relocation overflow in IMAGE_REL_ARM64_PAGEBASE_REL21
add x0, x0, (offset & ~0xfffff) >> 12, lsl #12 // a workaround to support 4GB offset
add x0, x0, :lo12:symbol + offset % (1 << 20)
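
As an illustrative worked example (hypothetical offset 0x345678, which is
larger than 1MB = 0x100000):

offset % (1 << 20)         = 0x45678
(offset & ~0xfffff) >> 12  = 0x300

adrp x0, symbol + 0x45678
add x0, x0, 0x300, lsl #12   // adds 0x300000
add x0, x0, :lo12:symbol + 0x45678

which yields symbol + 0x300000 + 0x45678 = symbol + 0x345678.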

Regards,
Evgeny
Martin Storsjö Sept. 13, 2024, 6:56 a.m. UTC | #5
On Thu, 12 Sep 2024, Evgeny Karpov wrote:

> Thursday, September 12, 2024
> Martin Storsjö <martin@martin.st> wrote:
>
>> This looks very reasonable - I presume this will make sure that you only
>> use the other code form if the offset actually is larger than 1 MB.
>>
>> For the case when the offset actually is larger than 1 MB, I guess this
>> also ends up generating some other instruction sequence than just a "add
>> x0, x0, #imm", as the #imm is limited to <= 4096. From reading the code,
>> it looks like it generates something like "mov x16, #imm; add x0, x0,
>> x16"? That's probably quite reasonable.
>
> The generated code will stay unchanged for offsets less than 1MB:
>
> adrp x0, symbol + offset
> add x0, x0, :lo12:symbol + offset
>
> When the offset is >= 1MB:
>
> adrp x0, symbol + offset % (1 << 20) // it prevents relocation overflow in IMAGE_REL_ARM64_PAGEBASE_REL21
> add x0, x0, (offset & ~0xfffff) >> 12, lsl #12 // a workaround to support 4GB offset
> add x0, x0, :lo12:symbol + offset % (1 << 20)

Ah, I see. Yeah, that works.

That won't get you up to a full 4 GB offset from your symbol though; I 
think that'll get you up to 16 MB offsets. In the "add x0, x0, #imm, lsl 
#12" case, the immediate is a 12-bit immediate, shifted left by 12, so you 
effectively have a 24-bit range there. But clearly this works a bit further 
than 1 MB at least.
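
(Spelled out: the shifted immediate can add at most 0xFFF << 12 = 0xFFF000,
just under 16 MB, and the adrp/:lo12: addend keeps the remaining part below
1 MB, so the reachable offset is on the order of 16 MB rather than 4 GB.)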

// Martin

Patch

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 03362a975c0..3a8ecdf562b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4887,6 +4887,19 @@  aarch64_split_add_offset (scalar_int_mode mode, rtx dest, rtx src,
 		      temp1, temp2, 0, false);
 }
 
+/* Load the address and apply the offset by using "add" instruction.  */
+
+static void
+aarch64_load_symref_and_add_offset (scalar_int_mode mode, rtx dest, rtx src,
+				    poly_int64 offset)
+{
+  gcc_assert (can_create_pseudo_p ());
+  src = aarch64_force_temporary (mode, dest, src);
+  aarch64_add_offset (mode, dest, src, offset,
+		      NULL_RTX, NULL_RTX, 0, false);
+}
+
+
 /* Add DELTA to the stack pointer, marking the instructions frame-related.
    TEMP1 is available as a temporary if nonnull.  FORCE_ISA_MODE is as
    for aarch64_add_offset.  EMIT_MOVE_IMM is false if TEMP1 already
@@ -6054,10 +6067,8 @@  aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	case SYMBOL_TINY_TLSIE:
 	  if (const_offset != 0)
 	    {
-	      gcc_assert(can_create_pseudo_p ());
-	      base = aarch64_force_temporary (int_mode, dest, base);
-	      aarch64_add_offset (int_mode, dest, base, const_offset,
-				  NULL_RTX, NULL_RTX, 0, false);
+	      aarch64_load_symref_and_add_offset (int_mode, dest, base,
+						  const_offset);
 	      return;
 	    }
 	  /* FALLTHRU */
@@ -6068,6 +6079,13 @@  aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	case SYMBOL_TLSLE24:
 	case SYMBOL_TLSLE32:
 	case SYMBOL_TLSLE48:
+	  if (TARGET_PECOFF && const_offset != 0)
+	    {
+	      aarch64_load_symref_and_add_offset (int_mode, dest, base,
+						  const_offset);
+	      return;
+	    }
+
 	  aarch64_load_symref_appropriately (dest, imm, sty);
 	  return;