
[v2,6/9] aarch64: Use symbols without offset to prevent relocation issues

Message ID DBBPR83MB061317A13D70B84D8EA790EEF8642@DBBPR83MB0613.EURPRD83.prod.outlook.com
State New
Series: None

Commit Message

Evgeny Karpov Sept. 12, 2024, 3:36 p.m. UTC
The current binutils implementation does not support offsets up to 4 GB in
the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1 MB.
This is due to differences between the ELF and COFF relocation records.
There are ways to fix this; the relocation change will be extracted into
a separate binutils patch series for discussion.

To unblock the current patch series, the IMAGE_REL_ARM64_PAGEBASE_REL21
relocation will remain unchanged, and the workaround below will be applied to
bypass the 1MB offset limitation.

Regards,
Evgeny


The patch will be replaced by this change.
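
For illustration, the split the workaround performs can be sketched as follows. This is an editorial sketch, not code from the patch: the helper name is invented, but the arithmetic mirrors the patch's `const_offset % (1 << 20)` low part and `(%1 & ~0xfffff)` high part; for non-negative offsets the two parts add back up to the original offset.

```cpp
#include <cassert>
#include <cstdint>

// Editorial sketch of the offset split used by the workaround.
// The low 20 bits stay in the IMAGE_REL_ARM64_PAGEBASE_REL21 /
// lo12 relocations (within the 1 MB limit); the high part is
// added with a separate shifted-immediate add instruction.
struct split_offset
{
  int64_t hi;  // offset & ~0xfffff, added via "add ..., lsl #12"
  int64_t lo;  // offset % (1 << 20), kept in the relocation
};

static split_offset
split_pecoff_offset (int64_t offset)
{
  split_offset s;
  s.lo = offset % (1 << 20);
  s.hi = offset & ~(int64_t) 0xfffff;
  return s;
}
```

For a positive offset such as 0x123456 this yields hi = 0x100000 and lo = 0x23456, so the adrp/add pair only ever sees an addend below 1 MB.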

Comments

Martin Storsjö Sept. 12, 2024, 5:13 p.m. UTC | #1
On Thu, 12 Sep 2024, Evgeny Karpov wrote:

> The current binutils implementation does not support offsets up to 4 GB in
> the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1 MB.
> This is due to differences between the ELF and COFF relocation records.

Yes, I agree.

But I would not consider this a limitation of the binutils implementation;
this is a limitation of the object file format. It can't be worked around
by inventing your own custom relocations, but should instead be worked around
on the code generation side, to avoid needing such large offsets.

This approach is one such workaround, and quite valid. Another one is to
generate extra symbols to allow addressing anything with a smaller offset.

> To unblock the current patch series, the IMAGE_REL_ARM64_PAGEBASE_REL21
> relocation will remain unchanged, and the workaround below will be applied to
> bypass the 1MB offset limitation.

This looks very reasonable - I presume this will make sure that you only 
use the other code form if the offset actually is larger than 1 MB.

For the case when the offset actually is larger than 1 MB, I guess this 
also ends up generating some other instruction sequence than just a "add 
x0, x0, #imm", as the #imm is limited to <= 4096. From reading the code, 
it looks like it generates something like "mov x16, #imm; add x0, x0, 
x16"? That's probably quite reasonable.

I don't know how emit_move_insn behaves if the immediates are larger - 
does it generate a sequence of mov/movk to materialize a larger constant? 
Because the range of immediates you can encode in one single mov 
instruction is pretty limited anyway.
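
For reference, the mov/movk sequence materializes a 64-bit constant 16 bits at a time (movz for the first chunk, movk for the rest). A minimal editorial sketch of that chunking, with an invented helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Editorial sketch: AArch64 materializes a wide constant with one
// movz/movk instruction per 16-bit chunk, at shifts 0, 16, 32, 48.
// Returns the four chunks in ascending shift order.
static std::vector<uint16_t>
movz_movk_chunks (uint64_t value)
{
  std::vector<uint16_t> chunks;
  for (int shift = 0; shift < 64; shift += 16)
    chunks.push_back ((value >> shift) & 0xffff);
  return chunks;
}
```

In practice the compiler skips all-zero chunks, so small constants still take a single instruction; the sketch keeps all four chunks for simplicity.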

// Martin
Richard Sandiford Sept. 19, 2024, 8:07 a.m. UTC | #2
Martin Storsjö <martin@martin.st> writes:
> On Thu, 12 Sep 2024, Evgeny Karpov wrote:
>
>> The current binutils implementation does not support offsets up to 4 GB in
>> the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1 MB.
>> This is due to differences between the ELF and COFF relocation records.
>
> Yes, I agree.
>
> But I would not consider this a limitation of the binutils implementation;
> this is a limitation of the object file format. It can't be worked around
> by inventing your own custom relocations, but should instead be worked around
> on the code generation side, to avoid needing such large offsets.
>
> This approach is one such workaround, and quite valid. Another one is to
> generate extra symbols to allow addressing anything with a smaller offset.

Maybe this is my ELF bias showing, but: generating extra X=Y+OFF
symbols isn't generally valid for ELF when Y is a global symbol, since
interposition rules, comdat, weak symbols, and various other reasons,
could mean that the local definition of Y isn't the one that gets used.
Does COFF cope with that in some other way?  If not, I would have
expected that there would need to be a fallback path that didn't
involve defining extra symbols.

Thanks,
Richard
Richard Sandiford Sept. 19, 2024, 8:32 a.m. UTC | #3
Evgeny Karpov <Evgeny.Karpov@microsoft.com> writes:
> The current binutils implementation does not support offsets up to 4 GB in
> the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1 MB.
> This is due to differences between the ELF and COFF relocation records.
> There are ways to fix this; the relocation change will be extracted into
> a separate binutils patch series for discussion.
>
> To unblock the current patch series, the IMAGE_REL_ARM64_PAGEBASE_REL21
> relocation will remain unchanged, and the workaround below will be applied to
> bypass the 1MB offset limitation.
>
> Regards,
> Evgeny
>
>
> The patch will be replaced by this change.

Seems like a reasonable workaround to me FWIW, but some comments on the
implementation below:

>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 03362a975c0..5f17936df1f 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -2896,7 +2896,30 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
>         if (can_create_pseudo_p ())
>           tmp_reg = gen_reg_rtx (mode);
>
> -       emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
> +       do
> +         {
> +           if (TARGET_PECOFF)
> +             {
> +               poly_int64 offset;
> +               HOST_WIDE_INT const_offset;
> +               strip_offset (imm, &offset);
> +
> +               if (offset.is_constant (&const_offset)
> +                   && abs(const_offset) >= 1 << 20)

abs_hwi (const_offset) (since const_offset has HOST_WIDE_INT type).

> +                 {
> +                   rtx const_int = imm;
> +                   const_int = XEXP (const_int, 0);
> +                   XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20));

CONST_INTs are shared objects, so we can't modify their value in-place.

It might be easier to pass base and const_offset from the caller
(aarch64_expand_mov_immediate).  We are then guaranteed that the
offset is constant and don't need to worry about the SVE case.
The new SYM+OFF expression can be calculated using plus_constant.

I think it'd be worth asserting that the offset fits in 32 bits,
since if by some bug the offset is larger, we'd generate silent
wrong code (in the sense that the compiler would truncate the offset
before the assembler sees it).
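
The hazard described above can be made concrete with a small editorial check (not GCC code): once a 64-bit offset is narrowed to 32 bits, the value changes silently rather than being diagnosed, which is what the suggested assertion would catch.

```cpp
#include <cassert>
#include <cstdint>

// Editorial sketch of the range check a gcc_assert would perform.
// An offset that does not survive a round trip through int32_t
// would otherwise be silently truncated before the assembler
// sees it.
static bool
offset_fits_32_bits (int64_t offset)
{
  return offset == (int64_t) (int32_t) offset;
}
```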

> +
> +                   emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx(imm)));
> +                   emit_insn (gen_add_hioffset (tmp_reg, GEN_INT(const_offset)));

I think the normal addition patterns can handle this, if we pass the
result of the ~0xfffff calculation.  There should be no need for a
dedicated pattern.
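
For context on why the ordinary add patterns suffice: "add Rd, Rn, #imm, lsl #12" encodes a 12-bit immediate shifted left by 12, so a value fits that form whenever it is a non-negative multiple of 4096 below 1 << 24. An editorial sketch of that encodability check (helper name invented):

```cpp
#include <cassert>
#include <cstdint>

// Editorial sketch of the "uimm12 shifted by 12" check that the
// ordinary AArch64 add patterns apply: the value must be a
// multiple of 4096 whose quotient fits in 12 bits.
static bool
valid_add_lsl12_immediate (int64_t value)
{
  return value >= 0 && (value & 0xfff) == 0 && (value >> 12) <= 0xfff;
}
```

The high part produced by the ~0xfffff calculation always has its low 20 bits clear, so it satisfies the multiple-of-4096 condition; the remaining constraint is simply that the offset stays below 16 MB.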

> +                   break;
> +                 }
> +             }
> +
> +             emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
> +         } while(0);

I think it'd be clearer to duplicate the gen_add_losym and avoid the
do...while(0)

Thanks,
Richard

> +
>         emit_insn (gen_add_losym (dest, tmp_reg, imm));
>         return;
>        }
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 665a333903c..072110f93e7 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -7405,6 +7405,13 @@
>    DONE;
>  })
>
> +(define_insn "add_hioffset"
> +  [(match_operand 0 "register_operand")
> +   (match_operand 1 "const_int_operand")]
> +  ""
> +  "add %0, %0, (%1 & ~0xfffff) >> 12, lsl #12"
> +)
> +
>  (define_insn "add_losym_<mode>"
>    [(set (match_operand:P 0 "register_operand" "=r")
>         (lo_sum:P (match_operand:P 1 "register_operand" "r")
Martin Storsjö Sept. 19, 2024, 8:58 a.m. UTC | #4
On Thu, 19 Sep 2024, Richard Sandiford wrote:

> Martin Storsjö <martin@martin.st> writes:
>> On Thu, 12 Sep 2024, Evgeny Karpov wrote:
>>
>>> The current binutils implementation does not support offsets up to 4 GB in
>>> the IMAGE_REL_ARM64_PAGEBASE_REL21 relocation and is limited to 1 MB.
>>> This is due to differences between the ELF and COFF relocation records.
>>
>> Yes, I agree.
>>
>> But I would not consider this a limitation of the binutils implementation;
>> this is a limitation of the object file format. It can't be worked around
>> by inventing your own custom relocations, but should instead be worked around
>> on the code generation side, to avoid needing such large offsets.
>>
>> This approach is one such workaround, and quite valid. Another one is to
>> generate extra symbols to allow addressing anything with a smaller offset.
>
> Maybe this is my ELF bias showing, but: generating extra X=Y+OFF
> symbols isn't generally valid for ELF when Y is a global symbol, since
> interposition rules, comdat, weak symbols, and various other reasons,
> could mean that the local definition of Y isn't the one that gets used.
> Does COFF cope with that in some other way?  If not, I would have
> expected that there would need to be a fallback path that didn't
> involve defining extra symbols.

That's indeed a fair point. COFF doesn't cope with that in other ways - so
defining such extra symbols to compensate for the offsets, for global symbols
that can be interposed or swapped out at the linking stage, would indeed be
wrong.

In practice, I think it's rare to reference such an interposable symbol 
with an offset overall - even more so to reference it with an offset over 
1 MB.

The practical case where one mostly runs into the limitation is when you
have large sections and use temporary labels to reference positions
within those sections. As the temporary labels don't persist into the
object file, the references against temporary labels end up as references
against the section base, plus an offset. And those symbols (the section
base) aren't global.

The workaround I did for this within LLVM, 
https://github.com/llvm/llvm-project/commit/06d0d449d8555ae5f1ac33e8d4bb4ae40eb080d3, 
deals specifically only with temporary symbols.

// Martin
Richard Sandiford Sept. 19, 2024, 10:15 a.m. UTC | #5
Richard Sandiford <richard.sandiford@arm.com> writes:
> Evgeny Karpov <Evgeny.Karpov@microsoft.com> writes:
>> +                 {
>> +                   rtx const_int = imm;
>> +                   const_int = XEXP (const_int, 0);
>> +                   XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20));
>
> CONST_INTs are shared objects, so we can't modify their value in-place.

Gah, sorry, I misread.  The patch was only modifying the PLUS, which should
be valid.  My comment below still stands though.

> It might be easier to pass base and const_offset from the caller
> (aarch64_expand_mov_immediate).  We are then guaranteed that the
> offset is constant and don't need to worry about the SVE case.
> The new SYM+OFF expression can be calculated using plus_constant.
>
> I think it'd be worth asserting that the offset fits in 32 bits,
> since if by some bug the offset is larger, we'd generate silent
> wrong code (in the sense that the compiler would truncate the offset
> before the assembler sees it).

Patch

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 03362a975c0..5f17936df1f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2896,7 +2896,30 @@  aarch64_load_symref_appropriately (rtx dest, rtx imm,
        if (can_create_pseudo_p ())
          tmp_reg = gen_reg_rtx (mode);

-       emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
+       do
+         {
+           if (TARGET_PECOFF)
+             {
+               poly_int64 offset;
+               HOST_WIDE_INT const_offset;
+               strip_offset (imm, &offset);
+
+               if (offset.is_constant (&const_offset)
+                   && abs(const_offset) >= 1 << 20)
+                 {
+                   rtx const_int = imm;
+                   const_int = XEXP (const_int, 0);
+                   XEXP (const_int, 1) = GEN_INT(const_offset % (1 << 20));
+
+                   emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx(imm)));
+                   emit_insn (gen_add_hioffset (tmp_reg, GEN_INT(const_offset)));
+                   break;
+                 }
+             }
+
+             emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, copy_rtx (imm)));
+         } while(0);
+
        emit_insn (gen_add_losym (dest, tmp_reg, imm));
        return;
       }
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 665a333903c..072110f93e7 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7405,6 +7405,13 @@ 
   DONE;
 })

+(define_insn "add_hioffset"
+  [(match_operand 0 "register_operand")
+   (match_operand 1 "const_int_operand")]
+  ""
+  "add %0, %0, (%1 & ~0xfffff) >> 12, lsl #12"
+)
+
 (define_insn "add_losym_<mode>"
   [(set (match_operand:P 0 "register_operand" "=r")
        (lo_sum:P (match_operand:P 1 "register_operand" "r")