Message ID | 1360882071-4072668-9-git-send-email-arnd@arndb.de |
---|---|
State | New |
Headers | show |
On 02/14/2013 03:47 PM, Arnd Bergmann wrote: > Patch c08e20d24 "arm: Add v7_invalidate_l1 to cache-v7.S" > moves the v7_invalidate_l1 symbol out of imx/headsmp.S, > which seems to cause a link error because it is now > too far away from v7_cpu_resume when building an > allyesconfig kernel. Is the problem from the following in arch/arm/mach-imx/headsmp.S: ENTRY(v7_cpu_resume) bl v7_invalidate_l1 Isn't the range of bl +/- 32MiB (or +/- 16MibB in Thumb 2). Is the kernel really that big? Sorry, I'm having trouble understanding what causes the problem.
On Thursday 14 February 2013, Stephen Warren wrote: > On 02/14/2013 03:47 PM, Arnd Bergmann wrote: > > Patch c08e20d24 "arm: Add v7_invalidate_l1 to cache-v7.S" > > moves the v7_invalidate_l1 symbol out of imx/headsmp.S, > > which seems to cause a link error because it is now > > too far away from v7_cpu_resume when building an > > allyesconfig kernel. > > Is the problem from the following in arch/arm/mach-imx/headsmp.S: > > ENTRY(v7_cpu_resume) > bl v7_invalidate_l1 > > Isn't the range of bl +/- 32MiB (or +/- 16MibB in Thumb 2). Is the > kernel really that big? Sorry, I'm having trouble understanding what > causes the problem. Well, it is an "allyesconfig" kernel, so things can get pretty big: $ size obj-tmp/vmlinux -A obj-tmp/vmlinux : section size addr .head.text 504 3221258240 .text 32707336 3221258752 .text.head 8 3253966088 .rodata 14028722 3253968896 __bug_table 127764 3267997624 .builtin_fw 684 3268125388 __ksymtab 53424 3268126072 __ksymtab_gpl 43560 3268179496 __kcrctab 26712 3268223056 __kcrctab_gpl 21780 3268249768 __ksymtab_strings 233706 3268271548 __param 33104 3268505256 __modver 4104 3268538360 __ex_table 4112 3268542464 .ARM.unwind_idx 967784 3268546576 .ARM.unwind_tab 1452168 3269514360 .notes 36 3270966528 .init.text 677840 3270967296 .exit.text 125672 3271645136 .init.arch.info 5396 3271770808 .init.tagtable 72 3271776204 .init.smpalt 928 3271776276 .init.pv_table 1704 3271777204 .init.data 678108 3271778912 .exit.data 119 3272457020 .data..percpu 1460032 3272458240 .data 3370068 3273924608 .tcm_start 940 3277294676 .bss 8007724 3277295616 .comment 43 0 .ARM.attributes 50 0 .debug_line 15780363 0 .debug_info 57192143 0 .debug_abbrev 5747374 0 .debug_aranges 299608 0 .debug_ranges 5414592 0 .debug_frame 4801748 0 .debug_str 7003282 0 .debug_loc 36237476 0 Total 196510790 THUMB2 support is obviously enabled (allyesconfig), and from the start of the .head.text section to the end of .bss, it is 64,045,100 bytes, using yesterday's linux-next kernel with my fixes. It will get bigger as we add more stuff to multiplatform. The .text section alone is just short of 32MB. Arnd
On Thu, Feb 14, 2013 at 11:47:50PM +0100, Arnd Bergmann wrote: > Patch c08e20d24 "arm: Add v7_invalidate_l1 to cache-v7.S" > moves the v7_invalidate_l1 symbol out of imx/headsmp.S, > which seems to cause a link error because it is now > too far away from v7_cpu_resume when building an > allyesconfig kernel. > > If we move the v7_cpu_resume function from the .data > section to .text, that creates another link error > for the reference to phys_l2x0_saved_regs, but we > can move all of the above to .text. > > I believe that this is not a correct bug fix but just > a bad workaround, so I'm open to ideas from people > who understand the bigger picture. Unfortunately, this ends up with writable data in the .text section, which is supposed to be read-only. We should try to preserve that status, even though we don't enforce it. I guess the problem is that we end up with the .data segment soo far away from the .text segment that these branches no longer work (and remember that this code executes with the MMU off.) The only solution I can think is that we need to do quite a bit of magic here to jump to an absolute address, but taking account of the V:P offset. That's not going to be particularly nice, and it's going to then also have to jump back the other way - to the cpu_resume code which is also in the .data section.
On Fri, Feb 15, 2013 at 11:05:14AM +0000, Arnd Bergmann wrote: > $ size obj-tmp/vmlinux -A > obj-tmp/vmlinux : > section size addr > .head.text 504 3221258240 > .text 32707336 3221258752 > .text.head 8 3253966088 Interesting... I wonder if that should be .head.text, or maybe just .text. Looking at iMX, it's just the secondary startup, which is calling into .head.text, so it would be much safer given the size of the .text segment for it to be in .head.text. > The .text section alone is just short of 32MB. I suspect when it does go over that, we'll see a lot more link time failures due to the 'bl' instructions failing to encode their PC relative jumps. The only solution then will be to switch everything to use the less efficient long jumps (load address from literal pool, bx) which'll also need much of the asm changed. That also brings up the question whether we want to penalize the kernel performance just to make allyesconfig work... I guess we can make it a compile time option, which of course allyesconfig will automatically enable. I suspect we'll see a lot of people mistakenly enabling it too though.
On Friday 15 February 2013, Russell King - ARM Linux wrote: > > The .text section alone is just short of 32MB. > > I suspect when it does go over that, we'll see a lot more link time > failures due to the 'bl' instructions failing to encode their PC relative > jumps. The only solution then will be to switch everything to use the > less efficient long jumps (load address from literal pool, bx) which'll > also need much of the asm changed. That also brings up the question > whether we want to penalize the kernel performance just to make > allyesconfig work... I guess we can make it a compile time option, which > of course allyesconfig will automatically enable. I suspect we'll see > a lot of people mistakenly enabling it too though. I'm not too worried about people accidentally enabling such a build option, at least if it's appropriately labeled as something that nobody should need. Changing the inline assembly however sounds like a big pain to do, which would cause performance and size overhead as well as regressions in rarely executed code when we get it wrong. I've tried rebulding the allyesconfig kernel with -O3 -funroll-all-loops to make it even bigger on purpose, and got these errors: arch/arm/kernel/built-in.o:(.fixup+0x8): relocation truncated to fit: R_ARM_JUMP24 against `.text' arch/arm/kernel/built-in.o:(.fixup+0x10): relocation truncated to fit: R_ARM_JUMP24 against `.text' arch/arm/kernel/built-in.o:(.fixup+0x18): relocation truncated to fit: R_ARM_JUMP24 against `.text' arch/arm/kernel/built-in.o:(.fixup+0x20): relocation truncated to fit: R_ARM_JUMP24 against `.text' arch/arm/kernel/built-in.o:(.fixup+0x28): relocation truncated to fit: R_ARM_JUMP24 against `.text' arch/arm/kernel/built-in.o:(.fixup+0x30): relocation truncated to fit: R_ARM_JUMP24 against `.text' arch/arm/kernel/built-in.o:(.fixup+0x38): relocation truncated to fit: R_ARM_JUMP24 against `.text' arch/arm/kernel/built-in.o:(.fixup+0x40): relocation truncated to fit: R_ARM_JUMP24 against `.text' arch/arm/kernel/built-in.o:(.fixup+0x48): relocation truncated to fit: R_ARM_JUMP24 against `.text' arch/arm/kernel/built-in.o:(.fixup+0x50): relocation truncated to fit: R_ARM_JUMP24 against `.text' arch/arm/kernel/built-in.o:(.fixup+0x58): additional relocation overflows omitted from the output so the .fixup section is completely out of reach of the start of the .text section. The size output for the partial link of vmlinux.o is section size addr .head.text 492 3221258240 .text 43889568 3221258752 .text.head 8 0 .rodata 11020608 4096 __bug_table 138636 11024704 .pci_fixup 0 11163340 .builtin_fw 684 11163340 .rio_ops 0 11164024 __ksymtab 53424 11164024 __ksymtab_gpl 43560 11217448 __ksymtab_unused 0 11261008 __ksymtab_unused_gpl 0 11261008 __ksymtab_gpl_future 0 11261008 __kcrctab 26712 11261008 __kcrctab_gpl 21780 11287720 __kcrctab_unused 0 11309500 __kcrctab_unused_gpl 0 11309500 __kcrctab_gpl_future 0 11309500 __ksymtab_strings 251888 11309500 __param 33104 11561388 __modver 1284 11594492 __ex_table 6288 11595776 .ARM.unwind_idx 907888 11602064 .ARM.unwind_tab 1362276 12509952 .notes 36 13872228 .init.text 685436 13873152 .exit.text 124324 14558588 .init.arch.info 5396 14682912 .init.tagtable 72 14688308 .init.smpalt 952 14688380 .init.pv_table 2232 14689332 .init.data 649164 14691568 .exit.data 120 15340732 .data..percpu 1460032 15343616 .data 3370676 16809984 .tcm_start 332 20180660 .text_itcm 0 4294836224 .dtcm_start 0 20180992 .data_dtcm 0 4294868992 .tcm_end 0 20180992 .bss 8007852 20180992 .comment 314424 0 .ARM.attributes 50 0 .debug_line 19185565 0 .debug_info 61757649 0 .debug_abbrev 5835457 0 .debug_aranges 299416 0 .debug_ranges 10482064 0 .debug_frame 4601768 0 .debug_str 37433732 0 .debug_loc 45578206 0 Total 257553155 I had the idea that turning off THUMB2 support in allyesconfig would help, but then I found that it's already disabled unlike what I claimed earlier, because ARMv6 support is enabled alongside ARMv7, so we are still screwed. Also, with the size of the kernel binary growing, it may just be a matter of time until this is not just a problem for artificial cases like allyesconfig, but also for something that users will actually want to run. IIRC, 10 years ago, half a megabyte seemed like a reasonable size for a kernel binary, while today 5 megabyte are not out of the ordinary on ARM. If the trend continues the same way, we might have users complaining about 32 megabyte kernels in 5-7 years from now, unless everybody is using 64-bit by then ;-) Arnd
diff --git a/arch/arm/mach-imx/headsmp.S b/arch/arm/mach-imx/headsmp.S index 921fc15..0de76cc 100644 --- a/arch/arm/mach-imx/headsmp.S +++ b/arch/arm/mach-imx/headsmp.S @@ -30,7 +30,7 @@ ENDPROC(v7_secondary_startup) * allow phys_l2x0_saved_regs to be accessed with a relative load * as we are running on physical address here. */ - .data + .text .align #ifdef CONFIG_CACHE_L2X0 @@ -51,6 +51,8 @@ phys_l2x0_saved_regs: .endm #endif + .text + ENTRY(v7_cpu_resume) bl v7_invalidate_l1 pl310_resume
Patch c08e20d24 "arm: Add v7_invalidate_l1 to cache-v7.S" moves the v7_invalidate_l1 symbol out of imx/headsmp.S, which seems to cause a link error because it is now too far away from v7_cpu_resume when building an allyesconfig kernel. If we move the v7_cpu_resume function from the .data section to .text, that creates another link error for the reference to phys_l2x0_saved_regs, but we can move all of the above to .text. I believe that this is not a correct bug fix but just a bad workaround, so I'm open to ideas from people who understand the bigger picture. Without this patch, building allyesconfig results in: arch/arm/mach-imx/built-in.o: In function `v7_cpu_resume': arch/arm/mach-imx/headsmp.S:55:(.data+0x87f8): relocation truncated to fit: R_ARM_CALL against symbol `v7_invalidate_l1' defined in .text section in arch/arm/mm/built-in.o Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Dinh Nguyen <dinguyen@altera.com> Cc: Pavel Machek <pavel@denx.de> Cc: Stephen Warren <swarren@nvidia.com> Cc: Simon Horman <horms+renesas@verge.net.au> --- arch/arm/mach-imx/headsmp.S | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)