
powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures

Message ID 20160804013729.7fffa45a@roar.ozlabs.ibm.com (mailing list archive)
State Not Applicable

Commit Message

Nicholas Piggin Aug. 3, 2016, 3:37 p.m. UTC
On Wed, 03 Aug 2016 14:29:13 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday, August 3, 2016 10:19:11 PM CEST Stephen Rothwell wrote:
> > Hi Arnd,
> > 
> > On Wed, 03 Aug 2016 09:52:23 +0200 Arnd Bergmann <arnd@arndb.de> wrote:  
> > >
> > > Using a different way to link the kernel would also help us with
> > > the remaining allyesconfig problem on ARM, as the problem is only in
> > > 'ld -r' not producing trampolines for symbols that later cannot get
> > > them any more. It would probably also help building with ld.gold,
> > > which is currently not working.
> > > 
> > > What is your suggested alternative?  
> > 
> > I have a patch that makes the built-in.o files into thin archives (same
> > as archives, but the actual objects are replaced with the name of the
> > original object file).  That way the final link has all the original
> > objects.  I haven't checked to see what the overheads of doing it this
> > way are.
> > 
> > Nick Piggin has just today taken my old patch (it was last rebased to
> > v4.4-rc1) and tried it on a recent kernel and it still seems to mostly
> > work.  It probably needs some tidying up, but you are welcome to test
> > it if you want to.  
> 
> Sure, I'll certainly give it a try on ARM when you send me a copy.

I've attached what I'm using, which builds and runs for me without
any work. Your arch obviously has to select the option to use it.

    text      data     bss      dec       hex     filename
    11196784  1185024  1923820  14305628  da495c  vmlinuxppc64.before
    11187536  1181848  1923176  14292560  da1650  vmlinuxppc64.after
    
~9K text saving, ~3K data saving. I assume this comes from fewer
branch trampolines and toc entries, but haven't verified exactly.
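
The savings quoted above can be recomputed from the `size` output. The sketch below only redoes that arithmetic; the variable names are illustrative and not part of the patch.

```shell
#!/bin/sh
# Section sizes copied from the `size` output for the ppc64 build above.
text_before=11196784; text_after=11187536
data_before=1185024;  data_after=1181848
bss_before=1923820;   bss_after=1923176

# Bytes saved per section by switching to thin archives.
text_saved=$((text_before - text_after))
data_saved=$((data_before - data_after))
bss_saved=$((bss_before - bss_after))
total_saved=$((text_saved + data_saved + bss_saved))

echo "text saved:  ${text_saved} bytes"
echo "data saved:  ${data_saved} bytes"
echo "bss saved:   ${bss_saved} bytes"
echo "total saved: ${total_saved} bytes"
```

The total should match the difference between the two `dec` columns.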



commit 8bc3ca4798c215e9a9107b6d44408f0af259f84f
Author: Stephen Rothwell <sfr@canb.auug.org.au>
Date:   Tue Oct 30 12:14:18 2012 +1100

    kbuild: allow architectures to use thin archives instead of ld -r
    
    Alan Modra has been trying to convince the kernel developers that ld -r
    is "evil" for many years.  This is an alternative and means that the
    linker has much more information available to it when it links the
    kernel.
    
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
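
To illustrate what the commit message means by a thin archive, here is a minimal sketch that builds a regular and a thin archive from the same members and compares them. It assumes GNU ar from binutils is installed as `ar`; the file names and the `thin_archive_demo` function are made up for the demonstration and are not part of kbuild.

```shell
#!/bin/sh
# Sketch only: assumes GNU ar (binutils). With other ar implementations
# the T modifier may mean something different.
thin_archive_demo() {
    if ! command -v ar >/dev/null 2>&1; then
        echo "ar not found, skipping demo"
        return 0
    fi
    workdir=$(mktemp -d) || return 1

    # Two stand-in "object" files. ar archives any file, so real objects
    # are not needed to see the difference in archive layout.
    head -c 4096 /dev/zero > "$workdir/a.o"
    head -c 4096 /dev/zero > "$workdir/b.o"

    # Regular archive: member contents are copied into the archive.
    ar rcs  "$workdir/regular.a" "$workdir/a.o" "$workdir/b.o"
    # Thin archive: only the paths of the members are recorded.
    ar rcsT "$workdir/thin.a"    "$workdir/a.o" "$workdir/b.o"

    # The first bytes of the file identify the archive flavour.
    regular_magic=$(head -c 7 "$workdir/regular.a")
    thin_magic=$(head -c 7 "$workdir/thin.a")
    regular_size=$(( $(wc -c < "$workdir/regular.a") ))
    thin_size=$(( $(wc -c < "$workdir/thin.a") ))

    echo "regular.a: magic=${regular_magic} size=${regular_size}"
    echo "thin.a:    magic=${thin_magic} size=${thin_size}"
    rm -rf "$workdir"
}

thin_archive_demo
```

Because the thin archive stores only references, the original object files must remain in place for the final link.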

Comments

Arnd Bergmann Aug. 3, 2016, 6:52 p.m. UTC | #1
On Thursday, August 4, 2016 1:37:29 AM CEST Nicholas Piggin wrote:
> 
> I've attached what I'm using, which builds and runs for me without
> any work. Your arch obviously has to select the option to use it.
> 
>     text      data     bss      dec       hex     filename
>     11196784  1185024  1923820  14305628  da495c  vmlinuxppc64.before
>     11187536  1181848  1923176  14292560  da1650  vmlinuxppc64.after
>     
> ~9K text saving, ~3K data saving. I assume this comes from fewer
> branch trampolines and toc entries, but haven't verified exactly.

The patch seems to work great, but for me it's getting bigger
(compared to my older patch, mainline allyesconfig doesn't build):

    text      data      bss       dec        hex      filename
    51299868  42599559  23362148  117261575  6fd4507  vmlinuxarm.before
    51302545  42595015  23361884  117259444  6fd3cb4  vmlinuxarm.after

Most of the difference appears to be in branch trampolines (634 added,
559 removed, 14837 unchanged) as you suspect, but I also see a couple
of symbols show up in vmlinux that were not there before:

-A __crc_dma_noop_ops
-D dma_noop_ops
-R __clz_tab
-r fdt_errtable
-r __kcrctab_dma_noop_ops
-r __kstrtab_dma_noop_ops
-R __ksymtab_dma_noop_ops
-t dma_noop_alloc
-t dma_noop_free
-t dma_noop_map_page
-t dma_noop_mapping_error
-t dma_noop_map_sg
-t dma_noop_supported
-T fdt_add_reservemap_entry
-T fdt_begin_node
-T fdt_create
-T fdt_create_empty_tree
-T fdt_end_node
-T fdt_finish
-T fdt_finish_reservemap
-T fdt_property
-T fdt_resize
-T fdt_strerror
-T find_cpio_data

From my first look, it seems that all of lib/*.o is now getting linked
into vmlinux, while we traditionally leave out everything from lib/
that is not referenced.
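
A symbol listing like the one above can be produced by comparing saved `nm` output from the two images. The following is a rough sketch, not necessarily how the list in this mail was generated: `symdiff` is a hypothetical helper, and which side gets the `-` or `+` prefix depends only on the order of its arguments.

```shell
#!/bin/sh
# symdiff FIRST_NM SECOND_NM
# Print "type name" pairs that appear in only one of two saved `nm`
# listings. Addresses are dropped so that symbols which merely moved
# do not show up. Lines prefixed "-" are only in FIRST_NM, lines
# prefixed "+" are only in SECOND_NM.
symdiff() {
    first=$(mktemp) && second=$(mktemp) || return 1
    # Defined symbols have three fields: address, type, name.
    awk 'NF == 3 { print $2, $3 }' "$1" | LC_ALL=C sort -u > "$first"
    awk 'NF == 3 { print $2, $3 }' "$2" | LC_ALL=C sort -u > "$second"
    LC_ALL=C comm -23 "$first" "$second" | sed 's/^/-/'
    LC_ALL=C comm -13 "$first" "$second" | sed 's/^/+/'
    rm -f "$first" "$second"
}

# Usage (file names borrowed from the size table above):
#   nm vmlinuxarm.before > before.nm
#   nm vmlinuxarm.after  > after.nm
#   symdiff before.nm after.nm
```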

I also see a noticeable overhead in link time, the numbers are for
a cache-hot rebuild after a successful allyesconfig build, using a
24-way Opteron@2.5Ghz, just relinking vmlinux:

$ time make skj30 vmlinux # before
real	2m8.092s
user	3m41.008s
sys	0m48.172s

$ time make skj30 vmlinux # after
real	4m10.189s
user	5m43.804s
sys	0m52.988s

That is clearly a very sharp difference. Fortunately for the defconfig
build, the times are much lower, and I see no real difference other
than the noise between subsequent runs:

$ time make skj30 vmlinux # before
real	0m5.415s
user	0m19.716s
sys	0m9.356s
$ time make skj30 vmlinux # before
real	0m9.536s
user	0m21.320s
sys	0m9.224s


$ time make skj30 vmlinux # after
real	0m5.539s
user	0m20.360s
sys	0m9.224s

$ time make skj30 vmlinux # after
real	0m9.138s
user	0m21.932s
sys	0m8.988s

$ time make skj30 vmlinux # after
real	0m5.659s
user	0m20.332s
sys	0m9.620s
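
To put a number on the allyesconfig relink slowdown, the `real` times reported above can be converted to seconds and divided. This is a small sketch; `to_seconds` and `slowdown` are illustrative helper names.

```shell
#!/bin/sh
# Convert a `time`-style value such as "2m8.092s" into seconds.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# slowdown BEFORE AFTER: ratio of the two wall-clock times.
slowdown() {
    awk -v before="$(to_seconds "$1")" -v after="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", after / before }'
}

# "real" times of the allyesconfig relink, before and after the patch.
allyes_ratio=$(slowdown 2m8.092s 4m10.189s)
echo "allyesconfig relink slowdown: ${allyes_ratio}x"
```

The same helper can be applied to the defconfig runs, where the spread between repeated runs is larger than the before/after difference.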

	Arnd
Segher Boessenkool Aug. 3, 2016, 7:44 p.m. UTC | #2
Hi Arnd,

On Wed, Aug 03, 2016 at 08:52:48PM +0200, Arnd Bergmann wrote:
> From my first look, it seems that all of lib/*.o is now getting linked
> into vmlinux, while we traditionally leave out everything from lib/
> that is not referenced.
> 
> I also see a noticeable overhead in link time, the numbers are for
> a cache-hot rebuild after a successful allyesconfig build, using a
> 24-way Opteron@2.5Ghz, just relinking vmlinux:
> 
> $ time make skj30 vmlinux # before
> real	2m8.092s
> user	3m41.008s
> sys	0m48.172s
> 
> $ time make skj30 vmlinux # after
> real	4m10.189s
> user	5m43.804s
> sys	0m52.988s

Is it better when using rcT instead of rcsT?


Segher
Arnd Bergmann Aug. 3, 2016, 8:13 p.m. UTC | #3
On Wednesday, August 3, 2016 2:44:29 PM CEST Segher Boessenkool wrote:
> Hi Arnd,
> 
> On Wed, Aug 03, 2016 at 08:52:48PM +0200, Arnd Bergmann wrote:
> > From my first look, it seems that all of lib/*.o is now getting linked
> > into vmlinux, while we traditionally leave out everything from lib/
> > that is not referenced.
> > 
> > I also see a noticeable overhead in link time, the numbers are for
> > a cache-hot rebuild after a successful allyesconfig build, using a
> > 24-way Opteron@2.5Ghz, just relinking vmlinux:
> > 
> > $ time make skj30 vmlinux # before
> > real	2m8.092s
> > user	3m41.008s
> > sys	0m48.172s
> > 
> > $ time make skj30 vmlinux # after
> > real	4m10.189s
> > user	5m43.804s
> > sys	0m52.988s
> 
> Is it better when using rcT instead of rcsT?

It seems to be noticeably better for the clean rebuild case, though
not as good as the original:

real	3m34.015s
user	5m7.104s
sys	0m49.172s

I've also tried now with my own patch applied as well (linking
each drivers/*/built-in.o into vmlinux rather than having them
linked into drivers/built-in.o first), but that makes no
difference.

	Arnd
Stephen Rothwell Aug. 4, 2016, 12:10 a.m. UTC | #4
Hi Arnd,

On Wed, 03 Aug 2016 20:52:48 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
>
> Most of the difference appears to be in branch trampolines (634 added,
> 559 removed, 14837 unchanged) as you suspect, but I also see a couple
> of symbols show up in vmlinux that were not there before:
> 
> -A __crc_dma_noop_ops
> -D dma_noop_ops
> -R __clz_tab
> -r fdt_errtable
> -r __kcrctab_dma_noop_ops
> -r __kstrtab_dma_noop_ops
> -R __ksymtab_dma_noop_ops
> -t dma_noop_alloc
> -t dma_noop_free
> -t dma_noop_map_page
> -t dma_noop_mapping_error
> -t dma_noop_map_sg
> -t dma_noop_supported
> -T fdt_add_reservemap_entry
> -T fdt_begin_node
> -T fdt_create
> -T fdt_create_empty_tree
> -T fdt_end_node
> -T fdt_finish
> -T fdt_finish_reservemap
> -T fdt_property
> -T fdt_resize
> -T fdt_strerror
> -T find_cpio_data
> 
> From my first look, it seems that all of lib/*.o is now getting linked
> into vmlinux, while we traditionally leave out everything from lib/
> that is not referenced.

You could try removing the --{,no-}whole-archive arguments to ld in
scripts/link-vmlinux.sh.  Last time I did that, though, a whole lot of
stuff failed to be linked in (especially stuff only referenced by
EXPORT_SYMBOL()s, but that may have been fixed).

> I also see a noticeable overhead in link time, the numbers are for
> a cache-hot rebuild after a successful allyesconfig build, using a
> 24-way Opteron@2.5Ghz, just relinking vmlinux:

I was afraid of that, but is it offset by the time saved by not doing
the "ld -r"s along the way?  It may also be that (for powerpc anyway)
the linker is doing a better job.
Nicholas Piggin Aug. 11, 2016, 12:43 p.m. UTC | #5
On Wed, 03 Aug 2016 22:13:28 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wednesday, August 3, 2016 2:44:29 PM CEST Segher Boessenkool wrote:
> > Hi Arnd,
> > 
> > On Wed, Aug 03, 2016 at 08:52:48PM +0200, Arnd Bergmann wrote:  
> > > From my first look, it seems that all of lib/*.o is now getting linked
> > > into vmlinux, while we traditionally leave out everything from lib/
> > > that is not referenced.
> > > 
> > > I also see a noticeable overhead in link time, the numbers are for
> > > a cache-hot rebuild after a successful allyesconfig build, using a
> > > 24-way Opteron@2.5Ghz, just relinking vmlinux:
> > > 
> > > $ time make skj30 vmlinux # before
> > > real	2m8.092s
> > > user	3m41.008s
> > > sys	0m48.172s
> > > 
> > > $ time make skj30 vmlinux # after
> > > real	4m10.189s
> > > user	5m43.804s
> > > sys	0m52.988s  
> > 
> > Is it better when using rcT instead of rcsT?  
> 
> It seems to be noticeably better for the clean rebuild case, though
> not as good as the original:
> 
> real	3m34.015s
> user	5m7.104s
> sys	0m49.172s
> 
> I've also tried now with my own patch applied as well (linking
> each drivers/*/built-in.o into vmlinux rather than having them
> linked into drivers/built-in.o first), but that makes no
> difference.

I just want to come back to this: because I've submitted the thin
archives kbuild patch, I wanted to make sure we're doing okay on
ARM/ARM64. I cross-compiled on my laptop.

For ARM64 allyesconfig:

After building then removing all built-in.o then rebuilding vmlinux:
inclink
time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
real    1m18.977s
user    2m14.512s
sys     0m29.704s

thinarc
time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
real    1m18.433s
user    2m6.128s
sys     0m28.372s


Final ld time
inclink
real    0m4.005s
user    0m3.464s
sys     0m0.536s

thinarc
real    0m5.841s
user    0m4.916s
sys     0m0.916s


Build directory size is of course much better (3953MB vs 5519MB).


For ARM, defconfig

After building then removing all built-in.o then rebuilding vmlinux:
inclink
real	0m19.593s
user	0m22.372s
sys	0m6.428s

thinarc
real	0m18.919s
user	0m21.924s
sys	0m6.400s


Final ld time
inclink
real	0m0.378s
user	0m0.304s
sys	0m0.076s

thinarc
real    0m0.894s
user    0m0.684s
sys     0m0.200s

For both cases the final link gets slower with thin archives. I guess there is some
per-file overhead, but I thought that with --whole-archive it should not be that much
slower. Still, the overall time for the main ar/ld phases comes out about the same in
the end, so I don't think it's too much of a problem, unless ARM blows up significantly
worse with a bigger config.

Linking with thin archives takes significantly more time in bfd hash lookup code.
I haven't dug much further yet.

Thanks,
Nick
Arnd Bergmann Aug. 11, 2016, 1:04 p.m. UTC | #6
On Thursday, August 11, 2016 10:43:20 PM CEST Nicholas Piggin wrote:
> On Wed, 03 Aug 2016 22:13:28 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> 
> > On Wednesday, August 3, 2016 2:44:29 PM CEST Segher Boessenkool wrote:
> > > Hi Arnd,
> > > 
> > > On Wed, Aug 03, 2016 at 08:52:48PM +0200, Arnd Bergmann wrote:  
> > > > From my first look, it seems that all of lib/*.o is now getting linked
> > > > into vmlinux, while we traditionally leave out everything from lib/
> > > > that is not referenced.
> > > > 
> > > > I also see a noticeable overhead in link time, the numbers are for
> > > > a cache-hot rebuild after a successful allyesconfig build, using a
> > > > 24-way Opteron@2.5Ghz, just relinking vmlinux:
> > > > 
> > > > $ time make skj30 vmlinux # before
> > > > real	2m8.092s
> > > > user	3m41.008s
> > > > sys	0m48.172s
> > > > 
> > > > $ time make skj30 vmlinux # after
> > > > real	4m10.189s
> > > > user	5m43.804s
> > > > sys	0m52.988s  
> > > 
> > > Is it better when using rcT instead of rcsT?  
> > 
> > It seems to be noticeably better for the clean rebuild case, though
> > not as good as the original:
> > 
> > real	3m34.015s
> > user	5m7.104s
> > sys	0m49.172s
> > 
> > I've also tried now with my own patch applied as well (linking
> > each drivers/*/built-in.o into vmlinux rather than having them
> > linked into drivers/built-in.o first), but that makes no
> > difference.
> 
> I just want to come back to this: because I've submitted the thin
> archives kbuild patch, I wanted to make sure we're doing okay on
> ARM/ARM64. I cross-compiled on my laptop.
> 
> For ARM64 allyesconfig:
> 
> After building then removing all built-in.o then rebuilding vmlinux:
> inclink
> time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
> real    1m18.977s
> user    2m14.512s
> sys     0m29.704s
> 
> thinarc
> time make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j8 vmlinux
> real    1m18.433s
> user    2m6.128s
> sys     0m28.372s
> 
> 
> Final ld time
> inclink
> real    0m4.005s
> user    0m3.464s
> sys     0m0.536s
> 
> thinarc
> real    0m5.841s
> user    0m4.916s
> sys     0m0.916s
> 
> 
> Build directory size is of course much better (3953MB vs 5519MB).

Ok, looks great. Some downsides and some upsides here, but overall
I think this is a win.

> 
> For ARM, defconfig
> 
> After building then removing all built-in.o then rebuilding vmlinux:
> inclink
> real	0m19.593s
> user	0m22.372s
> sys	0m6.428s
> 
> thinarc
> real	0m18.919s
> user	0m21.924s
> sys	0m6.400s
> 
> 
> Final ld time
> inclink
> real	0m0.378s
> user	0m0.304s
> sys	0m0.076s
> 
> thinarc
> real    0m0.894s
> user    0m0.684s
> sys     0m0.200s

This also still seems fine.

> For both cases final link gets slower with thin archives. I guess there is some
> per-file overhead but I thought with --whole-archive it should not be that much
> slower. Still, overall time for main ar/ld phases comes out about the same in
> the end so I don't think it's too much problem. Unless ARM blows up significantly
> worse with a bigger config.

Unfortunately I think it does. I haven't tried your latest series yet,
but I think the total time for removing built-in.o and relinking went
up from around 4 minutes (already way too much) to 18 minutes for me.

> Linking with thin archives takes significantly more time in bfd hash lookup code.
> I haven't dug much further yet.

Can you try the ARM allyesconfig with thin archives? I'll follow up with two
patches: one to get ARM to link without thin archives, and one that I used
to get --gc-sections to work.

	Arnd
Nicholas Piggin Aug. 11, 2016, 1:12 p.m. UTC | #7
On Thu, 11 Aug 2016 15:04:00 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Thursday, August 11, 2016 10:43:20 PM CEST Nicholas Piggin wrote:
> > On Wed, 03 Aug 2016 22:13:28 +0200

> > Final ld time
> > inclink
> > real	0m0.378s
> > user	0m0.304s
> > sys	0m0.076s
> > 
> > thinarc
> > real    0m0.894s
> > user    0m0.684s
> > sys     0m0.200s  
> 
> This also still seems fine.
> 
> > For both cases final link gets slower with thin archives. I guess there is some
> > per-file overhead but I thought with --whole-archive it should not be that much
> > slower. Still, overall time for main ar/ld phases comes out about the same in
> > the end so I don't think it's too much problem. Unless ARM blows up significantly
> > worse with a bigger config.  
> 
> Unfortunately I think it does. I haven't tried your latest series yet,
> but I think the total time for removing built-in.o and relinking went
> up from around 4 minutes (already way too much) to 18 minutes for me.
> 
> > Linking with thin archives takes significantly more time in bfd hash lookup code.
> > I haven't dug much further yet.  
> 
> Can you try the ARM allyesconfig with thin archives? I'll follow up with two
> patches: one to get ARM to link without thin archives, and one that I used
> to get --gc-sections to work.

Okay send them over, I'll try digging into it. There is not much kbuild
code to maintain so we don't have to switch every arch. It would be nice
to though.

Thanks,
Nick

Patch

diff --git a/arch/Kconfig b/arch/Kconfig
index d794384..1330bf4 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -424,6 +424,12 @@  config CC_STACKPROTECTOR_STRONG
 
 endchoice
 
+config THIN_ARCHIVES
+	bool
+	help
+	  Select this if the architecture wants to use thin archives
+	  instead of ld -r to create the built-in.o files.
+
 config HAVE_CONTEXT_TRACKING
 	bool
 	help
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 0d1ca5b..bbf60b3 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -358,10 +358,15 @@  $(sort $(subdir-obj-y)): $(subdir-ym) ;
 # Rule to compile a set of .o files into one .o file
 #
 ifdef builtin-target
+ifdef CONFIG_THIN_ARCHIVES
+  cmd_make_builtin = rm -f $@; $(AR) rcsT$(KBUILD_ARFLAGS)
+else
+  cmd_make_builtin = $(LD) $(ld_flags) -r -o
+endif
 quiet_cmd_link_o_target = LD      $@
 # If the list of objects to link is empty, just create an empty built-in.o
 cmd_link_o_target = $(if $(strip $(obj-y)),\
-		      $(LD) $(ld_flags) -r -o $@ $(filter $(obj-y), $^) \
+		      $(cmd_make_builtin) $@ $(filter $(obj-y), $^) \
 		      $(cmd_secanalysis),\
 		      rm -f $@; $(AR) rcs$(KBUILD_ARFLAGS) $@)
 
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index f0f6d9d..ef4658f 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -41,8 +41,14 @@  info()
 # ${1} output file
 modpost_link()
 {
-	${LD} ${LDFLAGS} -r -o ${1} ${KBUILD_VMLINUX_INIT}                   \
-		--start-group ${KBUILD_VMLINUX_MAIN} --end-group
+	local objects
+
+	if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
+		objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
+	else
+		objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
+	fi
+	${LD} ${LDFLAGS} -r -o ${1} ${objects}
 }
 
 # Link of vmlinux
@@ -51,11 +57,16 @@  modpost_link()
 vmlinux_link()
 {
 	local lds="${objtree}/${KBUILD_LDS}"
+	local objects
 
 	if [ "${SRCARCH}" != "um" ]; then
+		if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
+			objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
+		else
+			objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
+		fi
 		${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2}                  \
-			-T ${lds} ${KBUILD_VMLINUX_INIT}                     \
-			--start-group ${KBUILD_VMLINUX_MAIN} --end-group ${1}
+			-T ${lds} ${objects} ${1}
 	else
 		${CC} ${CFLAGS_vmlinux} -o ${2}                              \
 			-Wl,-T,${lds} ${KBUILD_VMLINUX_INIT}                 \