[v1] nptl: smarter not-parallel-ing

Message ID	xnk1bgoi3e.fsf@greed.delorie.com
State	New
Headers	show Return-Path: <libc-alpha-return-104384-incoming=patchwork.ozlabs.org@sourceware.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:date:message-id:from:to:subject; q=dns; s= default; b=ZJbbvYNT09BATKGZAw51cM5Bi9oeehrVgOhSwautyy3yVJ4ur99MO b4mJuXYIaCgpZdXmVsLbKzsZp2xI0PVHmQzgRMUMkO6iOUGCpB+tIph3UfzXMmHk /+S9Yyi4yxpv1edsNzLIb8UkKAhYZkQkAf4408ytklwASXEoYP2Q+c= Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk Sender: libc-alpha-owner@sourceware.org Date: Tue, 13 Aug 2019 17:45:09 -0400 Message-Id: <xnk1bgoi3e.fsf@greed.delorie.com> From: DJ Delorie <dj@redhat.com> To: libc-alpha@sourceware.org Subject: [patch v1] nptl: smarter not-parallel-ing
Series	[v1] nptl: smarter not-parallel-ing \| expand [v1] nptl: smarter not-parallel-ing

DJ Delorie Aug. 13, 2019, 9:45 p.m. UTC

On my i7-4790K (4 real, 8 HT, 4.0 GHz) machine, this reduces "make -j8
check" time from 15m36 to 15m10, saving 26 seconds.

From 722d522f3145736fced14a2f7b57adfb8ecc2f30 Mon Sep 17 00:00:00 2001
From: DJ Delorie <dj@redhat.com>
Date: Tue, 13 Aug 2019 17:34:55 -0400
Subject: Optimize nptl test sequencing.

Remove .NOTPARALLEL, which forces tests to be *built* in series.
Replace with order-only dependencies between targets which only
affects the tests themselves.

Florian Weimer Aug. 14, 2019, 5:17 a.m. UTC | #1

* DJ Delorie:

> diff --git a/nptl/Makefile b/nptl/Makefile
> index 0567e77a79..d7a3857c95 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -666,7 +666,7 @@ $(objpfx)tst-cleanup0.out: /dev/null $(objpfx)tst-cleanup0
>  	$(evaluate-test)
>  
>  $(objpfx)tst-cleanup0-cmp.out: tst-cleanup0.expect $(objpfx)tst-cleanup0.out
> -	cmp $^ > $@; \
> +	cmp tst-cleanup0.expect $(objpfx)tst-cleanup0.out > $@; \
>  	$(evaluate-test)

Why is this needed if | prerequisites are not listed in $^?

> +ifeq ($(run-built-tests),yes)
> +# The tests in this subdir should not be run in parallel.
> +#
> +# The following will create rules like "foo2.out :| foo1.out" for all
> +# tests, which forces the tests to be run serially, but does not force
> +# a test to be run just because some other test was run.
> +#
> +# Caveat: the :|-style dependencies won't be listed in $^, so avoid
> +# using $^ to depend on test result files.
> +
> +ALLTESTS = $(tests:%=$(objpfx)%.out) $(tests-internal:%=$(objpfx)%.out) \
> +	$(tests-container:%=$(objpfx)%.out) \
> +	$(tests-special) $(tests-printers-out) \
> +	$(xtests:%=$(objpfx)%.out) $(xtests-special)
> +
> +ALLBUTFIRSTTEST = $(filter-out $(firstword $(ALLTESTS)), $(ALLTESTS))
> +ALLBUTLASTTEST = $(filter-out $(lastword $(ALLTESTS)), $(ALLTESTS))
> +TESTPAIRS = $(join $(ALLBUTFIRSTTEST),$(addprefix :|,$(ALLBUTLASTTEST)))
> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
>  endif

I would have used different variable names (lowercase, perhaps with an
nptl- prefix), but that's just a detail.  The $(filter-out …)
construct is awkward, but I don't see a better way.

Florian Weimer Aug. 14, 2019, 12:41 p.m. UTC | #2

I did some benchmarks with this patch.

* Cold case.

Newly build source tree, “make subdirs=login check” (to populate the
testroot), followed by “make subdirs=nptl check”, timings for the
latter.  All tests were for x86-64, timings in seconds (wall clock).
Appropriate -j options supplied.

Before:

8-thread system:   399 399 397 400 398 400 400
256-thread system: 392 389 389 391 394 393 392

After:

8-thread system:   372 375 373 376 373 374 375
256-thread system: 357 357 356 361 358 358 357

* Warm case (incremental make).

Timings of “make subdirs=nptl check” after the first test suite run.
Appropriate -j options supplied.

After:

8-thread system:   1.541 1.014 1.025 1.016 1.030 1.029
256-thread system: 1.789 1.720 1.142 1.119 1.133 1.143

(I didn't bother to time the before case, considering these low
numbers.)

So it seems to me that performance-wise, this is a small overall win.

Thanks,
Florian

Florian Weimer Aug. 14, 2019, 12:46 p.m. UTC | #3

* DJ Delorie:

> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))

One more comment: In the past, there was a sustained objection against
using $(eval …) in makefiles.  If I recall correctly, the person who
voiced the objection no longer participates in the project.

Thanks,
Florian

Carlos O'Donell Aug. 14, 2019, 1:57 p.m. UTC | #4

On 8/14/19 8:46 AM, Florian Weimer wrote:
> * DJ Delorie:
> 
>> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
> 
> One more comment: In the past, there was a sustained objection against
> using $(eval …) in makefiles.  If I recall correctly, the person who
> voiced the objection no longer participates in the project.

Do we remember why?

Florian Weimer Aug. 14, 2019, 2 p.m. UTC | #5

* Carlos O'Donell:

> On 8/14/19 8:46 AM, Florian Weimer wrote:
>> * DJ Delorie:
>>
>>> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
>>
>> One more comment: In the past, there was a sustained objection against
>> using $(eval …) in makefiles.  If I recall correctly, the person who
>> voiced the objection no longer participates in the project.
>
> Do we remember why?

Which part?  No deep technical reason was given at the time for the
$(eval …)  objection, and I did not make an attempt do understand this
position.

For target-generating loops, the workaround has always been to use
o-iterator.mk, which I personally do not consider an improvement.

Thanks,
Florian

Joseph Myers Aug. 14, 2019, 4:47 p.m. UTC | #6

On Wed, 14 Aug 2019, Florian Weimer wrote:

> * Carlos O'Donell:
> 
> > On 8/14/19 8:46 AM, Florian Weimer wrote:
> >> * DJ Delorie:
> >>
> >>> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
> >>
> >> One more comment: In the past, there was a sustained objection against
> >> using $(eval …) in makefiles.  If I recall correctly, the person who
> >> voiced the objection no longer participates in the project.
> >
> > Do we remember why?
> 
> Which part?  No deep technical reason was given at the time for the
> $(eval …)  objection, and I did not make an attempt do understand this
> position.
> 
> For target-generating loops, the workaround has always been to use
> o-iterator.mk, which I personally do not consider an improvement.

We could revisit Andreas's patch 
<https://sourceware.org/ml/libc-alpha/2012-07/msg00250.html>, updated for 
subsequently added uses of o-iterator.mk, and see if it still gives the 
reported speedup.

Andreas Schwab Aug. 14, 2019, 4:57 p.m. UTC | #7

On Aug 14 2019, Joseph Myers <joseph@codesourcery.com> wrote:

> We could revisit Andreas's patch 
> <https://sourceware.org/ml/libc-alpha/2012-07/msg00250.html>, updated for 
> subsequently added uses of o-iterator.mk, and see if it still gives the 
> reported speedup.

It doesn't.

Andreas.

[v1] nptl: smarter not-parallel-ing

Commit Message

Comments

Patch