Message ID | xnk1bgoi3e.fsf@greed.delorie.com |
---|---|
State | New |
Headers | show |
Series | [v1] nptl: smarter not-parallel-ing | expand |
* DJ Delorie: > diff --git a/nptl/Makefile b/nptl/Makefile > index 0567e77a79..d7a3857c95 100644 > --- a/nptl/Makefile > +++ b/nptl/Makefile > @@ -666,7 +666,7 @@ $(objpfx)tst-cleanup0.out: /dev/null $(objpfx)tst-cleanup0 > $(evaluate-test) > > $(objpfx)tst-cleanup0-cmp.out: tst-cleanup0.expect $(objpfx)tst-cleanup0.out > - cmp $^ > $@; \ > + cmp tst-cleanup0.expect $(objpfx)tst-cleanup0.out > $@; \ > $(evaluate-test) Why is this needed if | prerequisites are not listed in $^? > +ifeq ($(run-built-tests),yes) > +# The tests in this subdir should not be run in parallel. > +# > +# The following will create rules like "foo2.out :| foo1.out" for all > +# tests, which forces the tests to be run serially, but does not force > +# a test to be run just because some other test was run. > +# > +# Caveat: the :|-style dependencies won't be listed in $^, so avoid > +# using $^ to depend on test result files. > + > +ALLTESTS = $(tests:%=$(objpfx)%.out) $(tests-internal:%=$(objpfx)%.out) \ > + $(tests-container:%=$(objpfx)%.out) \ > + $(tests-special) $(tests-printers-out) \ > + $(xtests:%=$(objpfx)%.out) $(xtests-special) > + > +ALLBUTFIRSTTEST = $(filter-out $(firstword $(ALLTESTS)), $(ALLTESTS)) > +ALLBUTLASTTEST = $(filter-out $(lastword $(ALLTESTS)), $(ALLTESTS)) > +TESTPAIRS = $(join $(ALLBUTFIRSTTEST),$(addprefix :|,$(ALLBUTLASTTEST))) > +$(foreach pair,$(TESTPAIRS),$(eval $(pair))) > endif I would have used different variable names (lowercase, perhaps with an nptl- prefix), but that's just a detail. The $(filter-out …) construct is awkward, but I don't see a better way.
I did some benchmarks with this patch. * Cold case. Newly build source tree, “make subdirs=login check” (to populate the testroot), followed by “make subdirs=nptl check”, timings for the latter. All tests were for x86-64, timings in seconds (wall clock). Appropriate -j options supplied. Before: 8-thread system: 399 399 397 400 398 400 400 256-thread system: 392 389 389 391 394 393 392 After: 8-thread system: 372 375 373 376 373 374 375 256-thread system: 357 357 356 361 358 358 357 * Warm case (incremental make). Timings of “make subdirs=nptl check” after the first test suite run. Appropriate -j options supplied. After: 8-thread system: 1.541 1.014 1.025 1.016 1.030 1.029 256-thread system: 1.789 1.720 1.142 1.119 1.133 1.143 (I didn't bother to time the before case, considering these low numbers.) So it seems to me that performance-wise, this is a small overall win. Thanks, Florian
* DJ Delorie:
> +$(foreach pair,$(TESTPAIRS),$(eval $(pair)))
One more comment: In the past, there was a sustained objection against
using $(eval …) in makefiles. If I recall correctly, the person who
voiced the objection no longer participates in the project.
Thanks,
Florian
On 8/14/19 8:46 AM, Florian Weimer wrote: > * DJ Delorie: > >> +$(foreach pair,$(TESTPAIRS),$(eval $(pair))) > > One more comment: In the past, there was a sustained objection against > using $(eval …) in makefiles. If I recall correctly, the person who > voiced the objection no longer participates in the project. Do we remember why?
* Carlos O'Donell: > On 8/14/19 8:46 AM, Florian Weimer wrote: >> * DJ Delorie: >> >>> +$(foreach pair,$(TESTPAIRS),$(eval $(pair))) >> >> One more comment: In the past, there was a sustained objection against >> using $(eval …) in makefiles. If I recall correctly, the person who >> voiced the objection no longer participates in the project. > > Do we remember why? Which part? No deep technical reason was given at the time for the $(eval …) objection, and I did not make an attempt do understand this position. For target-generating loops, the workaround has always been to use o-iterator.mk, which I personally do not consider an improvement. Thanks, Florian
On Wed, 14 Aug 2019, Florian Weimer wrote: > * Carlos O'Donell: > > > On 8/14/19 8:46 AM, Florian Weimer wrote: > >> * DJ Delorie: > >> > >>> +$(foreach pair,$(TESTPAIRS),$(eval $(pair))) > >> > >> One more comment: In the past, there was a sustained objection against > >> using $(eval …) in makefiles. If I recall correctly, the person who > >> voiced the objection no longer participates in the project. > > > > Do we remember why? > > Which part? No deep technical reason was given at the time for the > $(eval …) objection, and I did not make an attempt do understand this > position. > > For target-generating loops, the workaround has always been to use > o-iterator.mk, which I personally do not consider an improvement. We could revisit Andreas's patch <https://sourceware.org/ml/libc-alpha/2012-07/msg00250.html>, updated for subsequently added uses of o-iterator.mk, and see if it still gives the reported speedup.
On Aug 14 2019, Joseph Myers <joseph@codesourcery.com> wrote: > We could revisit Andreas's patch > <https://sourceware.org/ml/libc-alpha/2012-07/msg00250.html>, updated for > subsequently added uses of o-iterator.mk, and see if it still gives the > reported speedup. It doesn't. Andreas.
diff --git a/ChangeLog b/ChangeLog index 71b254158d..12c3ef0dbb 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,8 @@ +2019-08-13 DJ Delorie <dj@delorie.com> + + * nptl/Makefile (.NOTPARALLEL): Remove. Replace with computed + sequence dependencies instead. + 2019-08-13 Florian Weimer <fweimer@redhat.com> * login/utmp_file.c (LOCK_FILE, LOCKING_FAILED, UNLOCK_FILE): diff --git a/nptl/Makefile b/nptl/Makefile index 0567e77a79..d7a3857c95 100644 --- a/nptl/Makefile +++ b/nptl/Makefile @@ -666,7 +666,7 @@ $(objpfx)tst-cleanup0.out: /dev/null $(objpfx)tst-cleanup0 $(evaluate-test) $(objpfx)tst-cleanup0-cmp.out: tst-cleanup0.expect $(objpfx)tst-cleanup0.out - cmp $^ > $@; \ + cmp tst-cleanup0.expect $(objpfx)tst-cleanup0.out > $@; \ $(evaluate-test) $(objpfx)crti.o: $(objpfx)pt-crti.o @@ -723,7 +723,23 @@ tst-audit-threads-ENV = LD_AUDIT=$(objpfx)tst-audit-threads-mod1.so CFLAGS-tst-unwind-thread.c += -funwind-tables -# The tests here better do not run in parallel -ifneq ($(filter %tests,$(MAKECMDGOALS)),) -.NOTPARALLEL: +ifeq ($(run-built-tests),yes) +# The tests in this subdir should not be run in parallel. +# +# The following will create rules like "foo2.out :| foo1.out" for all +# tests, which forces the tests to be run serially, but does not force +# a test to be run just because some other test was run. +# +# Caveat: the :|-style dependencies won't be listed in $^, so avoid +# using $^ to depend on test result files. + +ALLTESTS = $(tests:%=$(objpfx)%.out) $(tests-internal:%=$(objpfx)%.out) \ + $(tests-container:%=$(objpfx)%.out) \ + $(tests-special) $(tests-printers-out) \ + $(xtests:%=$(objpfx)%.out) $(xtests-special) + +ALLBUTFIRSTTEST = $(filter-out $(firstword $(ALLTESTS)), $(ALLTESTS)) +ALLBUTLASTTEST = $(filter-out $(lastword $(ALLTESTS)), $(ALLTESTS)) +TESTPAIRS = $(join $(ALLBUTFIRSTTEST),$(addprefix :|,$(ALLBUTLASTTEST))) +$(foreach pair,$(TESTPAIRS),$(eval $(pair))) endif
On my i7-4790K (4 real, 8 HT, 4.0 GHz) machine, this reduces "make -j8 check" time from 15m36 to 15m10, saving 26 seconds. From 722d522f3145736fced14a2f7b57adfb8ecc2f30 Mon Sep 17 00:00:00 2001 From: DJ Delorie <dj@redhat.com> Date: Tue, 13 Aug 2019 17:34:55 -0400 Subject: Optimize nptl test sequencing. Remove .NOTPARALLEL, which forces tests to be *built* in series. Replace with order-only dependencies between targets which only affects the tests themselves.