Message ID | yddlhh0hblq.fsf@lokon.CeBiTec.Uni-Bielefeld.DE |
---|---|
State | New |
On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
> As reported in the PR, with the addition of all those OpenACC tests,
> libgomp make check times have skyrocketed since the testsuite is still
> run sequentially.
>
> Even on a reasonably fast x86 machine (4 x 2.0 GHz Xeon E7450) the run
> takes 4286 seconds.
>
> On slower sparc boxes (1.2 GHz UltraSPARC-T2) we're at 29406 seconds,
> compared to 7825 seconds on the 4.9 branch.
>
> Thus, the libgomp tests massively slow down the whole testsuite run,
> being the last part to finish.
>
> Fixing this proved trivial: I managed to almost literally copy the
> solution from libstdc++-v3/testsuite/Makefile.am, with a minimal change
> to libgomp.exp so the generated libgomp-test-support.exp file is found
> in both the sequential and parallel cases. This isn't an issue in
> libstdc++ since all necessary variables are stored in a single
> site.exp.

It is far from trivial though.
The point is that most of the OpenMP tests are parallelized with the
default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
machine a lot, the higher number of hw threads the more.
If we go forward with some parallelization of the tests, we at least should
try to export something like OMP_WAIT_POLICY=passive so that the
oversubscribed machine would at least not spend too much time in spinning.

And perhaps reconsider running all OpenACC threads 3 times, just allow
user to select which offloading target they want to test (host fallback,
the host nonshm hack, PTX, XeonPHI in the future?), and test just that
(that is pretty much how OpenMP offloading testing works). For tests that
always want to test host fallback, I hope OpenACC offers clauses to force
the host fallback.

	Jakub
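A minimal sketch of Jakub's OMP_WAIT_POLICY suggestion, assuming native (non-remote) testing where the test executables inherit the environment of the shell that runs make. OMP_WAIT_POLICY is a standard OpenMP environment variable; the helper name run_with_passive_wait is hypothetical and not part of libgomp or DejaGnu:

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): run a command with the
# standard OpenMP variable OMP_WAIT_POLICY=passive in its environment, so
# idle OpenMP threads in the test executables block instead of spinning.
# Assumes native testing: only locally started processes inherit it.
run_with_passive_wait() {
  env OMP_WAIT_POLICY=passive "$@"
}

# Example from a configured GCC build tree (assumption: the top-level
# check-target-libgomp target):
#   run_with_passive_wait make -j8 check-target-libgomp
```

This only changes how idle threads wait; it does not reduce the number of threads, so the oversubscription itself remains.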
On May 7, 2015, at 4:39 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
>> As reported in the PR, with the addition of all those OpenACC tests,
>> libgomp make check times have skyrocketed since the testsuite is still
>> run sequentially.
>>
>> Even on a reasonably fast x86 machine (4 x 2.0 GHz Xeon E7450) the run
>> takes 4286 seconds.
>>
>> On slower sparc boxes (1.2 GHz UltraSPARC-T2) we're at 29406 seconds,
>> compared to 7825 seconds on the 4.9 branch.
>>
>> Thus, the libgomp tests massively slow down the whole testsuite run,
>> being the last part to finish.
>>
>> Fixing this proved trivial: I managed to almost literally copy the
>> solution from libstdc++-v3/testsuite/Makefile.am, with a minimal change
>> to libgomp.exp so the generated libgomp-test-support.exp file is found
>> in both the sequential and parallel cases. This isn't an issue in
>> libstdc++ since all necessary variables are stored in a single
>> site.exp.
>
> It is far from trivial though.
> The point is that most of the OpenMP tests are parallelized with the
> default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
> machine a lot

If OpenMP cannot keep the machine busy, then the test suite should. A 15x speed-up means that OpenMP cannot keep the machine busy. I’d not expect OpenMP to fill the gap here, so that leaves just the test suite.

So, unless someone wants to try their hand at getting some serious time from OpenMP, I think the patch lies on the path of goodness.
Hi!

On Thu, 7 May 2015 13:39:40 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
> > As reported in the PR, with the addition of all those OpenACC tests,
> > libgomp make check times have skyrocketed since the testsuite is still
> > run sequentially.

ACK. And, thanks for looking into that!

> > Fixing this proved trivial: I managed to almost literally copy the
> > solution from libstdc++-v3/testsuite/Makefile.am, with a minimal change
> > to libgomp.exp so the generated libgomp-test-support.exp file is found
> > in both the sequential and parallel cases. This isn't an issue in
> > libstdc++ since all necessary variables are stored in a single
> > site.exp.
>
> It is far from trivial though.
> The point is that most of the OpenMP tests are parallelized with the
> default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
> machine a lot, the higher number of hw threads the more.

Do you agree that we have two classes of test cases in libgomp: 1) test cases that don't place a considerably higher load on the machine compared to "normal" (single-threaded) execution tests, because they're just testing some functionality that is not expected to actively depend on/interfere with parallelism. If needed, and/or if not already done, such test cases can be parameterized (OMP_NUM_THREADS, OpenACC num_gangs, num_workers, vector_length clauses, and so on) for low parallelism levels. And, 2) test cases that place a considerably higher load on the machine compared to "normal" (single-threaded) execution tests, because they're testing some functionality that actively depends on/interferes with some kind of parallelism. What about marking such tests specially, such that DejaGnu will only ever schedule one of them for execution at the same time? For example, a new dg-* directive to run them wrapped through »flock [libgomp/testsuite/serial.lock] [a.out]« or some such?

> If we go forward with some parallelization of the tests, we at least should
> try to export something like OMP_WAIT_POLICY=passive so that the
> oversubscribed machine would at least not spend too much time in spinning.

(Will again have the problem that DejaGnu doesn't provide infrastructure to communicate environment variables to boards in remote testing.)

> And perhaps reconsider running all OpenACC threads 3 times, just allow
> user to select which offloading target they want to test (host fallback,
> the host nonshm hack, PTX, XeonPHI in the future?), and test just that
> (that is pretty much how OpenMP offloading testing works).

My rationale is: if you configure GCC to support a set of offloading devices (more than one), you'll also want to get the test coverage that indeed all these work as expected. (It currently doesn't matter, but...) that's something I'd like to see improved in the libgomp OpenMP offloading testing (once it supports more than one architecture for offloading).

> For tests that
> always want to test host fallback, I hope OpenACC offers clauses to force
> the host fallback.

Yes.


Grüße,
 Thomas
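Thomas's flock idea can be sketched as a small shell wrapper. This is a hypothetical illustration, not an existing dg-* directive: the function name run_serialized and the lock path are assumptions. It relies on the util-linux flock(1) command, which creates the lock file if needed, blocks until it holds an exclusive lock, runs the command, and passes through the command's exit status:

```shell
#!/bin/sh
# Hypothetical wrapper (not an existing DejaGnu directive): run a "heavy"
# parallel test while holding an exclusive lock on a shared lock file, so
# that at most one such test executes at a time, even when several runtest
# instances run concurrently.  Uses flock(1) from util-linux.
run_serialized() {
  lockfile=$1
  shift
  if command -v flock >/dev/null 2>&1; then
    # Waits for the exclusive lock, runs the command, returns its status.
    flock "$lockfile" "$@"
  else
    echo "warning: flock(1) not found; running without serialization" >&2
    "$@"
  fi
}

# Example, mirroring the suggestion in the thread (paths illustrative):
#   run_serialized libgomp/testsuite/serial.lock ./a.out
```

Light-weight tests would keep running concurrently; only the tests routed through the wrapper contend for the lock.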
On 05/08/2015 10:40 AM, Thomas Schwinge wrote:
> Hi!
>
> On Thu, 7 May 2015 13:39:40 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
>>> As reported in the PR, with the addition of all those OpenACC tests,
>>> libgomp make check times have skyrocketed since the testsuite is still
>>> run sequentially.
>
> ACK. And, thanks for looking into that!
>
>>> Fixing this proved trivial: I managed to almost literally copy the
>>> solution from libstdc++-v3/testsuite/Makefile.am, with a minimal change
>>> to libgomp.exp so the generated libgomp-test-support.exp file is found
>>> in both the sequential and parallel cases. This isn't an issue in
>>> libstdc++ since all necessary variables are stored in a single
>>> site.exp.
>>
>> It is far from trivial though.
>> The point is that most of the OpenMP tests are parallelized with the
>> default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
>> machine a lot, the higher number of hw threads the more.
>
> Do you agree that we have two classes of test cases in libgomp: 1) test
> cases that don't place a considerably higher load on the machine compared
> to "normal" (single-threaded) execution tests, because they're just
> testing some functionality that is not expected to actively depend
> on/interfere with parallelism. If needed, and/or if not already done,
> such test cases can be parameterized (OMP_NUM_THREADS, OpenACC num_gangs,
> num_workers, vector_length clauses, and so on) for low parallelism
> levels. And, 2) test cases that place a considerably higher load on the
> machine compared to "normal" (single-threaded) execution tests, because
> they're testing some functionality that actively depends on/interferes
> with some kind of parallelism. What about marking such tests specially,
> such that DejaGnu will only ever schedule one of them for execution at
> the same time? For example, a new dg-* directive to run them wrapped
> through »flock [libgomp/testsuite/serial.lock] [a.out]« or some such?

Looks like the thread got stuck. Anyway, I've just noticed how slow the libgomp.exp tests are on a recent Intel machine with 160 HT cores. I'm attaching a graph of CPU utilization and a 'ps ax | grep expect' log file that shows which tests are running.

Roughly, after 10 minutes I see a drop in utilization, and from then on libgomp.exp runs mainly serially. So I believe splitting the tests in libgomp.exp into serial and parallel ones would make sense. Another idea is to override OMP_NUM_THREADS with a reasonable number, which would also enable parallel execution of the parallel tests?

Thanks,
Martin

>
>> If we go forward with some parallelization of the tests, we at least should
>> try to export something like OMP_WAIT_POLICY=passive so that the
>> oversubscribed machine would at least not spend too much time in spinning.
>
> (Will again have the problem that DejaGnu doesn't provide infrastructure
> to communicate environment variables to boards in remote testing.)
>
>> And perhaps reconsider running all OpenACC threads 3 times, just allow
>> user to select which offloading target they want to test (host fallback,
>> the host nonshm hack, PTX, XeonPHI in the future?), and test just that
>> (that is pretty much how OpenMP offloading testing works).
>
> My rationale is: if you configure GCC to support a set of offloading
> devices (more than one), you'll also want to get the test coverage that
> indeed all these work as expected. (It currently doesn't matter, but...)
> that's something I'd like to see improved in the libgomp OpenMP
> offloading testing (once it supports more than one architecture for
> offloading).
>
>> For tests that
>> always want to test host fallback, I hope OpenACC offers clauses to force
>> the host fallback.
>
> Yes.
>
>
> Grüße,
>  Thomas
>
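Martin's second idea, capping OMP_NUM_THREADS, amounts to dividing the hardware threads among the concurrently running test jobs. A sketch under that assumption; omp_threads_per_job is a hypothetical helper, and the job count passed to it would have to match the actual make job limit:

```shell
#!/bin/sh
# Hypothetical helper: choose a default OpenMP team size so that NJOBS
# concurrently running test executables together use roughly HW_THREADS
# hardware threads.  Integer division, clamped to at least 1.
omp_threads_per_job() {
  hw_threads=$1
  njobs=$2
  if [ "$njobs" -lt 1 ]; then njobs=1; fi
  n=$((hw_threads / njobs))
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}

# Example (assumption: 16 concurrent test jobs, e.g. make -j16):
#   OMP_NUM_THREADS=$(omp_threads_per_job "$(getconf _NPROCESSORS_ONLN)" 16)
#   export OMP_NUM_THREADS
```

OMP_NUM_THREADS only sets the default team size; tests that request a specific size through a num_threads clause or omp_set_num_threads are not affected, and, as Thomas notes earlier in the thread, DejaGnu offers no way to pass such variables to remote boards.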
# HG changeset patch
# Parent 56a827256364c7b567b751287defdb0c9eabc666
Support parallel testing in libgomp (PR libgomp/66005)

diff --git a/libgomp/testsuite/Makefile.am b/libgomp/testsuite/Makefile.am
--- a/libgomp/testsuite/Makefile.am
+++ b/libgomp/testsuite/Makefile.am
@@ -12,6 +12,71 @@ EXPECT = $(shell if test -f $(top_buildd
 	     echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
 RUNTEST = "$(_RUNTEST) $(AM_RUNTESTFLAGS)"
 
+PWD_COMMAND = $${PWDCMD-pwd}
+
+%/site.exp: site.exp
+	-@test -d $* || mkdir $*
+	@srcdir=`cd $(srcdir); ${PWD_COMMAND}`;
+	@objdir=`${PWD_COMMAND}`/$*; \
+	sed -e "s|^set srcdir .*$$|set srcdir $$srcdir|" \
+	    -e "s|^set objdir .*$$|set objdir $$objdir|" \
+	    site.exp > $*/site.exp.tmp
+	@-rm -f $*/site.bak
+	@test ! -f $*/site.exp || mv $*/site.exp $*/site.bak
+	@mv $*/site.exp.tmp $*/site.exp
+
+check_p_numbers0:=1 2 3 4 5 6 7 8 9
+check_p_numbers1:=0 $(check_p_numbers0)
+check_p_numbers2:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers1)))
+check_p_numbers3:=$(addprefix 0,$(check_p_numbers1)) $(check_p_numbers2)
+check_p_numbers4:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers3)))
+check_p_numbers5:=$(addprefix 0,$(check_p_numbers3)) $(check_p_numbers4)
+check_p_numbers6:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers5)))
+check_p_numbers:=$(check_p_numbers0) $(check_p_numbers2) $(check_p_numbers4) $(check_p_numbers6)
+check_p_subdirs=$(wordlist 1,$(if $(GCC_TEST_PARALLEL_SLOTS),$(GCC_TEST_PARALLEL_SLOTS),128),$(check_p_numbers))
+check_DEJAGNU_normal_targets = $(addprefix check-DEJAGNUnormal,$(check_p_subdirs))
+$(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp
+
+check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
+	$(if $*,@)AR="$(AR)"; export AR; \
+	RANLIB="$(RANLIB)"; export RANLIB; \
+	if [ -z "$*" ] && [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
+	  rm -rf normal-parallel || true; \
+	  mkdir normal-parallel; \
+	  $(MAKE) $(AM_MAKEFLAGS) $(check_DEJAGNU_normal_targets); \
+	  rm -rf normal-parallel || true; \
+	  for idx in $(check_p_subdirs); do \
+	    if [ -d normal$$idx ]; then \
+	      mv -f normal$$idx/libgomp.sum normal$$idx/libgomp.sum.sep; \
+	      mv -f normal$$idx/libgomp.log normal$$idx/libgomp.log.sep; \
+	    fi; \
+	  done; \
+	  $(SHELL) $(srcdir)/../../contrib/dg-extract-results.sh \
+	    normal[0-9]*/libgomp.sum.sep > libgomp.sum; \
+	  $(SHELL) $(srcdir)/../../contrib/dg-extract-results.sh -L \
+	    normal[0-9]*/libgomp.log.sep > libgomp.log; \
+	  exit 0; \
+	fi; \
+	srcdir=`$(am__cd) $(srcdir) && pwd`; export srcdir; \
+	EXPECT=$(EXPECT); export EXPECT; \
+	runtest=$(RUNTEST); \
+	if [ -z "$$runtest" ]; then runtest=runtest; fi; \
+	tool=libgomp; \
+	if [ -n "$*" ]; then \
+	  if [ -f normal-parallel/finished ]; then rm -rf "$*"; exit 0; fi; \
+	  GCC_RUNTEST_PARALLELIZE_DIR=`${PWD_COMMAND}`/normal-parallel; \
+	  export GCC_RUNTEST_PARALLELIZE_DIR; \
+	  cd "$*"; \
+	fi; \
+	if $(SHELL) -c "$$runtest --version" > /dev/null 2>&1; then \
+	  $$runtest $(AM_RUNTESTFLAGS) $(RUNTESTDEFAULTFLAGS) \
+	    $(RUNTESTFLAGS); \
+	  if [ -n "$*" ]; then \
+	    touch $$GCC_RUNTEST_PARALLELIZE_DIR/finished; \
+	  fi; \
+	else \
+	  echo "WARNING: could not find \`runtest'" 1>&2; :;\
+	fi
 
 # Instead of directly in ../testsuite/libgomp-test-support.exp.in, the
 # following variables have to be "routed through" this Makefile, for expansion
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -33,7 +33,8 @@ load_gcc_lib torture-options.exp
 load_gcc_lib fortran-modules.exp
 
 # Try to load a test support file, built during libgomp configuration.
-load_file libgomp-test-support.exp
+# Search in both .. and . to support parallel and sequential testing.
+load_file -1 ../libgomp-test-support.exp libgomp-test-support.exp
 
 # Populate offload_targets_s (offloading targets separated by a space), and
 # offload_targets_s_openacc (the same, but with OpenACC names; OpenACC spells
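The check_p_numbers* variables in the Makefile.am hunk build the word list 1 2 ... 9999 using only GNU make functions ($(foreach), $(addprefix)), and check_p_subdirs keeps the first GCC_TEST_PARALLEL_SLOTS words (128 if unset) via $(wordlist). For illustration only, a shell function (hypothetical name, not part of the patch) that prints the same per-slot target names:

```shell
#!/bin/sh
# Illustration only: print the per-slot targets that check_p_subdirs and
# check_DEJAGNU_normal_targets expand to in the Makefile.am hunk.
# An empty/missing argument falls back to 128, like the
# $(if $(GCC_TEST_PARALLEL_SLOTS),...,128) expression; the make word list
# only goes up to 9999, so larger values are capped the same way.
list_check_targets() {
  slots=${1:-128}
  if [ "$slots" -gt 9999 ]; then slots=9999; fi
  i=1
  while [ "$i" -le "$slots" ]; do
    echo "check-DEJAGNUnormal$i"
    i=$((i + 1))
  done
}
```

Each check-DEJAGNUnormalN target runs runtest inside its own normalN/ directory, one level below the testsuite directory, which is why the libgomp.exp hunk searches ../libgomp-test-support.exp before libgomp-test-support.exp. How many of these targets actually run at once is bounded by make's job limit, not by the slot count.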