
[gomp] Recycle non-nested team if possible

Message ID 1436786144-8625-1-git-send-email-sebastian.huber@embedded-brains.de
State New

Commit Message

Sebastian Huber July 13, 2015, 11:15 a.m. UTC
Try to recycle the last non-nested team to avoid the use of malloc() and
free() in the common case where the number of threads is the same as in
the previous parallel region.  Avoid superfluous destruction and
initialization of team synchronization objects.
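
The case this targets can be pictured with a small, hypothetical OpenMP
program (not from libgomp or the benchmark linked below): a loop that opens
one non-nested parallel region after another with the same number of threads.

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  double sum = 0.0;
  for (int iter = 0; iter < 1000; iter++)
    {
      /* Non-nested region with an unchanged thread count: with the patch
         the team from the previous iteration can be reused.  A region
         using a different num_threads clause would not match and would
         still be allocated from scratch.  */
      #pragma omp parallel reduction (+: sum)
      sum += omp_get_thread_num ();
    }
  printf ("%g\n", sum);
  return 0;
}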

Using the microbenchmark posted here

https://gcc.gnu.org/ml/gcc-patches/2008-03/msg00930.html

shows an improvement in the parallel bench test case (target
x86_64-unknown-linux-gnu, median out of 9 test runs, iteration count
increased to 200000).

Before the patch:

parallel bench 11.2284 seconds

After the patch:

parallel bench 10.5912 seconds
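
For context, the parallel bench case is essentially a tight loop of empty
parallel regions timed with a wall clock; the following is only a sketch of
that shape (assuming standard OpenMP and omp_get_wtime), not the code from
the linked post:

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  double start = omp_get_wtime ();
  for (int i = 0; i < 200000; i++)
    {
      /* Empty region: the measured cost is essentially the per-region
         team setup and teardown in libgomp.  */
      #pragma omp parallel
      { }
    }
  printf ("parallel bench %.4f seconds\n", omp_get_wtime () - start);
  return 0;
}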

libgomp/ChangeLog
2015-07-13  Sebastian Huber  <sebastian.huber@embedded-brains.de>

	* team.c (get_recycable_team): New.
	(gomp_new_team): Recycle last non-nested team if possible.
	(free_team): Destroy more team synchronization objects.
	(gomp_team_end): Move some team synchronization object
	destructions to free_team().
---
 libgomp/team.c | 54 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 15 deletions(-)

Comments

Jakub Jelinek July 13, 2015, 2:17 p.m. UTC | #1
On Mon, Jul 13, 2015 at 01:15:44PM +0200, Sebastian Huber wrote:
> diff --git a/libgomp/team.c b/libgomp/team.c
> index b98b233..0bcbaf8 100644
> --- a/libgomp/team.c
> +++ b/libgomp/team.c
> @@ -134,6 +134,25 @@ gomp_thread_start (void *xdata)
>    return NULL;
>  }
>  
> +static struct gomp_team *
> +get_recycable_team (unsigned nthreads)

That would be recyclable.
But I think get_last_team would be better.
Also, please make it static inline.

> +      team = gomp_malloc (sizeof (*team) + nthreads * extra);
> +
> +#ifndef HAVE_SYNC_BUILTINS
> +      gomp_mutex_init (&team->work_share_list_free_lock);
> +#endif

Avoiding gomp_mutex_destroy/gomp_mutex_init is fine,
but I must say I'm far less sure about gomp_sem_init (can you
add a temporary assert there that it has the expected value)
and even less about gomp_barrier_init (I think e.g. on Linux
generation will very likely not be 0 as it should be, and I'm not
sure about the awaited_final value either).

	Jakub
Sebastian Huber July 13, 2015, 2:33 p.m. UTC | #2
On 13/07/15 16:17, Jakub Jelinek wrote:
> On Mon, Jul 13, 2015 at 01:15:44PM +0200, Sebastian Huber wrote:
>> diff --git a/libgomp/team.c b/libgomp/team.c
>> index b98b233..0bcbaf8 100644
>> --- a/libgomp/team.c
>> +++ b/libgomp/team.c
>> @@ -134,6 +134,25 @@ gomp_thread_start (void *xdata)
>>     return NULL;
>>   }
>>   
>> +static struct gomp_team *
>> +get_recycable_team (unsigned nthreads)
> That would be recyclable.
> But I think get_last_team would be better.

Ok.

> Also, please make it static inline.

Out of curiosity, does it make a difference for a static function within a
module whether it is declared inline or not?

>
>> +      team = gomp_malloc (sizeof (*team) + nthreads * extra);
>> +
>> +#ifndef HAVE_SYNC_BUILTINS
>> +      gomp_mutex_init (&team->work_share_list_free_lock);
>> +#endif
> Avoiding gomp_mutex_destroy/gomp_mutex_init is fine,
> but I must say I'm far less sure about gomp_sem_init (can you
> add a temporary assert there that it has the expected value)
> and even less about gomp_barrier_init (I think e.g. on Linux
> generation will very likely not be 0 as it should be, and I'm not
> sure about the awaited_final value either).
>
> 	Jakub

I didn't observe any testsuite failures on x86_64-unknown-linux-gnu with 
this patch. I will add asserts and re-run the testsuite tomorrow.
Sebastian Huber July 14, 2015, 7:04 a.m. UTC | #3
On 13/07/15 16:33, Sebastian Huber wrote:
>
>
> On 13/07/15 16:17, Jakub Jelinek wrote:
>> On Mon, Jul 13, 2015 at 01:15:44PM +0200, Sebastian Huber wrote:
>>> diff --git a/libgomp/team.c b/libgomp/team.c
>>> index b98b233..0bcbaf8 100644
>>> --- a/libgomp/team.c
>>> +++ b/libgomp/team.c
>>> @@ -134,6 +134,25 @@ gomp_thread_start (void *xdata)
>>>     return NULL;
>>>   }
>>>   +static struct gomp_team *
>>> +get_recycable_team (unsigned nthreads)
>> That would be recyclable.
>> But I think get_last_team would be better.
>
> Ok.
>
>> Also, please make it static inline.
>
> Out of curiosity, does it make a difference for a static function within
> a module whether it is declared inline or not?
>
>>
>>> +      team = gomp_malloc (sizeof (*team) + nthreads * extra);
>>> +
>>> +#ifndef HAVE_SYNC_BUILTINS
>>> +      gomp_mutex_init (&team->work_share_list_free_lock);
>>> +#endif
>> Avoiding gomp_mutex_destroy/gomp_mutex_init is fine,
>> but I must say I'm far less sure about gomp_sem_init (can you
>> add a temporary assert there that it has the expected value)
>> and even less about gomp_barrier_init (I think e.g. on Linux
>> generation will very likely not be 0 as it should be, and I'm not
>> sure about the awaited_final value either).
>>
>>     Jakub
>
> I didn't observe any testsuite failures on x86_64-unknown-linux-gnu 
> with this patch. I will add asserts and re-run the testsuite tomorrow.
>

The team->master_release semaphore is not always zero after use. I got a 
test failure here:

pr29947-2.exe: libgomp/team.c:152: get_last_team: Assertion 
`last_team->master_release == 0' failed.

The state of the barrier seems to be all right. I added these assertions:

#include <assert.h>

static inline struct gomp_team *
get_last_team (unsigned nthreads)
{
  struct gomp_thread *thr = gomp_thread ();
  if (thr->ts.team == NULL)
    {
      struct gomp_thread_pool *pool = thr->thread_pool;
      if (pool != NULL)
        {
          struct gomp_team *last_team = pool->last_team;
          if (last_team != NULL && last_team->nthreads == nthreads)
            {
              pool->last_team = NULL;
              assert (last_team->barrier.total == nthreads);
              assert (last_team->barrier.awaited == nthreads);
              assert (last_team->barrier.awaited_final == nthreads);
              assert (last_team->barrier.generation % BAR_INCR != 0);
              return last_team;
            }
        }
    }
  return NULL;
}

None of them fails if I run the test suite.

Running libgomp/testsuite/libgomp.c/c.exp ...
Running libgomp/testsuite/libgomp.c++/c++.exp ...
Running libgomp/testsuite/libgomp.fortran/fortran.exp ...
Running libgomp/testsuite/libgomp.graphite/graphite.exp ...
Running libgomp/testsuite/libgomp.oacc-c/c.exp ...
Running libgomp/testsuite/libgomp.oacc-c++/c++.exp ...
Running libgomp/testsuite/libgomp.oacc-fortran/fortran.exp ...

                 === libgomp Summary ===

# of expected passes            5987
# of expected failures          8
# of unsupported tests          278

Should I still leave the barrier init as is?
Sebastian Huber July 14, 2015, 7:09 a.m. UTC | #4
On 14/07/15 09:04, Sebastian Huber wrote:
> #include <assert.h>
>
> static inline struct gomp_team *
> get_last_team (unsigned nthreads)
> {
>   struct gomp_thread *thr = gomp_thread ();
>   if (thr->ts.team == NULL)
>     {
>       struct gomp_thread_pool *pool = thr->thread_pool;
>       if (pool != NULL)
>     {
>       struct gomp_team *last_team = pool->last_team;
>       if (last_team != NULL && last_team->nthreads == nthreads)
>         {
>           pool->last_team = NULL;
>           assert (last_team->barrier.total == nthreads);
>           assert (last_team->barrier.awaited == nthreads);
>           assert (last_team->barrier.awaited_final == nthreads);
>           assert (last_team->barrier.generation % BAR_INCR != 0);

Sorry, this should be:

assert (last_team->barrier.generation % BAR_INCR == 0);

I pasted the wrong version, since I wanted to check that I get a failure 
in case generation % BAR_INCR != 0.

>           return last_team;
>         }
>     }
>     }
>   return NULL;
> }
Jakub Jelinek July 14, 2015, 7:19 a.m. UTC | #5
On Tue, Jul 14, 2015 at 09:09:01AM +0200, Sebastian Huber wrote:
> I pasted the wrong version, since I wanted to check that I get a failure in
> case generation % BAR_INCR != 0.

There is also the config/posix/bar* implementation, which would need to be
checked too.  I'd suggest first committing a patch with both sem and mutex
being destroyed (note, config/posix/bar* has two semaphores in it), and
perhaps, if you feel strongly about it, an incremental patch to avoid the
destruction/initialization of the barrier, if that works even with
config/posix/bar*.

	Jakub
Sebastian Huber July 14, 2015, 9:28 a.m. UTC | #6
On 14/07/15 09:19, Jakub Jelinek wrote:
> On Tue, Jul 14, 2015 at 09:09:01AM +0200, Sebastian Huber wrote:
>> I pasted the wrong version, since I wanted to check that I get a failure in
>> case generation % BAR_INCR != 0.
> There is also the config/posix/bar* implementation, which would need to be
> checked too.  I'd suggest first committing a patch with both sem and mutex
> being destroyed (note, config/posix/bar* has two semaphores in it), and
> perhaps, if you feel strongly about it, an incremental patch to avoid the
> destruction/initialization of the barrier, if that works even with
> config/posix/bar*.
>
> 	Jakub

If I destroy the barrier of the last team, and then initialize it later 
in gomp_new_team()

+static inline struct gomp_team *
+get_last_team (unsigned nthreads)
+{
+  struct gomp_thread *thr = gomp_thread ();
+  if (thr->ts.team == NULL)
+    {
+      struct gomp_thread_pool *pool = thr->thread_pool;
+      if (pool != NULL)
+       {
+         struct gomp_team *last_team = pool->last_team;
+         if (last_team != NULL && last_team->nthreads == nthreads)
+           {
+             pool->last_team = NULL;
+             gomp_barrier_destroy (&last_team->barrier);
+             return last_team;
+           }
+       }
+    }
+  return NULL;
+}

then I get test suite failures. Is it safe to destroy the team barrier 
here or do we circumvent the last team mechanism which is supposed to 
delay the destruction?

Example failure:

WARNING: program timed out.
FAIL: libgomp.c/appendix-a/a.15.1.c execution test
Jakub Jelinek July 14, 2015, 10:44 a.m. UTC | #7
On Tue, Jul 14, 2015 at 11:28:18AM +0200, Sebastian Huber wrote:
> If I destroy the barrier of the last team, and then initialize it later in
> gomp_new_team()
> 
> +static inline struct gomp_team *
> +get_last_team (unsigned nthreads)
> +{
> +  struct gomp_thread *thr = gomp_thread ();
> +  if (thr->ts.team == NULL)
> +    {
> +      struct gomp_thread_pool *pool = thr->thread_pool;
> +      if (pool != NULL)
> +       {
> +         struct gomp_team *last_team = pool->last_team;
> +         if (last_team != NULL && last_team->nthreads == nthreads)
> +           {
> +             pool->last_team = NULL;
> +             gomp_barrier_destroy (&last_team->barrier);
> +             return last_team;
> +           }
> +       }
> +    }
> +  return NULL;
> +}
> 
> then I get test suite failures. Is it safe to destroy the team barrier here
> or do we circumvent the last team mechanism which is supposed to delay the
> destruction?

Then you have presumably hit exactly the reason why last_team exists.
Thus, if not reinitializing the team barrier works even with a
--disable-linux-futex configured libgomp, I guess it is OK not to
destroy and reinitialize it.

	Jakub

Patch

diff --git a/libgomp/team.c b/libgomp/team.c
index b98b233..0bcbaf8 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -134,6 +134,25 @@  gomp_thread_start (void *xdata)
   return NULL;
 }
 
+static struct gomp_team *
+get_recycable_team (unsigned nthreads)
+{
+  struct gomp_thread *thr = gomp_thread ();
+  if (thr->ts.team == NULL)
+    {
+      struct gomp_thread_pool *pool = thr->thread_pool;
+      if (pool != NULL)
+	{
+	  struct gomp_team *last_team = pool->last_team;
+	  if (last_team != NULL && last_team->nthreads == nthreads)
+	    {
+	      pool->last_team = NULL;
+	      return last_team;
+	    }
+	}
+    }
+  return NULL;
+}
 
 /* Create a new team data structure.  */
 
@@ -141,18 +160,28 @@  struct gomp_team *
 gomp_new_team (unsigned nthreads)
 {
   struct gomp_team *team;
-  size_t size;
   int i;
 
-  size = sizeof (*team) + nthreads * (sizeof (team->ordered_release[0])
-				      + sizeof (team->implicit_task[0]));
-  team = gomp_malloc (size);
+  team = get_recycable_team (nthreads);
+  if (team == NULL)
+    {
+      size_t extra = sizeof (team->ordered_release[0])
+                     + sizeof (team->implicit_task[0]);
+      team = gomp_malloc (sizeof (*team) + nthreads * extra);
+
+#ifndef HAVE_SYNC_BUILTINS
+      gomp_mutex_init (&team->work_share_list_free_lock);
+#endif
+      gomp_barrier_init (&team->barrier, nthreads);
+      gomp_sem_init (&team->master_release, 0);
+      gomp_mutex_init (&team->task_lock);
+
+      team->nthreads = nthreads;
+    }
 
   team->work_share_chunk = 8;
 #ifdef HAVE_SYNC_BUILTINS
   team->single_count = 0;
-#else
-  gomp_mutex_init (&team->work_share_list_free_lock);
 #endif
   team->work_shares_to_free = &team->work_shares[0];
   gomp_init_work_share (&team->work_shares[0], false, nthreads);
@@ -163,14 +192,9 @@  gomp_new_team (unsigned nthreads)
     team->work_shares[i].next_free = &team->work_shares[i + 1];
   team->work_shares[i].next_free = NULL;
 
-  team->nthreads = nthreads;
-  gomp_barrier_init (&team->barrier, nthreads);
-
-  gomp_sem_init (&team->master_release, 0);
   team->ordered_release = (void *) &team->implicit_task[nthreads];
   team->ordered_release[0] = &team->master_release;
 
-  gomp_mutex_init (&team->task_lock);
   team->task_queue = NULL;
   team->task_count = 0;
   team->task_queued_count = 0;
@@ -187,6 +211,10 @@  gomp_new_team (unsigned nthreads)
 static void
 free_team (struct gomp_team *team)
 {
+  gomp_sem_destroy (&team->master_release);
+#ifndef HAVE_SYNC_BUILTINS
+  gomp_mutex_destroy (&team->work_share_list_free_lock);
+#endif
   gomp_barrier_destroy (&team->barrier);
   gomp_mutex_destroy (&team->task_lock);
   free (team);
@@ -894,10 +922,6 @@  gomp_team_end (void)
 	}
       while (ws != NULL);
     }
-  gomp_sem_destroy (&team->master_release);
-#ifndef HAVE_SYNC_BUILTINS
-  gomp_mutex_destroy (&team->work_share_list_free_lock);
-#endif
 
   if (__builtin_expect (thr->ts.team != NULL, 0)
       || __builtin_expect (team->nthreads == 1, 0))