[RFC,02/26] task_work: Replace spin_unlock_wait() with lock/unlock pair

Message ID 20170630152010.GA6935@redhat.com
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Oleg Nesterov June 30, 2017, 3:20 p.m. UTC
On 06/30, Paul E. McKenney wrote:
>
> > > +		raw_spin_lock_irq(&task->pi_lock);
> > > +		raw_spin_unlock_irq(&task->pi_lock);
>
> I agree that the spin_unlock_wait() implementations would avoid the
> deadlock with an acquisition from an interrupt handler, while also
> avoiding the need to momentarily disable interrupts.  The ->pi_lock is
> a per-task lock, so I am assuming (perhaps naively) that contention is
> not a problem.  So is the overhead of interrupt disabling likely to be
> noticeable here?

I do not think the overhead will be noticeable in this particular case.

But I am not sure I understand why do we want to unlock_wait. Yes I agree,
it has some problems, but still...

The code above looks strange to me. If we are going to repeat this pattern
then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)

If not, we should probably change this code more:


performance-wise this is almost the same, and if we do not really care about
overhead we can simplify the code: this way it is obvious that we can't race
with task_work_cancel().

Oleg.

Comments

Paul E. McKenney June 30, 2017, 4:16 p.m. UTC | #1
On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> On 06/30, Paul E. McKenney wrote:
> >
> > > > +		raw_spin_lock_irq(&task->pi_lock);
> > > > +		raw_spin_unlock_irq(&task->pi_lock);
> >
> > I agree that the spin_unlock_wait() implementations would avoid the
> > deadlock with an acquisition from an interrupt handler, while also
> > avoiding the need to momentarily disable interrupts.  The ->pi_lock is
> > a per-task lock, so I am assuming (perhaps naively) that contention is
> > not a problem.  So is the overhead of interrupt disabling likely to be
> > noticeable here?
> 
> I do not think the overhead will be noticeable in this particular case.
> 
> But I am not sure I understand why do we want to unlock_wait. Yes I agree,
> it has some problems, but still...
> 
> The code above looks strange to me. If we are going to repeat this pattern
> then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> 
> If not, we should probably change this code more:

This looks -much- better than my patch!  May I have your Signed-off-by?

							Thanx, Paul

> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -96,20 +96,16 @@ void task_work_run(void)
>  		 * work->func() can do task_work_add(), do not set
>  		 * work_exited unless the list is empty.
>  		 */
> +		raw_spin_lock_irq(&task->pi_lock);
>  		do {
>  			work = READ_ONCE(task->task_works);
>  			head = !work && (task->flags & PF_EXITING) ?
>  				&work_exited : NULL;
>  		} while (cmpxchg(&task->task_works, work, head) != work);
> +		raw_spin_unlock_irq(&task->pi_lock);
> 
>  		if (!work)
>  			break;
> -		/*
> -		 * Synchronize with task_work_cancel(). It can't remove
> -		 * the first entry == work, cmpxchg(task_works) should
> -		 * fail, but it can play with *work and other entries.
> -		 */
> -		raw_spin_unlock_wait(&task->pi_lock);
> 
>  		do {
>  			next = work->next;
> 
> performance-wise this is almost the same, and if we do not really care about
> overhead we can simplify the code: this way it is obvious that we can't race
> with task_work_cancel().
> 
> Oleg.
>
Paul E. McKenney June 30, 2017, 5:21 p.m. UTC | #2
On Fri, Jun 30, 2017 at 09:16:07AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > On 06/30, Paul E. McKenney wrote:
> > >
> > > > > +		raw_spin_lock_irq(&task->pi_lock);
> > > > > +		raw_spin_unlock_irq(&task->pi_lock);
> > >
> > > I agree that the spin_unlock_wait() implementations would avoid the
> > > deadlock with an acquisition from an interrupt handler, while also
> > > avoiding the need to momentarily disable interrupts.  The ->pi_lock is
> > > a per-task lock, so I am assuming (perhaps naively) that contention is
> > > not a problem.  So is the overhead of interrupt disabling likely to be
> > > noticeable here?
> > 
> > I do not think the overhead will be noticeable in this particular case.
> > 
> > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
> > it has some problems, but still...

Well, I tried documenting exactly what it did and did not do, which got
an ack from Peter.

	https://marc.info/?l=linux-kernel&m=149575078313105

However, my later pull request spawned a bit of discussion:

	https://marc.info/?l=linux-kernel&m=149730349001044

This discussion led me to propose strengthening spin_unlock_wait()
to act as a lock/unlock pair.  This can be implemented on x86 as
an smp_mb() followed by a read-only spinloop, as shown on branch
spin_unlock_wait.2017.06.23a on my -rcu tree.

Linus was not amused, and said that if we were going to make
spin_unlock_wait() have the semantics of lock+unlock, we should just
open-code that, especially given that there are way more definitions
of spin_unlock_wait() than there are uses.  He also suggested making
spin_unlock_wait() have only acquire semantics (x86 spin loop with
no memory-barrier instructions) and add explicit barriers where
required.

	https://marc.info/?l=linux-kernel&m=149860012913036
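
The three variants discussed above can be compared in a rough userspace
sketch. This is only an illustration under stated assumptions: toy_lock,
toy_unlock_wait_full(), toy_unlock_wait_acquire(), toy_lock_unlock_pair()
and toy_demo() are hypothetical names, the lock is a trivial C11
test-and-set lock rather than the kernel's raw_spinlock_t, and
atomic_thread_fence(memory_order_seq_cst) merely stands in for smp_mb().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock; NOT the kernel's raw_spinlock_t. */
struct toy_lock { atomic_bool locked; };

enum toy_mode { TOY_WAIT_FULL, TOY_WAIT_ACQUIRE, TOY_LOCK_UNLOCK };

static void toy_lock_acquire(struct toy_lock *l)
{
	/* Test-and-set loop with acquire ordering on success. */
	while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
		;
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* Strengthened variant: full fence, then a read-only spin loop. */
static void toy_unlock_wait_full(struct toy_lock *l)
{
	atomic_thread_fence(memory_order_seq_cst); /* stand-in for smp_mb() */
	while (atomic_load_explicit(&l->locked, memory_order_acquire))
		;
}

/* Acquire-only variant: read-only spin loop, no extra fence. */
static void toy_unlock_wait_acquire(struct toy_lock *l)
{
	while (atomic_load_explicit(&l->locked, memory_order_acquire))
		;
}

/* Open-coded variant: the lock/unlock pair this series uses. */
static void toy_lock_unlock_pair(struct toy_lock *l)
{
	toy_lock_acquire(l);
	toy_lock_release(l);
}

struct toy_demo_state {
	struct toy_lock lock;
	atomic_bool holding;	/* set once the holder owns the lock */
	int data;		/* written only while holding the lock */
};

static void *toy_holder(void *arg)
{
	struct toy_demo_state *s = arg;

	toy_lock_acquire(&s->lock);
	atomic_store_explicit(&s->holding, true, memory_order_release);
	s->data = 42;
	toy_lock_release(&s->lock);
	return NULL;
}

/* Wait for the current lock holder, then read what it wrote. */
int toy_demo(enum toy_mode mode)
{
	struct toy_demo_state s = { .data = 0 };
	pthread_t t;
	int seen;

	atomic_init(&s.lock.locked, false);
	atomic_init(&s.holding, false);
	if (pthread_create(&t, NULL, toy_holder, &s) != 0)
		return -1;

	/* Only start waiting once the holder really owns the lock. */
	while (!atomic_load_explicit(&s.holding, memory_order_acquire))
		;

	if (mode == TOY_WAIT_FULL)
		toy_unlock_wait_full(&s.lock);
	else if (mode == TOY_WAIT_ACQUIRE)
		toy_unlock_wait_acquire(&s.lock);
	else
		toy_lock_unlock_pair(&s.lock);

	seen = s.data;		/* ordered after the holder's release */
	pthread_join(t, NULL);
	assert(s.data == 42);	/* the holder has certainly finished here */
	return seen;
}
```

All three variants observe the holder's write in this toy because each one
ends with an acquire operation that reads the holder's release store. They
differ in what they order for accesses made before the wait: only the
full-fence variant and the lock/unlock pair order those, which is the
distinction the thread is arguing about.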

I did a series for this which may be found on branch
spin_unlock_wait.2017.06.27a on my -rcu tree.

This approach was not loved by others (see later on the above thread), and
Linus's reply (which reiterated his opposition to lock+unlock semantics)
suggested the possibility of removing spin_unlock_wait() entirely.

	https://marc.info/?l=linux-kernel&m=149869476911620

So I figured, in for a penny, in for a pound, and therefore did the series
that includes this patch.  The most recent update (which does not yet
include your improved version) is on branch spin_unlock_wait.2017.06.30b
of my -rcu tree.

Hey, you asked!  ;-)

							Thanx, Paul

> > The code above looks strange to me. If we are going to repeat this pattern
> > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > 
> > If not, we should probably change this code more:
> 
> This looks -much- better than my patch!  May I have your Signed-off-by?
> 
> 							Thanx, Paul
> 
> > --- a/kernel/task_work.c
> > +++ b/kernel/task_work.c
> > @@ -96,20 +96,16 @@ void task_work_run(void)
> >  		 * work->func() can do task_work_add(), do not set
> >  		 * work_exited unless the list is empty.
> >  		 */
> > +		raw_spin_lock_irq(&task->pi_lock);
> >  		do {
> >  			work = READ_ONCE(task->task_works);
> >  			head = !work && (task->flags & PF_EXITING) ?
> >  				&work_exited : NULL;
> >  		} while (cmpxchg(&task->task_works, work, head) != work);
> > +		raw_spin_unlock_irq(&task->pi_lock);
> > 
> >  		if (!work)
> >  			break;
> > -		/*
> > -		 * Synchronize with task_work_cancel(). It can't remove
> > -		 * the first entry == work, cmpxchg(task_works) should
> > -		 * fail, but it can play with *work and other entries.
> > -		 */
> > -		raw_spin_unlock_wait(&task->pi_lock);
> > 
> >  		do {
> >  			next = work->next;
> > 
> > performance-wise this is almost the same, and if we do not really care about
> > overhead we can simplify the code: this way it is obvious that we can't race
> > with task_work_cancel().
> > 
> > Oleg.
> >
Oleg Nesterov June 30, 2017, 7:21 p.m. UTC | #3
On 06/30, Paul E. McKenney wrote:
>
> On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> >
> > I do not think the overhead will be noticeable in this particular case.
> >
> > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".

> > it has some problems, but still...
> >
> > The code above looks strange to me. If we are going to repeat this pattern
> > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> >
> > If not, we should probably change this code more:
>
> This looks -much- better than my patch!  May I have your Signed-off-by?

Only if you promise to replace all RCU flavors with a single simple implementation
based on rwlock ;)

Seriously, of course I won't argue, and it seems that nobody except me likes
this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
sometimes it makes sense.

Including this particular case. task_work_run() is going to flush/destroy the
->task_works list, so it needs to wait until all currently executing "readers"
(task_work_cancel()'s which have started before ->task_works was updated) have
completed.

Oleg.
Alan Stern June 30, 2017, 7:50 p.m. UTC | #4
On Fri, 30 Jun 2017, Oleg Nesterov wrote:

> On 06/30, Paul E. McKenney wrote:
> >
> > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > >
> > > I do not think the overhead will be noticeable in this particular case.
> > >
> > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
>                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
> 
> > > it has some problems, but still...
> > >
> > > The code above looks strange to me. If we are going to repeat this pattern
> > > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > >
> > > If not, we should probably change this code more:
> >
> > This looks -much- better than my patch!  May I have your Signed-off-by?
> 
> Only if you promise to replace all RCU flavors with a single simple implementation
> based on rwlock ;)
> 
> Seriously, of course I won't argue, and it seems that nobody except me likes
> this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> sometimes it makes sense.

If it looks like synchronize_rcu(), why not actually use 
synchronize_rcu()?

Alan Stern

> Including this particular case. task_work_run() is going to flush/destroy the
> ->task_works list, so it needs to wait until all currently executing "readers"
> (task_work_cancel()'s which have started before ->task_works was updated) have
> completed.
Paul E. McKenney June 30, 2017, 8:02 p.m. UTC | #5
On Fri, Jun 30, 2017 at 09:21:23PM +0200, Oleg Nesterov wrote:
> On 06/30, Paul E. McKenney wrote:
> >
> > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > >
> > > I do not think the overhead will be noticeable in this particular case.
> > >
> > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
>                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
> 
> > > it has some problems, but still...
> > >
> > > The code above looks strange to me. If we are going to repeat this pattern
> > > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > >
> > > If not, we should probably change this code more:
> >
> > This looks -much- better than my patch!  May I have your Signed-off-by?
> 
> Only if you promise to replace all RCU flavors with a single simple implementation
> based on rwlock ;)

;-) ;-) ;-)

Here you go:

https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/poormanurcu-2015.pdf

> Seriously, of course I won't argue, and it seems that nobody except me likes
> this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> sometimes it makes sense.

Well, that analogy was what led me to propose that its semantics be
defined as spin_lock() immediately followed by spin_unlock().  But that
didn't go over well.

> Including this particular case. task_work_run() is going to flush/destroy the
> ->task_works list, so it needs to wait until all currently executing "readers"
> (task_work_cancel()'s which have started before ->task_works was updated) have
> completed.

Understood!

							Thanx, Paul
Paul E. McKenney June 30, 2017, 8:04 p.m. UTC | #6
On Fri, Jun 30, 2017 at 03:50:33PM -0400, Alan Stern wrote:
> On Fri, 30 Jun 2017, Oleg Nesterov wrote:
> 
> > On 06/30, Paul E. McKenney wrote:
> > >
> > > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > > >
> > > > I do not think the overhead will be noticeable in this particular case.
> > > >
> > > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
> >                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
> > 
> > > > it has some problems, but still...
> > > >
> > > > The code above looks strange to me. If we are going to repeat this pattern
> > > > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > > >
> > > > If not, we should probably change this code more:
> > >
> > > This looks -much- better than my patch!  May I have your Signed-off-by?
> > 
> > Only if you promise to replace all RCU flavors with a single simple implementation
> > based on rwlock ;)
> > 
> > Seriously, of course I won't argue, and it seems that nobody except me likes
> > > this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> > sometimes it makes sense.
> 
> If it looks like synchronize_rcu(), why not actually use 
> synchronize_rcu()?

My guess is that the latencies of synchronize_rcu() don't suit his needs.
When the lock is not held, spin_unlock_wait() is quite fast, even
compared to expedited grace periods.

							Thanx, Paul

> Alan Stern
> 
> > Including this particular case. task_work_run() is going to flush/destroy the
> > ->task_works list, so it needs to wait until all currently executing "readers"
> > (task_work_cancel()'s which have started before ->task_works was updated) have
> > completed.
>

Patch

--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -96,20 +96,16 @@  void task_work_run(void)
 		 * work->func() can do task_work_add(), do not set
 		 * work_exited unless the list is empty.
 		 */
+		raw_spin_lock_irq(&task->pi_lock);
 		do {
 			work = READ_ONCE(task->task_works);
 			head = !work && (task->flags & PF_EXITING) ?
 				&work_exited : NULL;
 		} while (cmpxchg(&task->task_works, work, head) != work);
+		raw_spin_unlock_irq(&task->pi_lock);
 
 		if (!work)
 			break;
-		/*
-		 * Synchronize with task_work_cancel(). It can't remove
-		 * the first entry == work, cmpxchg(task_works) should
-		 * fail, but it can play with *work and other entries.
-		 */
-		raw_spin_unlock_wait(&task->pi_lock);
 
 		do {
 			next = work->next;
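
The idea behind the patch above (detach the list while holding the lock
that the canceller also takes) can be sketched in userspace as follows.
This is a simplified analogue, not the kernel code: toy_task, toy_work,
toy_work_add(), toy_work_cancel(), toy_work_run() and toy_run_demo() are
hypothetical names, a pthread mutex stands in for ->pi_lock, and the
work_exited / PF_EXITING handling is left out, so a plain atomic exchange
replaces the cmpxchg() loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_work;
typedef void (*toy_func_t)(struct toy_work *);

struct toy_work {
	struct toy_work *next;
	toy_func_t func;
};

struct toy_task {
	pthread_mutex_t pi_lock;		/* stand-in for task->pi_lock */
	_Atomic(struct toy_work *) works;	/* stand-in for task->task_works */
};

/* Lockless push onto the head of the list. */
void toy_work_add(struct toy_task *t, struct toy_work *w)
{
	struct toy_work *head = atomic_load(&t->works);

	do {
		w->next = head;
	} while (!atomic_compare_exchange_weak(&t->works, &head, w));
}

/* Remove the first entry whose callback is func; runs under pi_lock. */
struct toy_work *toy_work_cancel(struct toy_task *t, toy_func_t func)
{
	struct toy_work *w, *prev, *found = NULL;

	pthread_mutex_lock(&t->pi_lock);
retry:
	prev = NULL;
	for (w = atomic_load(&t->works); w; prev = w, w = w->next) {
		struct toy_work *expected = w;

		if (w->func != func)
			continue;
		if (prev) {
			prev->next = w->next;	/* interior entry: plain unlink */
			found = w;
			break;
		}
		if (atomic_compare_exchange_strong(&t->works, &expected, w->next)) {
			found = w;		/* head entry: unlinked atomically */
			break;
		}
		goto retry;			/* a concurrent add moved the head */
	}
	pthread_mutex_unlock(&t->pi_lock);
	return found;
}

/* Detach the whole list under pi_lock, then run it without the lock. */
int toy_work_run(struct toy_task *t)
{
	struct toy_work *w, *next;
	int ran = 0;

	pthread_mutex_lock(&t->pi_lock);
	w = atomic_exchange(&t->works, NULL);
	pthread_mutex_unlock(&t->pi_lock);

	/* No canceller can still be walking the detached entries here. */
	for (; w; w = next, ran++) {
		next = w->next;
		w->func(w);
	}
	return ran;
}

static int toy_calls;
static void toy_cb_a(struct toy_work *w) { (void)w; toy_calls += 1; }
static void toy_cb_b(struct toy_work *w) { (void)w; toy_calls += 10; }

/* Queue three items, cancel one, run the rest; returns toy_calls. */
int toy_run_demo(void)
{
	struct toy_task t;
	struct toy_work a = { .func = toy_cb_a };
	struct toy_work b = { .func = toy_cb_b };
	struct toy_work c = { .func = toy_cb_a };
	struct toy_work *cancelled;

	pthread_mutex_init(&t.pi_lock, NULL);
	atomic_init(&t.works, NULL);
	toy_calls = 0;

	toy_work_add(&t, &a);
	toy_work_add(&t, &b);
	toy_work_add(&t, &c);

	cancelled = toy_work_cancel(&t, toy_cb_b);
	assert(cancelled == &b);

	toy_work_run(&t);
	pthread_mutex_destroy(&t.pi_lock);
	return toy_calls;
}
```

Because the detach and the cancel both happen under the same lock, a
canceller either finishes before the list is detached or finds an empty
list afterwards, which is the "obvious that we can't race" property
described in the commit message.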