Message ID | 20170630152010.GA6935@redhat.com |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> On 06/30, Paul E. McKenney wrote:
> >
> > > > +		raw_spin_lock_irq(&task->pi_lock);
> > > > +		raw_spin_unlock_irq(&task->pi_lock);
> >
> > I agree that the spin_unlock_wait() implementations would avoid the
> > deadlock with an acquisition from an interrupt handler, while also
> > avoiding the need to momentarily disable interrupts. The ->pi_lock is
> > a per-task lock, so I am assuming (perhaps naively) that contention is
> > not a problem. So is the overhead of interrupt disabling likely to be
> > noticeable here?
>
> I do not think the overhead will be noticeable in this particular case.
>
> But I am not sure I understand why do we want to unlock_wait. Yes I agree,
> it has some problems, but still...
>
> The code above looks strange for me. If we are going to repeat this pattern
> then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
>
> If not, we should probably change this code more:

This looks -much- better than my patch! May I have your Signed-off-by?

Thanx, Paul

> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -96,20 +96,16 @@ void task_work_run(void)
>  		 * work->func() can do task_work_add(), do not set
>  		 * work_exited unless the list is empty.
>  		 */
> +		raw_spin_lock_irq(&task->pi_lock);
>  		do {
>  			work = READ_ONCE(task->task_works);
>  			head = !work && (task->flags & PF_EXITING) ?
>  				&work_exited : NULL;
>  		} while (cmpxchg(&task->task_works, work, head) != work);
> +		raw_spin_unlock_irq(&task->pi_lock);
>
>  		if (!work)
>  			break;
> -		/*
> -		 * Synchronize with task_work_cancel(). It can't remove
> -		 * the first entry == work, cmpxchg(task_works) should
> -		 * fail, but it can play with *work and other entries.
> -		 */
> -		raw_spin_unlock_wait(&task->pi_lock);
>
>  		do {
>  			next = work->next;
>
> performance-wise this is almost the same, and if we do not really care about
> overhead we can simplify the code: this way it is obvious that we can't race
> with task_work_cancel().
>
> Oleg.
>
On Fri, Jun 30, 2017 at 09:16:07AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > On 06/30, Paul E. McKenney wrote:
> > >
> > > > > +		raw_spin_lock_irq(&task->pi_lock);
> > > > > +		raw_spin_unlock_irq(&task->pi_lock);
> > >
> > > I agree that the spin_unlock_wait() implementations would avoid the
> > > deadlock with an acquisition from an interrupt handler, while also
> > > avoiding the need to momentarily disable interrupts. The ->pi_lock is
> > > a per-task lock, so I am assuming (perhaps naively) that contention is
> > > not a problem. So is the overhead of interrupt disabling likely to be
> > > noticeable here?
> >
> > I do not think the overhead will be noticeable in this particular case.
> >
> > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
> > it has some problems, but still...

Well, I tried documenting exactly what it did and did not do, which got
an ack from Peter.

https://marc.info/?l=linux-kernel&m=149575078313105

However, my later pull request spawned a bit of discussion:

https://marc.info/?l=linux-kernel&m=149730349001044

This discussion led me to propose strengthening spin_unlock_wait()
to act as a lock/unlock pair. This can be implemented on x86 as
an smp_mb() followed by a read-only spinloop, as shown on branch
spin_unlock_wait.2017.06.23a on my -rcu tree.

Linus was not amused, and said that if we were going to make
spin_unlock_wait() have the semantics of lock+unlock, we should just
open-code that, especially given that there are way more definitions
of spin_unlock_wait() than there are uses. He also suggested making
spin_unlock_wait() have only acquire semantics (x86 spin loop with
no memory-barrier instructions) and add explicit barriers where
required.

https://marc.info/?l=linux-kernel&m=149860012913036

I did a series for this which may be found on branch
spin_unlock_wait.2017.06.27a on my -rcu tree.

This approach was not loved by others (see later on the above thread), and
Linus's reply (which reiterated his opposition to lock+unlock semantics)
suggested the possibility of removing spin_unlock_wait() entirely.

https://marc.info/?l=linux-kernel&m=149869476911620

So I figured, in for a penny, in for a pound, and therefore did the series
that includes this patch. The most recent update (which does not yet
include your improved version) is on branch spin_unlock_wait.2017.06.30b
of my -rcu tree.

Hey, you asked! ;-)

Thanx, Paul

> > The code above looks strange for me. If we are going to repeat this pattern
> > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> >
> > If not, we should probably change this code more:
>
> This looks -much- better than my patch! May I have your Signed-off-by?
>
> Thanx, Paul
>
> > --- a/kernel/task_work.c
> > +++ b/kernel/task_work.c
> > @@ -96,20 +96,16 @@ void task_work_run(void)
> >  		 * work->func() can do task_work_add(), do not set
> >  		 * work_exited unless the list is empty.
> >  		 */
> > +		raw_spin_lock_irq(&task->pi_lock);
> >  		do {
> >  			work = READ_ONCE(task->task_works);
> >  			head = !work && (task->flags & PF_EXITING) ?
> >  				&work_exited : NULL;
> >  		} while (cmpxchg(&task->task_works, work, head) != work);
> > +		raw_spin_unlock_irq(&task->pi_lock);
> >
> >  		if (!work)
> >  			break;
> > -		/*
> > -		 * Synchronize with task_work_cancel(). It can't remove
> > -		 * the first entry == work, cmpxchg(task_works) should
> > -		 * fail, but it can play with *work and other entries.
> > -		 */
> > -		raw_spin_unlock_wait(&task->pi_lock);
> >
> >  		do {
> >  			next = work->next;
> >
> > performance-wise this is almost the same, and if we do not really care about
> > overhead we can simplify the code: this way it is obvious that we can't race
> > with task_work_cancel().
> >
> > Oleg.
> >
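[Editorial sketch, not part of the thread.] The three candidate semantics described in the message above can be lined up side by side in user space. What follows is a minimal illustration using C11 atomics and a hypothetical test-and-set lock; struct tas_lock and every function name are assumptions made up for this sketch, not kernel APIs, and the C11 memory model only approximates the kernel's, so the mapping to smp_mb() and to acquire semantics is loose.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock; NOT the kernel's spinlock implementation. */
struct tas_lock {
	atomic_bool locked;
};

static void tas_lock_acquire(struct tas_lock *l)
{
	/* Spin until this thread is the one that flips locked to true. */
	while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
		;
}

static void tas_lock_release(struct tas_lock *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* Acquire-only wait: a read-only spin with no additional barrier. */
static void unlock_wait_acquire(struct tas_lock *l)
{
	while (atomic_load_explicit(&l->locked, memory_order_acquire))
		;
}

/* Strengthened wait: a full fence followed by the same read-only spin. */
static void unlock_wait_fenced(struct tas_lock *l)
{
	atomic_thread_fence(memory_order_seq_cst);
	unlock_wait_acquire(l);
}

/* Open-coded alternative: really take the lock, then drop it at once. */
static void lock_then_unlock(struct tas_lock *l)
{
	tas_lock_acquire(l);
	tas_lock_release(l);
}

/* Single-threaded smoke check: on a free lock every variant returns at once. */
static int demo_wait_variants(void)
{
	struct tas_lock l;

	atomic_init(&l.locked, false);
	unlock_wait_acquire(&l);
	unlock_wait_fenced(&l);
	lock_then_unlock(&l);
	assert(!atomic_load(&l.locked));	/* nothing may leave the lock held */
	return 0;
}
```

Only lock_then_unlock() writes to the lock word; the two wait variants merely read it, which is where the difference in cost and in ordering guarantees discussed in the thread comes from.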
On 06/30, Paul E. McKenney wrote:
>
> On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> >
> > I do not think the overhead will be noticeable in this particular case.
> >
> > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".

> > it has some problems, but still...
> >
> > The code above looks strange for me. If we are going to repeat this pattern
> > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> >
> > If not, we should probably change this code more:
>
> This looks -much- better than my patch! May I have your Signed-off-by?

Only if you promise to replace all RCU flavors with a single simple implementation
based on rwlock ;)

Seriously, of course I won't argue, and it seems that nobody except me likes
this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
sometimes it makes sense.

Including this particular case. task_work_run() is going to flush/destroy the
->task_works list, so it needs to wait until all currently executing "readers"
(task_work_cancel()'s which have started before ->task_works was updated) have
completed.

Oleg.
On Fri, 30 Jun 2017, Oleg Nesterov wrote:

> On 06/30, Paul E. McKenney wrote:
> >
> > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > >
> > > I do not think the overhead will be noticeable in this particular case.
> > >
> > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
>                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
>
> > > it has some problems, but still...
> > >
> > > The code above looks strange for me. If we are going to repeat this pattern
> > > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > >
> > > If not, we should probably change this code more:
> >
> > This looks -much- better than my patch! May I have your Signed-off-by?
>
> Only if you promise to replace all RCU flavors with a single simple implementation
> based on rwlock ;)
>
> Seriously, of course I won't argue, and it seems that nobody except me likes
> this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> sometimes it makes sense.

If it looks like synchronize_rcu(), why not actually use
synchronize_rcu()?

Alan Stern

> Including this particular case. task_work_run() is going to flush/destroy the
> ->task_works list, so it needs to wait until all currently executing "readers"
> (task_work_cancel()'s which have started before ->task_works was updated) have
> completed.
On Fri, Jun 30, 2017 at 09:21:23PM +0200, Oleg Nesterov wrote:
> On 06/30, Paul E. McKenney wrote:
> >
> > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > >
> > > I do not think the overhead will be noticeable in this particular case.
> > >
> > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
>                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
>
> > > it has some problems, but still...
> > >
> > > The code above looks strange for me. If we are going to repeat this pattern
> > > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > >
> > > If not, we should probably change this code more:
> >
> > This looks -much- better than my patch! May I have your Signed-off-by?
>
> Only if you promise to replace all RCU flavors with a single simple implementation
> based on rwlock ;)

;-) ;-) ;-)

Here you go:

https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/poormanurcu-2015.pdf

> Seriously, of course I won't argue, and it seems that nobody except me likes
> this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> sometimes it makes sense.

Well, that analogy was what led me to propose that its semantics be
defined as spin_lock() immediately followed by spin_unlock(). But that
didn't go over well.

> Including this particular case. task_work_run() is going to flush/destroy the
> ->task_works list, so it needs to wait until all currently executing "readers"
> (task_work_cancel()'s which have started before ->task_works was updated) have
> completed.

Understood!

Thanx, Paul
On Fri, Jun 30, 2017 at 03:50:33PM -0400, Alan Stern wrote:
> On Fri, 30 Jun 2017, Oleg Nesterov wrote:
>
> > On 06/30, Paul E. McKenney wrote:
> > >
> > > On Fri, Jun 30, 2017 at 05:20:10PM +0200, Oleg Nesterov wrote:
> > > >
> > > > I do not think the overhead will be noticeable in this particular case.
> > > >
> > > > But I am not sure I understand why do we want to unlock_wait. Yes I agree,
> >                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > if it was not clear, I tried to say "why do we want to _remove_ unlock_wait".
> >
> > > > it has some problems, but still...
> > > >
> > > > The code above looks strange for me. If we are going to repeat this pattern
> > > > then perhaps we should add a helper for lock+unlock and name it unlock_wait2 ;)
> > > >
> > > > If not, we should probably change this code more:
> > >
> > > This looks -much- better than my patch! May I have your Signed-off-by?
> >
> > Only if you promise to replace all RCU flavors with a single simple implementation
> > based on rwlock ;)
> >
> > Seriously, of course I won't argue, and it seems that nobody except me likes
> > this primitive, but to me spin_unlock_wait() looks like synchronize_rcu() and
> > sometimes it makes sense.
>
> If it looks like synchronize_rcu(), why not actually use
> synchronize_rcu()?

My guess is that the latencies of synchronize_rcu() don't suit his needs.
When the lock is not held, spin_unlock_wait() is quite fast, even
compared to expedited grace periods.

Thanx, Paul

> Alan Stern
>
> > Including this particular case. task_work_run() is going to flush/destroy the
> > ->task_works list, so it needs to wait until all currently executing "readers"
> > (task_work_cancel()'s which have started before ->task_works was updated) have
> > completed.
>
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -96,20 +96,16 @@ void task_work_run(void)
 		 * work->func() can do task_work_add(), do not set
 		 * work_exited unless the list is empty.
 		 */
+		raw_spin_lock_irq(&task->pi_lock);
 		do {
 			work = READ_ONCE(task->task_works);
 			head = !work && (task->flags & PF_EXITING) ?
 				&work_exited : NULL;
 		} while (cmpxchg(&task->task_works, work, head) != work);
+		raw_spin_unlock_irq(&task->pi_lock);
 
 		if (!work)
 			break;
-		/*
-		 * Synchronize with task_work_cancel(). It can't remove
-		 * the first entry == work, cmpxchg(task_works) should
-		 * fail, but it can play with *work and other entries.
-		 */
-		raw_spin_unlock_wait(&task->pi_lock);
 
 		do {
 			next = work->next;
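[Editorial sketch, not part of the patch.] The idea behind the patch is that cancellers walk the list while holding ->pi_lock, so detaching the list under that same lock guarantees no canceller is still mid-walk once the lock is dropped. A rough user-space analogue follows, assuming pthreads and C11 atomics. The types struct work and struct task and all function names are hypothetical stand-ins rather than kernel code, and the detach is simplified to a plain atomic exchange instead of the cmpxchg() loop with work_exited.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for callback_head and task_struct. */
struct work {
	_Atomic(struct work *) next;
	int id;
};

struct task {
	pthread_mutex_t pi_lock;	/* plays the role of task->pi_lock */
	_Atomic(struct work *) works;	/* plays the role of task->task_works */
};

static void task_init(struct task *t)
{
	pthread_mutex_init(&t->pi_lock, NULL);
	atomic_init(&t->works, NULL);
}

/* Lock-free push onto the list head, in the style of task_work_add(). */
static void work_add(struct task *t, struct work *w)
{
	struct work *head = atomic_load(&t->works);

	do {
		atomic_store(&w->next, head);
	} while (!atomic_compare_exchange_weak(&t->works, &head, w));
}

/* Unlink the first entry with a matching id; the whole walk holds pi_lock. */
static struct work *work_cancel(struct task *t, int id)
{
	_Atomic(struct work *) *pprev = &t->works;
	struct work *w;

	pthread_mutex_lock(&t->pi_lock);
	while ((w = atomic_load(pprev)) != NULL) {
		struct work *expected = w;

		if (w->id != id)
			pprev = &w->next;
		else if (atomic_compare_exchange_strong(pprev, &expected,
							atomic_load(&w->next)))
			break;
	}
	pthread_mutex_unlock(&t->pi_lock);
	return w;
}

/*
 * Detach the whole list while holding pi_lock, then walk it privately.
 * Returns the number of entries that would have been run.
 */
static int work_run_all(struct task *t)
{
	struct work *w;
	int n = 0;

	pthread_mutex_lock(&t->pi_lock);
	w = atomic_exchange(&t->works, NULL);
	pthread_mutex_unlock(&t->pi_lock);

	for (; w != NULL; w = atomic_load(&w->next))
		n++;
	return n;
}

/* Single-threaded walk-through: add three entries, cancel one, run the rest. */
static int demo_cancel_then_run(void)
{
	struct task t;
	struct work a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };
	struct work *hit, *miss;

	task_init(&t);
	work_add(&t, &a);
	work_add(&t, &b);
	work_add(&t, &c);
	hit = work_cancel(&t, 2);	/* unlinks b */
	miss = work_cancel(&t, 9);	/* no such id */
	assert(hit == &b && miss == NULL);
	return work_run_all(&t);	/* a and c remain */
}
```

The earlier form of the patch would instead detach the list first and then take and immediately release the lock; either way the runner cannot proceed while a canceller that started before the detach is still inside its critical section.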