Message ID | 20120420060101.GA16563@zhy |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On 4/19/2012 11:01 PM, Yong Zhang wrote: > On Fri, Apr 20, 2012 at 01:26:33PM +0800, Yong Zhang wrote: >> On Thu, Apr 19, 2012 at 11:36:32AM -0700, Stephen Boyd wrote: >>> Does looking at the second patch help? Basically schedule_work() can run >>> the callback right between the time the mutex is acquired and >>> flush_work() is called: >>> >>> CPU0 CPU1 >>> >>> <irq> >>> schedule_work() mutex_lock(&mutex) >>> <irq return> >>> my_work() flush_work() >>> mutex_lock(&mutex) >>> <deadlock> >> Get you point. It is a problem. But your patch could introduece false >> positive since when flush_work() is called that very work may finish >> running already. >> >> So I think we need the lock_map_acquire()/lock_map_release() only when >> the work is under processing, no? > But start_flush_work() has tried take care of this issue except it > doesn't add work->lockdep_map into the chain. > > So does below patch help? > [snip] > @@ -2461,6 +2461,8 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr, > lock_map_acquire(&cwq->wq->lockdep_map); > else > lock_map_acquire_read(&cwq->wq->lockdep_map); > + lock_map_acquire(&work->lockdep_map); > + lock_map_release(&work->lockdep_map); > lock_map_release(&cwq->wq->lockdep_map); > > return true; No this doesn't help. The whole point of the patch is to get lockdep to complain in the case where the work is not queued. That case is not a false positive. We will get a lockdep warning if the work is running (when start_flush_work() returns true) solely with the lock_map_acquire() on cwq->wq->lockdep_map.
On Thu, Apr 19, 2012 at 11:26:47PM -0700, Stephen Boyd wrote: > complain in the case where the work is not queued. That case is not a > false positive. We will get a lockdep warning if the work is running IIRC, flush_work() is just a nop when a work is not queued nor running. > (when start_flush_work() returns true) solely with the > lock_map_acquire() on cwq->wq->lockdep_map. Yeah, that is the point we use lockdep to detect deadlock for workqueue. But when looking at start_flush_work(), for some case !(cwq->wq->saved_max_active == 1 || cwq->wq->flags & WQ_RESCUER), lock_map_acquire_read() is called, but recursive read is not added to the chain list. So when lock_map_acquire_read(&cwq->wq->lockdep_map) is called, deadlock will not be detected. I hope you don't hit that special case. Thanks, Yong -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 4/20/2012 12:18 AM, Yong Zhang wrote: > On Thu, Apr 19, 2012 at 11:26:47PM -0700, Stephen Boyd wrote: >> complain in the case where the work is not queued. That case is not a >> false positive. We will get a lockdep warning if the work is running > IIRC, flush_work() is just a nop when a work is not queued nor running. Agreed, but it's better to always print a lockdep warning instead of only when the deadlock is going to occur. > >> (when start_flush_work() returns true) solely with the >> lock_map_acquire() on cwq->wq->lockdep_map. > Yeah, that is the point we use lockdep to detect deadlock for workqueue. > > But when looking at start_flush_work(), for some case > !(cwq->wq->saved_max_active == 1 || cwq->wq->flags & WQ_RESCUER), > lock_map_acquire_read() is called, but recursive read is not added to > the chain list. So when lock_map_acquire_read(&cwq->wq->lockdep_map) > is called, deadlock will not be detected. I hope you don't hit that > special case. Hmm. Originally I had what you suggested in my patch but I left it out because I wasn't sure if it would cause false positives. Do you see any possibility for false positives? I'll add it back in if not.
On Fri, Apr 20, 2012 at 01:18:19AM -0700, Stephen Boyd wrote: > On 4/20/2012 12:18 AM, Yong Zhang wrote: > > On Thu, Apr 19, 2012 at 11:26:47PM -0700, Stephen Boyd wrote: > >> complain in the case where the work is not queued. That case is not a > >> false positive. We will get a lockdep warning if the work is running > > IIRC, flush_work() is just a nop when a work is not queued nor running. > > Agreed, but it's better to always print a lockdep warning instead of > only when the deadlock is going to occur. It will IMHO. > > > > >> (when start_flush_work() returns true) solely with the > >> lock_map_acquire() on cwq->wq->lockdep_map. > > Yeah, that is the point we use lockdep to detect deadlock for workqueue. > > > > But when looking at start_flush_work(), for some case > > !(cwq->wq->saved_max_active == 1 || cwq->wq->flags & WQ_RESCUER), > > lock_map_acquire_read() is called, but recursive read is not added to > > the chain list. So when lock_map_acquire_read(&cwq->wq->lockdep_map) > > is called, deadlock will not be detected. I hope you don't hit that > > special case. > > Hmm. Originally I had what you suggested in my patch but I left it out > because I wasn't sure if it would cause false positives. > Do you see any > possibility for false positives? No, I don't. My test indeed show nothing (just build and boot). >I'll add it back in if not. It's great if you can try it :) Thanks, Yong -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 20, 2012 at 4:32 PM, Yong Zhang <yong.zhang0@gmail.com> wrote: > On Fri, Apr 20, 2012 at 01:18:19AM -0700, Stephen Boyd wrote: >> On 4/20/2012 12:18 AM, Yong Zhang wrote: >> > On Thu, Apr 19, 2012 at 11:26:47PM -0700, Stephen Boyd wrote: >> >> complain in the case where the work is not queued. That case is not a >> >> false positive. We will get a lockdep warning if the work is running >> > IIRC, flush_work() is just a nop when a work is not queued nor running. >> >> Agreed, but it's better to always print a lockdep warning instead of >> only when the deadlock is going to occur. > > It will IMHO. After reconsidering this issue, I recognize I'm wrong from the beginning. My suggestion will only warn on the real deadlock. It doesn't follow what lockdep want to achieve. Sorry for that. BTW, your latest patch should work :) Thanks, Yong > >> >> > >> >> (when start_flush_work() returns true) solely with the >> >> lock_map_acquire() on cwq->wq->lockdep_map. >> > Yeah, that is the point we use lockdep to detect deadlock for workqueue. >> > >> > But when looking at start_flush_work(), for some case >> > !(cwq->wq->saved_max_active == 1 || cwq->wq->flags & WQ_RESCUER), >> > lock_map_acquire_read() is called, but recursive read is not added to >> > the chain list. So when lock_map_acquire_read(&cwq->wq->lockdep_map) >> > is called, deadlock will not be detected. I hope you don't hit that >> > special case. >> >> Hmm. Originally I had what you suggested in my patch but I left it out >> because I wasn't sure if it would cause false positives. >> Do you see any >> possibility for false positives? > > No, I don't. My test indeed show nothing (just build and boot). > >>I'll add it back in if not. > > It's great if you can try it :) > > Thanks, > Yong
diff --git a/kernel/workqueue.c b/kernel/workqueue.c index bc867e8..c096b05 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2461,6 +2461,8 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr, lock_map_acquire(&cwq->wq->lockdep_map); else lock_map_acquire_read(&cwq->wq->lockdep_map); + lock_map_acquire(&work->lockdep_map); + lock_map_release(&work->lockdep_map); lock_map_release(&cwq->wq->lockdep_map); return true;