Message ID | 20181018084313.oopu34iwfwgkcwwc@linutronix.de |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Series | virtio_net: add local_bh_disable() around u64_stats_update_begin |
On 2018/10/18 17:43, Sebastian Andrzej Siewior wrote:
> On 32bit, lockdep notices that virtnet_open() and refill_work() invoke
> try_fill_recv() from process context while virtnet_receive() invokes the
> same function from BH context. The problem is that the seqcounter within
> u64_stats_update_begin() may deadlock if it is interrupted by BH and
> then acquired again.
>
> Introduce u64_stats_update_begin_bh() which disables BH on 32bit
> architectures. Since the BH might interrupt the worker, this new
> function should not be limited to SMP like the others which are expected
> to be used in softirq.
>
> With this change we might lose increments but this is okay. The
> important part is that the two 32bit parts of the 64bit counter are not
> corrupted.
>
> Fixes: 461f03dc99cf6 ("virtio_net: Add kick stats").
> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

NACK. Again, this race should not happen because of the NAPI guard.
We need to investigate why this warning happened.

On 2018-10-18 18:06:57 [+0900], Toshiaki Makita wrote:
> NACK. Again, this race should not happen because of the NAPI guard.
> We need to investigate why this warning happened.

I tried to explain this. Please see
	20181018090812.rry5qgnqxxrjxaii@linutronix.de

Sebastian

On 2018/10/18 18:11, Sebastian Andrzej Siewior wrote:
> On 2018-10-18 18:06:57 [+0900], Toshiaki Makita wrote:
>> NACK. Again, this race should not happen because of the NAPI guard.
>> We need to investigate why this warning happened.
>
> I tried to explain this. Please see
> 	20181018090812.rry5qgnqxxrjxaii@linutronix.de

Anyway, if this really happens, then it means try_fill_recv() can be
called concurrently for the same rq, which looks like a far more severe
problem to me. If so, we need to fix it instead.

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Thu, 18 Oct 2018 10:43:13 +0200

> On 32bit, lockdep notices that virtnet_open() and refill_work() invoke
> try_fill_recv() from process context while virtnet_receive() invokes the
> same function from BH context. The problem is that the seqcounter within
> u64_stats_update_begin() may deadlock if it is interrupted by BH and
> then acquired again.
>
> Introduce u64_stats_update_begin_bh() which disables BH on 32bit
> architectures. Since the BH might interrupt the worker, this new
> function should not be limited to SMP like the others which are expected
> to be used in softirq.
>
> With this change we might lose increments but this is okay. The
> important part is that the two 32bit parts of the 64bit counter are not
> corrupted.
>
> Fixes: 461f03dc99cf6 ("virtio_net: Add kick stats").
> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Trying to get down to the bottom of this:

1) virtnet_receive() runs from softirq but only if NAPI is active and
   enabled. It is in this context that it invokes try_fill_recv().

2) refill_work() runs from process context, but disables NAPI (and
   thus invocation of virtnet_receive()) before calling
   try_fill_recv().

3) virtnet_open() is invoked from process context as well, but before
   the NAPI instances are enabled, so it is the same as case #2.

4) virtnet_restore_up() is the same situation as #3.

Therefore I agree that this is a false positive, and lockdep simply
cannot see the NAPI synchronization done by case #2.

I think we shouldn't add unnecessary BH disabling here, and instead
find some way to annotate this for lockdep's sake.

Thank you.

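The four cases above can be sketched as a small single-threaded userspace model. This is only an illustration of the argument, not driver code: struct model_rq, its fields and every model_* function are hypothetical names, and setting napi_enabled to false merely stands in for disabling NAPI (the real call also waits for a poll that is already running, which the model leaves out).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one receive queue; not the driver's real struct. */
struct model_rq {
	bool napi_enabled;      /* may the softirq poll run? */
	bool filling;           /* inside the modelled try_fill_recv() */
	unsigned int overlaps;  /* fills that started while another was running */
};

static void model_softirq_poll(struct model_rq *rq);

/*
 * Modelled try_fill_recv(). When softirq_hits is true, a softirq is
 * simulated as arriving in the middle of the function.
 */
static void model_try_fill_recv(struct model_rq *rq, bool softirq_hits)
{
	if (rq->filling)
		rq->overlaps++;
	rq->filling = true;
	if (softirq_hits)
		model_softirq_poll(rq);
	rq->filling = false;
}

/* Case 1: the receive path only runs while NAPI is enabled. */
static void model_softirq_poll(struct model_rq *rq)
{
	if (!rq->napi_enabled)
		return;
	model_try_fill_recv(rq, false);
}

/* Case 2: the refill worker disables NAPI around the refill. */
static unsigned int model_refill_guarded(void)
{
	struct model_rq rq = { .napi_enabled = true };

	rq.napi_enabled = false;        /* stands in for disabling NAPI */
	model_try_fill_recv(&rq, true); /* the softirq arrives but is refused */
	rq.napi_enabled = true;         /* stands in for re-enabling NAPI */
	assert(!rq.filling);
	return rq.overlaps;
}

/* Counter-example: the same refill without the guard. */
static unsigned int model_refill_unguarded(void)
{
	struct model_rq rq = { .napi_enabled = true };

	model_try_fill_recv(&rq, true); /* the softirq nests inside the fill */
	return rq.overlaps;
}
```

In this model model_refill_guarded() reports no overlap and model_refill_unguarded() reports one. Cases 3 and 4 behave like the guarded variant because NAPI has not been enabled yet when they refill. Lockdep sees only the two calling contexts, not the flag that keeps them apart.
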
On 2018/10/19 7:23 AM, David Miller wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Thu, 18 Oct 2018 10:43:13 +0200
>
>> On 32bit, lockdep notices that virtnet_open() and refill_work() invoke
>> try_fill_recv() from process context while virtnet_receive() invokes the
>> same function from BH context. The problem is that the seqcounter within
>> u64_stats_update_begin() may deadlock if it is interrupted by BH and
>> then acquired again.
>>
>> Introduce u64_stats_update_begin_bh() which disables BH on 32bit
>> architectures. Since the BH might interrupt the worker, this new
>> function should not be limited to SMP like the others which are expected
>> to be used in softirq.
>>
>> With this change we might lose increments but this is okay. The
>> important part is that the two 32bit parts of the 64bit counter are not
>> corrupted.
>>
>> Fixes: 461f03dc99cf6 ("virtio_net: Add kick stats").
>> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Trying to get down to the bottom of this:
>
> 1) virtnet_receive() runs from softirq but only if NAPI is active and
>    enabled. It is in this context that it invokes try_fill_recv().
>
> 2) refill_work() runs from process context, but disables NAPI (and
>    thus invocation of virtnet_receive()) before calling
>    try_fill_recv().
>
> 3) virtnet_open() is invoked from process context as well, but before
>    the NAPI instances are enabled, so it is the same as case #2.
>
> 4) virtnet_restore_up() is the same situation as #3.
>
> Therefore I agree that this is a false positive, and lockdep simply
> cannot see the NAPI synchronization done by case #2.
>
> I think we shouldn't add unnecessary BH disabling here, and instead
> find some way to annotate this for lockdep's sake.
>
> Thank you.
>

+1

Thanks

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index dab504ec5e502..fbcfb4d272336 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1206,9 +1206,9 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 			break;
 	} while (rq->vq->num_free);
 	if (virtqueue_kick_prepare(rq->vq) && virtqueue_notify(rq->vq)) {
-		u64_stats_update_begin(&rq->stats.syncp);
+		u64_stats_update_begin_bh(&rq->stats.syncp);
 		rq->stats.kicks++;
-		u64_stats_update_end(&rq->stats.syncp);
+		u64_stats_update_end_bh(&rq->stats.syncp);
 	}
 
 	return !oom;
diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index a27604f99ed04..46b6ad6175628 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -90,6 +90,22 @@ static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
 #endif
 }
 
+static inline void u64_stats_update_begin_bh(struct u64_stats_sync *syncp)
+{
+#if BITS_PER_LONG==32
+	local_bh_disable();
+	write_seqcount_begin(&syncp->seq);
+#endif
+}
+
+static inline void u64_stats_update_end_bh(struct u64_stats_sync *syncp)
+{
+#if BITS_PER_LONG==32
+	write_seqcount_end(&syncp->seq);
+	local_bh_enable();
+#endif
+}
+
 static inline unsigned long
 u64_stats_update_begin_irqsave(struct u64_stats_sync *syncp)
 {

On 32bit, lockdep notices that virtnet_open() and refill_work() invoke
try_fill_recv() from process context while virtnet_receive() invokes the
same function from BH context. The problem is that the seqcounter within
u64_stats_update_begin() may deadlock if it is interrupted by BH and
then acquired again.

Introduce u64_stats_update_begin_bh() which disables BH on 32bit
architectures. Since the BH might interrupt the worker, this new
function should not be limited to SMP like the others which are expected
to be used in softirq.

With this change we might lose increments but this is okay. The
important part is that the two 32bit parts of the 64bit counter are not
corrupted.

Fixes: 461f03dc99cf6 ("virtio_net: Add kick stats").
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/virtio_net.c       |  4 ++--
 include/linux/u64_stats_sync.h | 16 ++++++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)
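
The concern in the commit message can be illustrated with a simplified, single-threaded userspace model of a sequence-counted 64bit counter stored as two 32bit halves. It is a sketch under stated assumptions, not the kernel implementation: struct model_stats and all model_* functions are hypothetical, and the memory barriers and lockdep tracking of the real seqcount are left out. It shows why writer sections on the same counter must not nest; whether such nesting can actually occur in virtio_net is the question discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a u64 statistic on a 32bit machine. */
struct model_stats {
	unsigned int seq;  /* odd while an update is in progress */
	uint32_t lo, hi;   /* the two halves of the 64bit value */
};

static void model_write_begin(struct model_stats *s) { s->seq++; }
static void model_write_end(struct model_stats *s)   { s->seq++; }

/*
 * One reader attempt: returns true and stores the value if the snapshot
 * looks consistent, false if the reader would have to retry.
 */
static bool model_try_read(const struct model_stats *s, uint64_t *out)
{
	unsigned int start = s->seq;
	uint64_t v;

	if (start & 1)
		return false;
	v = ((uint64_t)s->hi << 32) | s->lo;
	if (s->seq != start)
		return false;
	*out = v;
	return true;
}

/* A writer stopped between the two halves, with no nesting. */
static bool model_reader_during_update(void)
{
	struct model_stats s = { .seq = 0, .lo = UINT32_MAX, .hi = 0 };
	uint64_t v = 0;

	model_write_begin(&s);
	s.lo++;                         /* wraps to 0, carry into hi pending */
	assert(s.seq & 1);
	return model_try_read(&s, &v);  /* refused while seq is odd */
}

/* The same point in the update, but a second writer section nests. */
static uint64_t model_nested_writer_snapshot(void)
{
	struct model_stats s = { .seq = 0, .lo = UINT32_MAX, .hi = 0 };
	uint64_t v = UINT64_MAX;
	bool accepted;

	model_write_begin(&s);          /* outer writer: seq is odd */
	s.lo++;                         /* carry into hi still pending */
	model_write_begin(&s);          /* nested writer: seq is even again */
	accepted = model_try_read(&s, &v);
	assert(accepted);               /* the torn snapshot is not rejected */
	model_write_end(&s);
	s.hi++;                         /* outer writer finishes the carry */
	model_write_end(&s);
	return v;
}
```

In the first function the reader is told to retry for as long as the writer does not resume, which is what makes an interruption on the same CPU dangerous. In the second, the nested begin makes the sequence number even in the middle of the update, so the reader accepts a value that is neither the old nor the new count.
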