Message ID | 20130101210033.GA13255@dcvr.yhbt.net |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
Eric Wong <normalperson@yhbt.net> wrote: > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > commit 626cf236608505d376e4799adb4f7eb00a8594af should not have this > > side effect, at least for poll()/select() functions. The epoll() changes > > I am not yet very confident. > > I have a better explanation of the epoll problem below. > > An alternate version (limited to epoll) would be: > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > index cd96649..ca5f3d0 100644 > --- a/fs/eventpoll.c > +++ b/fs/eventpoll.c > @@ -1299,6 +1299,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even > * Get current event bits. We can safely use the file* here because > * its usage count has been increased by the caller of this function. > */ > + smp_mb(); > revents = epi->ffd.file->f_op->poll(epi->ffd.file, &pt); > > /* > > > I suspect a race already existed before this commit, it would be nice to > > track it properly. > > I don't believe this race existed before that change. I was wrong, rereading 626cf236608505d376e4799adb4f7eb00a8594af, I think this race existed before. Perhaps my alternate patch above is a better fix. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 1, 2013 at 1:17 PM, Eric Wong <normalperson@yhbt.net> wrote: >> >> An alternate version (limited to epoll) would be: >> >> diff --git a/fs/eventpoll.c b/fs/eventpoll.c >> index cd96649..ca5f3d0 100644 >> --- a/fs/eventpoll.c >> +++ b/fs/eventpoll.c >> @@ -1299,6 +1299,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even >> * Get current event bits. We can safely use the file* here because >> * its usage count has been increased by the caller of this function. >> */ >> + smp_mb(); >> revents = epi->ffd.file->f_op->poll(epi->ffd.file, &pt); >> >> /* > > I was wrong, rereading 626cf236608505d376e4799adb4f7eb00a8594af, > I think this race existed before. > > Perhaps my alternate patch above is a better fix. Please document the barrier that this mb() pairs with, and then give an explanation for the fix in the commit message, and I'll happily take it. Even if it's just duplicating the comments above the wq_has_sleeper() function, except modified for the ep_modify() case. Of course, it would be good to get verification from Jason and Andreas that the alternate patch also works for them. Linus -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi all, The alternate patch from Eric works well too. Even though I didn't see a performance boost compared with the old version, this one is clearer to me. Thanks your guys. Cheers! --Jason On Tue, Jan 1, 2013 at 5:53 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Jan 1, 2013 at 1:17 PM, Eric Wong <normalperson@yhbt.net> wrote: >>> >>> An alternate version (limited to epoll) would be: >>> >>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c >>> index cd96649..ca5f3d0 100644 >>> --- a/fs/eventpoll.c >>> +++ b/fs/eventpoll.c >>> @@ -1299,6 +1299,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even >>> * Get current event bits. We can safely use the file* here because >>> * its usage count has been increased by the caller of this function. >>> */ >>> + smp_mb(); >>> revents = epi->ffd.file->f_op->poll(epi->ffd.file, &pt); >>> >>> /* >> >> I was wrong, rereading 626cf236608505d376e4799adb4f7eb00a8594af, >> I think this race existed before. >> >> Perhaps my alternate patch above is a better fix. > > Please document the barrier that this mb() pairs with, and then give > an explanation for the fix in the commit message, and I'll happily > take it. Even if it's just duplicating the comments above the > wq_has_sleeper() function, except modified for the ep_modify() case. > > Of course, it would be good to get verification from Jason and Andreas > that the alternate patch also works for them. > > Linus -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index cd96649..ca5f3d0 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1299,6 +1299,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even * Get current event bits. We can safely use the file* here because * its usage count has been increased by the caller of this function. */ + smp_mb(); revents = epi->ffd.file->f_op->poll(epi->ffd.file, &pt); /* > I suspect a race already existed before this commit, it would be nice to > track it properly. I don't believe this race existed before that change. Updated commit message below: From 87bca82bc39a941d9b8d5b8bc08b39a071a9884f Mon Sep 17 00:00:00 2001 From: Eric Wong <normalperson@yhbt.net> Date: Mon, 31 Dec 2012 13:20:23 +0000 Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD ep_modify() works on files that are already registered with a wait queue (and thus should not reregister). For sockets, this means sk_sleep() will return a non-NULL wait address. ep_modify() must check for events that were received and ignored _before_ ep_modify() was called. So it must call f_op->poll() to fish for events _after_ changing epi->event.events. When f_op->poll() calls tcp_poll() (and thus sock_poll_wait()), wait_address is non-NULL because the socket was already registered by epoll. Thus, ep_modify() passes a NULL pt to prevent re-registration. When ep_modify() is called, sock_poll_wait() will see a wait_address, but a NULL pt, and this caused the memory barrier to get skipped and events to be missed (this memory barrier is described in the documentation for wq_has_sleeper). This regression appeared with the change to sock_poll_wait() in commit 626cf236608505d376e4799adb4f7eb00a8594af (poll: add poll_requested_events() and poll_does_not_wait() functions) This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang: http://thread.gmane.org/gmane.linux.kernel/1408782/ Signed-off-by: Eric Wong <normalperson@yhbt.net> Cc: Hans Verkuil <hans.verkuil@cisco.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Andreas Voellmy <andreas.voellmy@yale.edu> Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu> Cc: netdev@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org --- include/net/sock.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index c945fba..1923e48 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1925,8 +1925,9 @@ static inline bool wq_has_sleeper(struct socket_wq *wq) static inline void sock_poll_wait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p) { - if (!poll_does_not_wait(p) && wait_address) { - poll_wait(filp, wait_address, p); + if (wait_address) { + if (!poll_does_not_wait(p)) + poll_wait(filp, wait_address, p); /* We need to be sure we are in sync with the * socket flags modification. *