From patchwork Tue Jan 1 23:56:05 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eric Wong X-Patchwork-Id: 208964 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 18C6C2C008D for ; Wed, 2 Jan 2013 10:56:29 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752272Ab3AAX4I (ORCPT ); Tue, 1 Jan 2013 18:56:08 -0500 Received: from dcvr.yhbt.net ([64.71.152.64]:36584 "EHLO dcvr.yhbt.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751637Ab3AAX4G (ORCPT ); Tue, 1 Jan 2013 18:56:06 -0500 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id 583633ED8A; Tue, 1 Jan 2013 23:56:05 +0000 (UTC) Date: Tue, 1 Jan 2013 23:56:05 +0000 From: Eric Wong To: Linus Torvalds Cc: Eric Dumazet , Linux Kernel Mailing List , Hans Verkuil , Jiri Olsa , Jonathan Corbet , Al Viro , Davide Libenzi , Hans de Goede , Mauro Carvalho Chehab , David Miller , Andrew Morton , Andreas Voellmy , "Junchang(Jason) Wang" , Network Development , linux-fsdevel Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD Message-ID: <20130101235605.GA17168@dcvr.yhbt.net> References: <20121228014503.GA5017@dcvr.yhbt.net> <1356960060-1263-1-git-send-email-normalperson@yhbt.net> <1357065750.21409.12527.camel@edumazet-glaptop> <20130101210033.GA13255@dcvr.yhbt.net> <20130101211728.GA13380@dcvr.yhbt.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Linus Torvalds wrote: > Please document the barrier that this mb() pairs with, and then give > an explanation for the fix in the commit message, and I'll happily > take it. Even if it's just duplicating the comments above the > wq_has_sleeper() function, except modified for the ep_modify() case. Hopefully my explanation is correct and makes sense below, I think both effects of the barrier are needed > Of course, it would be good to get verification from Jason and Andreas > that the alternate patch also works for them. Jason just confirmed it. ------------------------------- 8< ---------------------------- From 02f43757d04bb6f2786e79eecf1cfa82e6574379 Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Tue, 1 Jan 2013 21:20:27 +0000 Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to ensure events are not missed. Since the modifications to the interest mask are not protected by the same lock as ep_poll_callback, we need to ensure the change is visible to other CPUs calling ep_poll_callback. We also need to ensure f_op->poll() has an up-to-date view of past events which occured before we modified the interest mask. So this barrier also pairs with the barrier in wq_has_sleeper(). This should guarantee either ep_poll_callback or f_op->poll() (or both) will notice the readiness of a recently-ready/modified item. This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in: http://thread.gmane.org/gmane.linux.kernel/1408782/ Signed-off-by: Eric Wong Cc: Hans Verkuil Cc: Jiri Olsa Cc: Jonathan Corbet Cc: Al Viro Cc: Davide Libenzi Cc: Hans de Goede Cc: Mauro Carvalho Chehab Cc: David Miller Cc: Eric Dumazet Cc: Andrew Morton Cc: Linus Torvalds Cc: Andreas Voellmy Tested-by: "Junchang(Jason) Wang" Cc: netdev@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org --- fs/eventpoll.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index cd96649..39573ee 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1285,7 +1285,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even * otherwise we might miss an event that happens between the * f_op->poll() call and the new event set registering. */ - epi->event.events = event->events; + epi->event.events = event->events; /* need barrier below */ pt._key = event->events; epi->event.data = event->data; /* protected by mtx */ if (epi->event.events & EPOLLWAKEUP) { @@ -1296,6 +1296,26 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even } /* + * The following barrier has two effects: + * + * 1) Flush epi changes above to other CPUs. This ensures + * we do not miss events from ep_poll_callback if an + * event occurs immediately after we call f_op->poll(). + * We need this because we did not take ep->lock while + * changing epi above (but ep_poll_callback does take + * ep->lock). + * + * 2) We also need to ensure we do not miss _past_ events + * when calling f_op->poll(). This barrier also + * pairs with the barrier in wq_has_sleeper (see + * comments for wq_has_sleeper). + * + * This barrier will now guarantee ep_poll_callback or f_op->poll + * (or both) will notice the readiness of an item. + */ + smp_mb(); + + /* * Get current event bits. We can safely use the file* here because * its usage count has been increased by the caller of this function. */