From patchwork Fri Jun 14 18:33:48 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kamal Mostafa X-Patchwork-Id: 251506 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 13FA72C008A for ; Sat, 15 Jun 2013 04:35:48 +1000 (EST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1UnYqc-0000Bb-43; Fri, 14 Jun 2013 18:35:42 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1UnYoq-0007pY-PL for kernel-team@lists.ubuntu.com; Fri, 14 Jun 2013 18:33:52 +0000 Received: from c-67-160-231-42.hsd1.ca.comcast.net ([67.160.231.42] helo=fourier) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1UnYoo-00022S-Nx; Fri, 14 Jun 2013 18:33:51 +0000 Received: from kamal by fourier with local (Exim 4.80) (envelope-from ) id 1UnYom-00007A-MV; Fri, 14 Jun 2013 11:33:48 -0700 From: Kamal Mostafa To: Rafael Aquini Subject: [ 3.8.y.z extended stable ] Patch "swap: avoid read_swap_cache_async() race to deadlock while waiting on" has been added to staging queue Date: Fri, 14 Jun 2013 11:33:48 -0700 Message-Id: <1371234828-410-1-git-send-email-kamal@canonical.com> X-Mailer: git-send-email 1.8.1.2 X-Extended-Stable: 3.8 Cc: Linus Torvalds , Kamal Mostafa , Hugh Dickins , kernel-team@lists.ubuntu.com, KOSAKI Motohiro , Johannes Weiner , Andrew Morton , Shaohua Li X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com This is a note to let you know that I have just added a patch titled swap: avoid read_swap_cache_async() race to deadlock while waiting on to the linux-3.8.y-queue branch of the 3.8.y.z extended stable tree which can be found at: http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.8.y-queue This patch is scheduled to be released in version 3.8.13.3. If you, or anyone else, feels it should not be added to this tree, please reply to this email. For more information about the 3.8.y.z tree, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable Thanks. -Kamal ------ From 8e51f2012e0023450abbba1ea61416687bb5cf4c Mon Sep 17 00:00:00 2001 From: Rafael Aquini Date: Wed, 12 Jun 2013 14:04:49 -0700 Subject: swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion commit cbab0e4eec299e9059199ebe6daf48730be46d2b upstream. read_swap_cache_async() can race against get_swap_page(), and stumble across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought into the swapcache yet. This transient swap_map state is expected to be transitory, but the actual placement of discard at scan_swap_map() inserts a wait for I/O completion thus making the thread at read_swap_cache_async() to loop around its -EEXIST case, while the other end at get_swap_page() is scheduled away at scan_swap_map(). This can leave the system deadlocked if the I/O completion happens to be waiting on the CPU waitqueue where read_swap_cache_async() is busy looping and !CONFIG_PREEMPT. This patch introduces a cond_resched() call to make the aforementioned read_swap_cache_async() busy loop condition to bail out when necessary, thus avoiding the subtle race window. Signed-off-by: Rafael Aquini Acked-by: Johannes Weiner Acked-by: KOSAKI Motohiro Acked-by: Hugh Dickins Cc: Shaohua Li Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Kamal Mostafa --- mm/swap_state.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) -- 1.8.1.2 diff --git a/mm/swap_state.c b/mm/swap_state.c index 0cb36fb..f854fbd 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -314,8 +314,24 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * Swap entry may have been freed since our caller observed it. */ err = swapcache_prepare(entry); - if (err == -EEXIST) { /* seems racy */ + if (err == -EEXIST) { radix_tree_preload_end(); + /* + * We might race against get_swap_page() and stumble + * across a SWAP_HAS_CACHE swap_map entry whose page + * has not been brought into the swapcache yet, while + * the other end is scheduled away waiting on discard + * I/O completion at scan_swap_map(). + * + * In order to avoid turning this transitory state + * into a permanent loop around this -EEXIST case + * if !CONFIG_PREEMPT and the I/O completion happens + * to be waiting on the CPU waitqueue where we are now + * busy looping, we just conditionally invoke the + * scheduler here, if there are some more important + * tasks to run. + */ + cond_resched(); continue; } if (err) { /* swp entry is obsolete ? */