From patchwork Fri Jun 14 18:33:49 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kamal Mostafa X-Patchwork-Id: 251494 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 42A652C0079 for ; Sat, 15 Jun 2013 04:34:44 +1000 (EST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1UnYpZ-0008AU-8C; Fri, 14 Jun 2013 18:34:37 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1UnYop-0007ou-UA for kernel-team@lists.ubuntu.com; Fri, 14 Jun 2013 18:33:52 +0000 Received: from c-67-160-231-42.hsd1.ca.comcast.net ([67.160.231.42] helo=fourier) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1UnYop-00022e-My; Fri, 14 Jun 2013 18:33:51 +0000 Received: from kamal by fourier with local (Exim 4.80) (envelope-from ) id 1UnYon-00009I-Gg; Fri, 14 Jun 2013 11:33:49 -0700 From: Kamal Mostafa To: Alex Lyakas Subject: [ 3.8.y.z extended stable ] Patch "md/raid1: consider WRITE as successful only if at least one" has been added to staging queue Date: Fri, 14 Jun 2013 11:33:49 -0700 Message-Id: <1371234829-539-1-git-send-email-kamal@canonical.com> X-Mailer: git-send-email 1.8.1.2 X-Extended-Stable: 3.8 Cc: NeilBrown , Kamal Mostafa , kernel-team@lists.ubuntu.com X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com This is a note to let you know that I have just added a patch titled md/raid1: consider WRITE as successful only if at least one to the linux-3.8.y-queue branch of the 3.8.y.z extended stable tree which can be found at: http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.8.y-queue This patch is scheduled to be released in version 3.8.13.3. If you, or anyone else, feels it should not be added to this tree, please reply to this email. For more information about the 3.8.y.z tree, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable Thanks. -Kamal ------ From c9c93a54cf3d798a789fdd181881018144a6c52b Mon Sep 17 00:00:00 2001 From: Alex Lyakas Date: Tue, 4 Jun 2013 20:42:21 +0300 Subject: md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it. commit 3056e3aec8d8ba61a0710fb78b2d562600aa2ea7 upstream. Without that fix, the following scenario could happen: - RAID1 with drives A and B; drive B was freshly-added and is rebuilding - Drive A fails - WRITE request arrives to the array. It is failed by drive A, so r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B succeeds in writing it, so the same r1_bio is marked as R1BIO_Uptodate. - r1_bio arrives to handle_write_finished, badblocks are disabled, md_error()->error() does nothing because we don't fail the last drive of raid1 - raid_end_bio_io() calls call_bio_endio() - As a result, in call_bio_endio(): if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) clear_bit(BIO_UPTODATE, &bio->bi_flags); this code doesn't clear the BIO_UPTODATE flag, and the whole master WRITE succeeds, back to the upper layer. So we returned success to the upper layer, even though we had written the data onto the rebuilding drive only. But when we want to read the data back, we would not read from the rebuilding drive, so this data is lost. [neilb - applied identical change to raid10 as well] This bug can result in lost data, so it is suitable for any -stable kernel. Signed-off-by: Alex Lyakas Signed-off-by: NeilBrown Signed-off-by: Kamal Mostafa --- drivers/md/raid1.c | 12 +++++++++++- drivers/md/raid10.c | 12 +++++++++++- 2 files changed, 22 insertions(+), 2 deletions(-) -- 1.8.1.2 diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 6af167f..8430c0d 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -427,7 +427,17 @@ static void raid1_end_write_request(struct bio *bio, int error) r1_bio->bios[mirror] = NULL; to_put = bio; - set_bit(R1BIO_Uptodate, &r1_bio->state); + /* + * Do not set R1BIO_Uptodate if the current device is + * rebuilding or Faulty. This is because we cannot use + * such device for properly reading the data back (we could + * potentially use it, if the current write would have felt + * before rdev->recovery_offset, but for simplicity we don't + * check this here. + */ + if (test_bit(In_sync, &conf->mirrors[mirror].rdev->flags) && + !test_bit(Faulty, &conf->mirrors[mirror].rdev->flags)) + set_bit(R1BIO_Uptodate, &r1_bio->state); /* Maybe we can clear some bad blocks. */ if (is_badblock(conf->mirrors[mirror].rdev, diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c index 61ab219..0d01e03 100644 --- a/drivers/md/raid10.c +++ b/drivers/md/raid10.c @@ -475,7 +475,17 @@ static void raid10_end_write_request(struct bio *bio, int error) sector_t first_bad; int bad_sectors; - set_bit(R10BIO_Uptodate, &r10_bio->state); + /* + * Do not set R10BIO_Uptodate if the current device is + * rebuilding or Faulty. This is because we cannot use + * such device for properly reading the data back (we could + * potentially use it, if the current write would have felt + * before rdev->recovery_offset, but for simplicity we don't + * check this here. + */ + if (test_bit(In_sync, &rdev->flags) && + !test_bit(Faulty, &rdev->flags)) + set_bit(R10BIO_Uptodate, &r10_bio->state); /* Maybe we can clear some bad blocks. */ if (is_badblock(rdev,