From patchwork Wed Apr 11 02:34:19 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 151723 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 02932B7027 for ; Wed, 11 Apr 2012 12:18:43 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760013Ab2DKCSl (ORCPT ); Tue, 10 Apr 2012 22:18:41 -0400 Received: from mga02.intel.com ([134.134.136.20]:35494 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754958Ab2DKCSl (ORCPT ); Tue, 10 Apr 2012 22:18:41 -0400 Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP; 10 Apr 2012 19:18:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.67,352,1309762800"; d="scan'208";a="127560285" Received: from dwillia2-linux.jf.intel.com ([10.23.45.110]) by orsmga001.jf.intel.com with ESMTP; 10 Apr 2012 19:18:40 -0700 Received: from dwillia2-linux.jf.intel.com (localhost.localdomain [IPv6:::1]) by dwillia2-linux.jf.intel.com (Postfix) with ESMTP id BC3F880002; Tue, 10 Apr 2012 19:34:19 -0700 (PDT) Subject: [PATCH] scsi: fix eh wakeup (scsi_schedule_eh vs scsi_restart_operations) To: linux-scsi@vger.kernel.org From: Dan Williams Cc: Tejun Heo , linux-ide@vger.kernel.org, Tom Jackson Date: Tue, 10 Apr 2012 19:34:19 -0700 Message-ID: <20120411023054.791.37365.stgit@dwillia2-linux.jf.intel.com> User-Agent: StGit/0.16-1-g7004 MIME-Version: 1.0 Sender: linux-ide-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ide@vger.kernel.org Rapid ata hotplug on a libsas controller results in cases where libsas is waiting indefinitely on eh to perform an ata probe. A race exists between scsi_schedule_eh() and scsi_restart_operations() in the case when scsi_restart_operations() issues i/o to other devices in the sas domain. When this happens the host state transitions from SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and ->host_busy is non-zero so we put the eh thread to sleep even though ->host_eh_scheduled is active. Before putting the error handler to sleep we need to check if the host_state needs to return to SHOST_RECOVERY for another trip through eh. Cc: Tejun Heo Reported-by: Tom Jackson Tested-by: Tom Jackson Signed-off-by: Dan Williams --- More host_eh_scheduled fallout that I would like to roll into a libsas-fixes pull request along with the other fixes that were pending when the merge window closed. I'll resend the entire patch series when posting the branch, but wanted to call this fix out separately since it is new. -- Dan drivers/scsi/scsi_error.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 2cfcbff..0945d47 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -1687,6 +1687,20 @@ static void scsi_restart_operations(struct Scsi_Host *shost) * requests are started. */ scsi_run_host_queues(shost); + + /* + * if eh is active and host_eh_scheduled is pending we need to re-run + * recovery. we do this check after scsi_run_host_queues() to allow + * everything pent up since the last eh run a chance to make forward + * progress before we sync again. Either we'll immediately re-run + * recovery or scsi_device_unbusy() will wake us again when these + * pending commands complete. + */ + spin_lock_irqsave(shost->host_lock, flags); + if (shost->host_eh_scheduled) + if (scsi_host_set_state(shost, SHOST_RECOVERY)) + WARN_ON(scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY)); + spin_unlock_irqrestore(shost->host_lock, flags); } /**