From: Vipin K Parashar
Date: Tue, 31 May 2016 17:09:01 +0530
Subject: [Skiboot] [PATCH v3] hw/xscom: Reset XSCOM engine after finite number of retries when busy
OPAL retries XSCOM read/write operations forever until they succeed. This
can hang XSCOM operations forever when the XSCOM engine remains busy for
some reason.

Change it to retry XSCOM operations only XSCOM_BUSY_MAX_RETRIES times
instead of retrying forever. Also add logic to reset the XSCOM engine
every XSCOM_BUSY_RESET_THRESHOLD retries to unblock it when it remains
busy.

Cc: stable # 9c2d82394fd2 ("xscom: Return OPAL_WRONG_STATE on XSCOM ops..")
Signed-off-by: Vipin K Parashar
Signed-off-by: Vaidyanathan Srinivasan
---
Changes in v3:
 - Added a 10 ms delay after the XSCOM engine reset when XSCOM is found busy.
 - Modified the 'if' condition to check the return value of xscom_handle_error().

Changes in v2:
 - Renamed the newly added macros to more intuitive names.
   XSCOM_BUSY_MAX_RETRIES is the total number of retries allowed while
   XSCOM remains busy, and XSCOM_BUSY_RESET_THRESHOLD is the retry count
   after which XSCOM is reset before the operation is retried again.

 hw/xscom.c         | 76 ++++++++++++++++++++++++++++++++++++++++++------------
 include/errorlog.h |  1 +
 include/xscom.h    |  6 +++++
 3 files changed, 66 insertions(+), 17 deletions(-)

diff --git a/hw/xscom.c b/hw/xscom.c
index 84f72f5..2649a50 100644
--- a/hw/xscom.c
+++ b/hw/xscom.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 
 /* Mask of bits to clear in HMER before an access */
 #define HMER_CLR_MASK	(~(SPR_HMER_XSCOM_FAIL | \
@@ -41,6 +42,10 @@ DEFINE_LOG_ENTRY(OPAL_RC_XSCOM_RESET, OPAL_PLATFORM_ERR_EVT, OPAL_XSCOM,
 		 OPAL_CEC_HARDWARE, OPAL_PREDICTIVE_ERR_GENERAL,
 		 OPAL_NA);
 
+DEFINE_LOG_ENTRY(OPAL_RC_XSCOM_BUSY, OPAL_PLATFORM_ERR_EVT, OPAL_XSCOM,
+		 OPAL_CEC_HARDWARE, OPAL_PREDICTIVE_ERR_GENERAL,
+		 OPAL_NA);
+
 /* xscom details to trigger xstop */
 static struct {
 	uint64_t addr;
@@ -118,18 +123,46 @@ static void xscom_reset(uint32_t gcid)
 	 */
 }
 
-static int xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr,
-			      bool is_write)
+static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr,
+				  bool is_write, int64_t retries)
 {
+	struct timespec ts;
 	unsigned int stat = GETFIELD(SPR_HMER_XSCOM_STATUS, hmer);
 
 	/* XXX Figure out error codes from doc and error
 	 * recovery procedures
 	 */
 	switch(stat) {
-	/* XSCOM blocked, just retry */
+	/*
+	 * XSCOM engine is blocked, need to retry. Reset XSCOM engine
+	 * after crossing retry threshold before retrying again.
+	 */
 	case 1:
+		if (retries && !(retries % XSCOM_BUSY_RESET_THRESHOLD)) {
+			prlog(PR_NOTICE, "XSCOM: Busy even after %d retries, "
+				"resetting XSCOM now. Total retries = %lld\n",
+				XSCOM_BUSY_RESET_THRESHOLD, retries);
+			xscom_reset(gcid);
+
+			/*
+			 * It has been observed that an immediate retry of the
+			 * XSCOM operation sometimes returns wrong data, so wait
+			 * for the XSCOM reset to take effect. A delay of 10 ms
+			 * was found experimentally to work.
+			 */
+			ts.tv_sec = 0;
+			ts.tv_nsec = 10 * 1000 * 1000;	/* 10 ms */
+			nanosleep_nopoll(&ts, NULL);
+		}
+
+		/* Log an error if we have retried enough and it's still busy */
+		if (retries == XSCOM_BUSY_MAX_RETRIES)
+			log_simple_error(&e_info(OPAL_RC_XSCOM_BUSY),
+				"XSCOM: %s-busy error gcid=0x%x pcb_addr=0x%x "
+				"stat=0x%x\n", is_write ? "write" : "read",
+				gcid, pcb_addr, stat);
 		return OPAL_BUSY;
+
 	/* CPU is asleep, don't retry */
 	case 2:
 		return OPAL_WRONG_STATE;
@@ -177,15 +210,16 @@ static bool xscom_gcid_ok(uint32_t gcid)
  */
 static int __xscom_read(uint32_t gcid, uint32_t pcb_addr, uint64_t *val)
 {
+	int i;
 	uint64_t hmer;
-	int64_t ret;
+	int64_t ret, retries = 0;
 
 	if (!xscom_gcid_ok(gcid)) {
 		prerror("%s: invalid XSCOM gcid 0x%x\n", __func__, gcid);
 		return OPAL_PARAMETER;
 	}
 
-	for (;;) {
+	for (i = 0; i <= XSCOM_BUSY_MAX_RETRIES; i++) {
 		/* Clear status bits in HMER (HMER is special
 		 * writing to it *ands* bits
 		 */
@@ -199,27 +233,31 @@ static int __xscom_read(uint32_t gcid, uint32_t pcb_addr, uint64_t *val)
 
 		/* Check for error */
 		if (!(hmer & SPR_HMER_XSCOM_FAIL))
-			break;
+			return OPAL_SUCCESS;
 
 		/* Handle error and possibly eventually retry */
-		ret = xscom_handle_error(hmer, gcid, pcb_addr, false);
-		if (ret == OPAL_HARDWARE || ret == OPAL_WRONG_STATE)
-			return ret;
+		ret = xscom_handle_error(hmer, gcid, pcb_addr, false, retries);
+		if (ret != OPAL_BUSY)
+			break;
+		retries++;
 	}
-	return OPAL_SUCCESS;
+
+	prerror("XSCOM: Read failed, ret = %lld\n", ret);
+	return ret;
 }
 
 static int __xscom_write(uint32_t gcid, uint32_t pcb_addr, uint64_t val)
 {
+	int i;
 	uint64_t hmer;
-	int64_t ret;
+	int64_t ret, retries = 0;
 
 	if (!xscom_gcid_ok(gcid)) {
 		prerror("%s: invalid XSCOM gcid 0x%x\n", __func__, gcid);
 		return OPAL_PARAMETER;
 	}
 
-	for (;;) {
+	for (i = 0; i <= XSCOM_BUSY_MAX_RETRIES; i++) {
 		/* Clear status bits in HMER (HMER is special
 		 * writing to it *ands* bits
 		 */
@@ -233,14 +271,18 @@ static int __xscom_write(uint32_t gcid, uint32_t pcb_addr, uint64_t val)
 
 		/* Check for error */
 		if (!(hmer & SPR_HMER_XSCOM_FAIL))
-			break;
+			return OPAL_SUCCESS;
 
 		/* Handle error and possibly eventually retry */
-		ret = xscom_handle_error(hmer, gcid, pcb_addr, true);
-		if (ret == OPAL_HARDWARE || ret == OPAL_WRONG_STATE)
-			return ret;
+		ret = xscom_handle_error(hmer, gcid, pcb_addr, true, retries);
+		if (ret == OPAL_BUSY)
+			retries++;
+		else
+			break;
 	}
-	return OPAL_SUCCESS;
+
+	prerror("XSCOM: Write failed, ret = %lld\n", ret);
+	return ret;
 }
 
 /*
diff --git a/include/errorlog.h b/include/errorlog.h
index ed90dab..214aed2 100644
--- a/include/errorlog.h
+++ b/include/errorlog.h
@@ -275,6 +275,7 @@ enum opal_reasoncode {
 	OPAL_RC_XSCOM_RW		= OPAL_XS | 0x10,
 	OPAL_RC_XSCOM_INDIRECT_RW	= OPAL_XS | 0x11,
 	OPAL_RC_XSCOM_RESET		= OPAL_XS | 0x12,
+	OPAL_RC_XSCOM_BUSY		= OPAL_XS | 0x13,
 /* PCI */
 	OPAL_RC_PCI_INIT_SLOT		= OPAL_PC | 0x10,
 	OPAL_RC_PCI_ADD_SLOT		= OPAL_PC | 0x11,
diff --git a/include/xscom.h b/include/xscom.h
index 933af6a..1aee40e 100644
--- a/include/xscom.h
+++ b/include/xscom.h
@@ -167,6 +167,12 @@
 /* HB folks say: try 10 time for now */
 #define XSCOM_IND_MAX_RETRIES		10
 
+/* Max number of retries when XSCOM remains busy */
+#define XSCOM_BUSY_MAX_RETRIES		3000
+
+/* Retry count after which to reset XSCOM, if still busy */
+#define XSCOM_BUSY_RESET_THRESHOLD	1000
+
 /*
  * Error handling:
  *