From patchwork Mon Apr 23 01:43:12 2018
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 902723
From: Balbir Singh
To: skiboot@lists.ozlabs.org, alistair@popple.id.au
Cc: stewart@linux.vnet.ibm.com, arbab@linux.vnet.ibm.com
Date: Mon, 23 Apr 2018 11:43:12 +1000
Message-Id: <20180423014312.26060-1-bsingharora@gmail.com>
X-Mailer: git-send-email 2.13.6
Subject: [Skiboot] [PATCH] npu2/hw-procedures: fence bricks on GPU reset

The NPU workbook defines a way of fencing a brick and a way of getting
the brick out of the fenced state. Our procedures already implement the
latter (bringing the brick out of the fenced/quiesced state), but to
support runtime reset we also need the former.

Fencing ensures that accesses to memory behind the links do not lead to
HMIs; instead, SUEs are populated in the cache (in the case of
speculation). The expectation is that the operating system components
flush the cache for the region of memory behind the GPU both before and
after reset.

This patch does the following:
1. Implements an npu2_dev_fence_brick() function to set/clear the fence
   state.
2. Clears the FIR bits prior to clearing the fence status.
3. Clears the fence status.
4. Takes the powerbus out of CQ fence much later, in check_credits(),
   which is the last hardware procedure called after link training.

Signed-off-by: Balbir Singh
Reviewed-By: Alistair Popple
---
Notes for reviewer
- Clearing the FIR bits clears the full NPU FIR, but I don't think that's
  a problem: any major link or powerbus issue will retrigger.
  We don't do much mitigation in our HMI handling, just reporting, so
  from what I can see we can't be papering over a problem.
- I've tested this on a 4-GPU box with several reset cycles over a
  couple of days.

 hw/npu2-hw-procedures.c | 52 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/hw/npu2-hw-procedures.c b/hw/npu2-hw-procedures.c
index 9e4a4316..e25b85c5 100644
--- a/hw/npu2-hw-procedures.c
+++ b/hw/npu2-hw-procedures.c
@@ -232,6 +232,26 @@ static bool poll_fence_status(struct npu2_dev *ndev, uint64_t val)
 	return false;
 }
 
+static int64_t npu2_dev_fence_brick(struct npu2_dev *ndev, bool set)
+{
+	/*
+	 * Add support for quiescing/fencing the brick at
+	 * procedure reset time.
+	 */
+	uint32_t brick;
+	uint64_t val;
+
+	brick = ndev->index;
+	if (set)
+		brick += 6;
+
+	val = PPC_BIT(brick);
+	NPU2DEVINF(ndev, "%s fence brick %d, val %llx\n", set ? "set" : "clear",
+		   ndev->index, val);
+	npu2_write(ndev->npu, NPU2_MISC_FENCE_STATE, val);
+	return 0;
+}
+
 /* Procedure 1.2.1 - Reset NPU/NDL */
 uint32_t reset_ntl(struct npu2_dev *ndev)
 {
@@ -288,19 +308,28 @@ static uint32_t reset_ndl(struct npu2_dev *ndev)
 static uint32_t reset_ntl_release(struct npu2_dev *ndev)
 {
 	uint64_t val;
+	uint64_t npu2_fir;
+	uint64_t npu2_fir_addr;
+	int i;
 
-	val = npu2_read(ndev->npu, NPU2_NTL_MISC_CFG1(ndev));
-	val &= 0xFFBFFFFFFFFFFFFF;
-	npu2_write(ndev->npu, NPU2_NTL_MISC_CFG1(ndev), val);
+	/* Clear FIR bits */
+	npu2_fir_addr = NPU2_FIR_REGISTER_0;
+	npu2_fir = 0;
 
-	if (!poll_fence_status(ndev, 0x8000000000000000))
-		return PROCEDURE_COMPLETE | PROCEDURE_FAILED;
+	for (i = 0; i < NPU2_TOTAL_FIR_REGISTERS; i++) {
+		npu2_write(ndev->npu, npu2_fir_addr, npu2_fir);
+		npu2_fir_addr += NPU2_FIR_OFFSET;
+
+	}
+
+	/* Release the fence */
+	npu2_dev_fence_brick(ndev, false);
 
 	val = npu2_read(ndev->npu, NPU2_NTL_MISC_CFG1(ndev));
-	val &= 0xFF3FFFFFFFFFFFFF;
+	val &= 0xFFBFFFFFFFFFFFFF;
 	npu2_write(ndev->npu, NPU2_NTL_MISC_CFG1(ndev), val);
 
-	if (!poll_fence_status(ndev, 0x0))
+	if (!poll_fence_status(ndev, 0x8000000000000000))
 		return PROCEDURE_COMPLETE | PROCEDURE_FAILED;
 
 	return PROCEDURE_NEXT;
@@ -718,6 +747,7 @@ static uint32_t check_credit(struct npu2_dev *ndev, uint64_t reg,
 static uint32_t check_credits(struct npu2_dev *ndev)
 {
 	int fail = 0;
+	uint64_t val;
 
 	fail += CHECK_CREDIT(ndev, NPU2_NTL_CRED_HDR_CREDIT_RX, 0x0BE0BE0000000000ULL);
 	fail += CHECK_CREDIT(ndev, NPU2_NTL_RSP_HDR_CREDIT_RX, 0x0BE0BE0000000000ULL);
@@ -728,6 +758,13 @@ static uint32_t check_credits(struct npu2_dev *ndev)
 
 	assert(!fail);
 
+	val = npu2_read(ndev->npu, NPU2_NTL_MISC_CFG1(ndev));
+	val &= 0xFF3FFFFFFFFFFFFF;
+	npu2_write(ndev->npu, NPU2_NTL_MISC_CFG1(ndev), val);
+
+	if (!poll_fence_status(ndev, 0x0))
+		return PROCEDURE_COMPLETE | PROCEDURE_FAILED;
+
 	return PROCEDURE_COMPLETE;
 }
 DEFINE_PROCEDURE(check_credits);
@@ -885,5 +922,6 @@ int64_t npu2_dev_procedure(void *dev, struct pci_cfg_reg_filter *pcrf,
 
 void npu2_dev_procedure_reset(struct npu2_dev *dev)
 {
+	npu2_dev_fence_brick(dev, true);
 	npu2_clear_link_flag(dev, NPU2_DEV_DL_RESET);
 }
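The value that npu2_dev_fence_brick() writes to NPU2_MISC_FENCE_STATE can be worked out by hand. The following is a minimal sketch, not skiboot code: it assumes PPC_BIT() uses IBM MSB-0 numbering (bit 0 is the most significant bit), which is how skiboot defines it to my understanding, and that brick indices run 0-5, i.e. six bricks per NPU, as the +6 offset in the patch suggests. fence_state_value() and NUM_BRICKS are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PPC_BIT(): IBM MSB-0 numbering, bit 0 is the MSB. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

/* Assumption: six bricks per NPU, indexed 0-5 (suggested by the +6 offset). */
#define NUM_BRICKS 6

/*
 * Hypothetical helper reproducing the arithmetic in npu2_dev_fence_brick():
 * bit <index> is written to clear the fence, bit <index + NUM_BRICKS>
 * is written to set it.
 */
static uint64_t fence_state_value(uint32_t index, bool set)
{
	uint32_t brick = index;

	assert(index < NUM_BRICKS);
	if (set)
		brick += NUM_BRICKS;
	return PPC_BIT(brick);
}
```

Under these assumptions, clearing the fence on brick 0 writes 0x8000000000000000 and setting it writes 0x0200000000000000; for brick 5 the values are 0x0400000000000000 and 0x0010000000000000.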