From patchwork Wed Apr 24 07:20:09 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Donnellan
X-Patchwork-Id: 1089984
From: Andrew Donnellan
To: skiboot@lists.ozlabs.org
Cc: alastair@d-silva.org
Date: Wed, 24 Apr 2019 17:20:09 +1000
X-Mailer: git-send-email 2.11.0
Message-Id: <20190424072009.6205-1-andrew.donnellan@au1.ibm.com>
Subject: [Skiboot] [PATCH] hw/npu2-opencapi: Add initial support for allocating OpenCAPI LPC memory
List-Id: Mailing list for skiboot development

Lowest Point of Coherency (LPC) memory allows the host to access memory on
an OpenCAPI device. Define 2 OPAL calls, OPAL_NPU_MEM_ALLOC and
OPAL_NPU_MEM_RELEASE, for assigning and clearing the memory BAR. (We try to
avoid using the term "LPC" to avoid confusion with Low Pin Count.)

At present, we use a fixed location in the address space, which means we
are restricted to a single range of 4TB, on a single OpenCAPI device per
chip. In future, we'll use some chip ID extension magic to give us more
space, and some sort of allocator to assign ranges to more than one device.

Signed-off-by: Andrew Donnellan
---

This code is currently being used for some internal testing of LPC memory
devices and seems to work acceptably for that purpose. We haven't tested
all the corner cases... this is really just intended to enable prototyping
and bringup at this stage.
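For reviewers who want to check how the fixed base address maps onto the BAR fields, here is a minimal stand-alone sketch. It assumes IBM bit numbering (bit 0 is the most significant bit of the 64-bit word), as used by PPC_BITMASK. IBM_BIT, IBM_BITMASK, get_field, struct bar_fields and decode_base are hypothetical stand-ins written for this illustration; they are not the skiboot macros or any skiboot API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PPC_BIT/PPC_BITMASK: bit 0 is the MSB. */
#define IBM_BIT(n)        (1ULL << (63 - (n)))
#define IBM_BITMASK(s, e) ((IBM_BIT(s) - IBM_BIT(e)) | IBM_BIT(s))

/* Extract the field selected by mask, shifted down to bit 0. */
static uint64_t get_field(uint64_t mask, uint64_t val)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!((mask >> shift) & 1))
		shift++;
	return (val & mask) >> shift;
}

/* The address fields that set_mem_bar() pulls out of the base. */
struct bar_fields {
	uint64_t memsel;    /* IBM bits 13:14 */
	uint64_t group;     /* IBM bits 15:18 */
	uint64_t chip;      /* IBM bits 19:21 */
	uint64_t node_addr; /* IBM bits 22:33, in units of 1 GB */
};

static struct bar_fields decode_base(uint64_t base)
{
	struct bar_fields f;

	f.memsel = get_field(IBM_BITMASK(13, 14), base);
	f.group = get_field(IBM_BITMASK(15, 18), base);
	f.chip = get_field(IBM_BITMASK(19, 21), base);
	f.node_addr = get_field(IBM_BITMASK(22, 33), base);
	return f;
}
```

For the OCAPI_MEM base added to phys-map.c in this patch (0x0002000000000000), decode_base() gives memsel = 0b01 with group, chip and node_addr all zero, so the selector written to the BAR is 0b100 + 0b01 = 0b101, which the table in set_mem_bar() lists as "match 0b01".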
---
 hw/npu2-opencapi.c  | 182 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/phys-map.c       |  11 ++++
 include/npu2-regs.h |   7 ++
 include/npu2.h      |   5 ++
 include/opal-api.h  |   4 +-
 include/phys-map.h  |   1 +
 6 files changed, 208 insertions(+), 2 deletions(-)

diff --git a/hw/npu2-opencapi.c b/hw/npu2-opencapi.c
index 9df51b22eda5..b98335e48daf 100644
--- a/hw/npu2-opencapi.c
+++ b/hw/npu2-opencapi.c
@@ -2025,3 +2025,185 @@ static int64_t opal_npu_tl_set(uint64_t phb_id, uint32_t __unused bdfn,
 	return OPAL_SUCCESS;
 }
 opal_call(OPAL_NPU_TL_SET, opal_npu_tl_set, 5);
+
+static void set_mem_bar(struct npu2_dev *dev, uint64_t base, uint64_t size)
+{
+	uint64_t stack, val, reg, bar_offset, pa_config_offset;
+	uint8_t memsel;
+
+	stack = index_to_stack(dev->brick_index);
+	switch (dev->brick_index) {
+	case 2:
+	case 4:
+		bar_offset = NPU2_GPU0_MEM_BAR;
+		pa_config_offset = NPU2_CQ_CTL_MISC_PA0_CONFIG;
+		break;
+	case 3:
+	case 5:
+		bar_offset = NPU2_GPU1_MEM_BAR;
+		pa_config_offset = NPU2_CQ_CTL_MISC_PA1_CONFIG;
+		break;
+	default:
+		assert(false);
+	}
+
+	/*
+	 * Memory select configuration:
+	 * - 0b000 - BAR disabled
+	 * - 0b001 - match 0b00, 0b01
+	 * - 0b010 - match 0b01, 0b10
+	 * - 0b011 - match 0b00, 0b10
+	 * - 0b100 - match 0b00
+	 * - 0b101 - match 0b01
+	 * - 0b110 - match 0b10
+	 * - 0b111 - match 0b00, 0b01, 0b10
+	 */
+	memsel = GETFIELD(PPC_BITMASK(13, 14), base);
+	val = SETFIELD(NPU2_MEM_BAR_EN | NPU2_MEM_BAR_SEL_MEM, 0ULL, 0b100 + memsel);
+
+	/* Base address - 12 bits, 1G aligned */
+	val = SETFIELD(NPU2_MEM_BAR_NODE_ADDR, val, GETFIELD(PPC_BITMASK(22, 33), base));
+
+	/* GCID */
+	val = SETFIELD(NPU2_MEM_BAR_GROUP, val, GETFIELD(PPC_BITMASK(15, 18), base));
+	val = SETFIELD(NPU2_MEM_BAR_CHIP, val, GETFIELD(PPC_BITMASK(19, 21), base));
+
+	/* Other settings */
+	val = SETFIELD(NPU2_MEM_BAR_POISON, val, 1);
+	val = SETFIELD(NPU2_MEM_BAR_GRANULE, val, 0);
+	val = SETFIELD(NPU2_MEM_BAR_BAR_SIZE, val, ilog2(size >> 30));
+	val = SETFIELD(NPU2_MEM_BAR_MODE, val, 0);
+
+	for (int block = NPU2_BLOCK_SM_0; block <= NPU2_BLOCK_SM_3; block++) {
+		reg = NPU2_REG_OFFSET(stack, block, bar_offset);
+		npu2_write(dev->npu, reg, val);
+	}
+
+	/* Set PA config */
+	val = SETFIELD(NPU2_CQ_CTL_MISC_PA_CONFIG_MEMSELMATCH, 0ULL, 0b100 + memsel);
+	val = SETFIELD(NPU2_CQ_CTL_MISC_PA_CONFIG_GRANULE, val, 0);
+	val = SETFIELD(NPU2_CQ_CTL_MISC_PA_CONFIG_SIZE, val, ilog2(size >> 30));
+	val = SETFIELD(NPU2_CQ_CTL_MISC_PA_CONFIG_MODE, val, 0);
+	val = SETFIELD(NPU2_CQ_CTL_MISC_PA_CONFIG_MASK, val, 0);
+	reg = NPU2_REG_OFFSET(stack, NPU2_BLOCK_CTL, pa_config_offset);
+	npu2_write(dev->npu, reg, val);
+}
+
+static int64_t alloc_mem_bar(struct npu2_dev *dev, uint64_t size, uint64_t *bar)
+{
+	uint64_t phys_map_base, phys_map_size;
+
+	/*
+	 * Right now, we support 1 allocation per chip, of up to 4TB.
+	 *
+	 * In future, we will use chip address extension to support
+	 * >4TB ranges, and we will implement a more sophisticated
+	 * allocator to allow an allocation for every link on a chip.
+	 */
+
+	if (dev->npu->lpc_mem_allocated)
+		return OPAL_RESOURCE;
+
+	phys_map_get(dev->npu->chip_id, OCAPI_MEM, 0, &phys_map_base, &phys_map_size);
+
+	if (size > phys_map_size) {
+		/**
+		 * @fwts-label OCAPIInvalidLPCMemoryBARSize
+		 * @fwts-advice The operating system requested an unsupported
+		 * amount of OpenCAPI LPC memory. This is possibly a kernel
+		 * bug, or you may need to upgrade your firmware.
+		 */
+		prlog(PR_ERR,
+		      "OCAPI: Invalid LPC memory BAR allocation size requested: 0x%llx bytes (limit 0x%llx)\n",
+		      size, phys_map_size);
+		return OPAL_PARAMETER;
+	}
+
+	/* Minimum BAR size is 1 GB */
+	if (size < (1ull << 30)) {
+		size = 1ull << 30;
+	}
+
+	if (!is_pow2(size)) {
+		size = 2ull << ilog2(size);
+	}
+
+	set_mem_bar(dev, phys_map_base, size);
+	*bar = phys_map_base;
+	dev->npu->lpc_mem_allocated = dev;
+
+	return OPAL_SUCCESS;
+}
+
+static int64_t release_mem_bar(struct npu2_dev *dev)
+{
+	uint64_t stack, reg, bar_offset, pa_config_offset;
+
+	if (dev->npu->lpc_mem_allocated != dev)
+		return OPAL_PARAMETER;
+
+	stack = index_to_stack(dev->brick_index);
+	switch (dev->brick_index) {
+	case 2:
+	case 4:
+		bar_offset = NPU2_GPU0_MEM_BAR;
+		pa_config_offset = NPU2_CQ_CTL_MISC_PA0_CONFIG;
+		break;
+	case 3:
+	case 5:
+		bar_offset = NPU2_GPU1_MEM_BAR;
+		pa_config_offset = NPU2_CQ_CTL_MISC_PA1_CONFIG;
+		break;
+	default:
+		assert(false);
+	}
+
+	for (int block = NPU2_BLOCK_SM_0; block <= NPU2_BLOCK_SM_3; block++) {
+		reg = NPU2_REG_OFFSET(stack, block, bar_offset);
+		npu2_write(dev->npu, reg, 0ull);
+	}
+	reg = NPU2_REG_OFFSET(stack, NPU2_BLOCK_CTL, pa_config_offset);
+	npu2_write(dev->npu, reg, 0ull);
+
+	dev->npu->lpc_mem_allocated = NULL;
+
+	return OPAL_SUCCESS;
+}
+
+static int64_t opal_npu_mem_alloc(uint64_t phb_id, uint32_t __unused bdfn,
+				  uint64_t size, uint64_t *bar)
+{
+	struct phb *phb = pci_get_phb(phb_id);
+	struct npu2_dev *dev;
+
+
+	if (!phb || phb->phb_type != phb_type_npu_v2_opencapi)
+		return OPAL_PARAMETER;
+
+	dev = phb_to_npu2_dev_ocapi(phb);
+	if (!dev)
+		return OPAL_PARAMETER;
+
+	if (!opal_addr_valid(bar))
+		return OPAL_PARAMETER;
+
+	return alloc_mem_bar(dev, size, bar);
+}
+opal_call(OPAL_NPU_MEM_ALLOC, opal_npu_mem_alloc, 4);
+
+static int64_t opal_npu_mem_release(uint64_t phb_id, uint32_t __unused bdfn)
+{
+	struct phb *phb = pci_get_phb(phb_id);
+	struct npu2_dev *dev;
+
+
+	if (!phb || phb->phb_type != phb_type_npu_v2_opencapi)
+		return OPAL_PARAMETER;
+
+	dev = phb_to_npu2_dev_ocapi(phb);
+	if (!dev)
+		return OPAL_PARAMETER;
+
+	return release_mem_bar(dev);
+}
+opal_call(OPAL_NPU_MEM_RELEASE, opal_npu_mem_release, 2);
diff --git a/hw/phys-map.c b/hw/phys-map.c
index fe949e4043ff..75836297c2f9 100644
--- a/hw/phys-map.c
+++ b/hw/phys-map.c
@@ -52,6 +52,17 @@ static const struct phys_map_entry phys_map_table_nimbus[] = {
 	{ GPU_MEM_4T_UP, 2, 0x0000044000000000ull, 0x0000002000000000ull },
 	{ GPU_MEM_4T_UP, 3, 0x0000046000000000ull, 0x0000002000000000ull },
 
+	/*
+	 * OpenCAPI LPC Memory - single 4TB range per chip, fills
+	 * whole second non-mirrored region.
+	 *
+	 * Longer term, we're going to use chip address extension to
+	 * enable >4TB to be allocated per chip. At that point, we
+	 * may have to find another way of assigning these ranges
+	 * outside of phys-map.
+	 */
+	{ OCAPI_MEM, 0, 0x0002000000000000ull, 0x0000040000000000ull },
+
 	/* 0 TB offset @ MMIO 0x0006000000000000ull */
 	{ PHB4_64BIT_MMIO, 0, 0x0006000000000000ull, 0x0000004000000000ull },
 	{ PHB4_64BIT_MMIO, 1, 0x0006004000000000ull, 0x0000004000000000ull },
diff --git a/include/npu2-regs.h b/include/npu2-regs.h
index ba10b8eaf88d..ecf47abf6c96 100644
--- a/include/npu2-regs.h
+++ b/include/npu2-regs.h
@@ -239,6 +239,13 @@ void npu2_scom_write(uint64_t gcid, uint64_t scom_base,
 #define NPU2_CQ_CTL_STATUS 0x090
 #define NPU2_CQ_CTL_STATUS_BRK0_AM_FENCED PPC_BITMASK(48, 49)
 #define NPU2_CQ_CTL_STATUS_BRK1_AM_FENCED PPC_BITMASK(50, 51)
+#define NPU2_CQ_CTL_MISC_PA0_CONFIG 0x0A0 /* or should that be CS */
+#define NPU2_CQ_CTL_MISC_PA1_CONFIG 0x0A8 /* or should that be CS */
+#define NPU2_CQ_CTL_MISC_PA_CONFIG_MEMSELMATCH PPC_BITMASK(0,2)
+#define NPU2_CQ_CTL_MISC_PA_CONFIG_GRANULE PPC_BIT(3)
+#define NPU2_CQ_CTL_MISC_PA_CONFIG_SIZE PPC_BITMASK(4,7)
+#define NPU2_CQ_CTL_MISC_PA_CONFIG_MODE PPC_BITMASK(8,11)
+#define NPU2_CQ_CTL_MISC_PA_CONFIG_MASK PPC_BITMASK(13,19)
 #define NPU2_CQ_C_ERR_RPT_MSG0 0x0C0
 #define NPU2_CQ_C_ERR_RPT_MSG1 0x0C8
 #define NPU2_CQ_C_ERR_RPT_FIRST0 0x0D0
diff --git a/include/npu2.h b/include/npu2.h
index d58aab47bb30..9febf1a343ef 100644
--- a/include/npu2.h
+++ b/include/npu2.h
@@ -193,6 +193,11 @@ struct npu2 {
 	struct lock i2c_lock;
 	uint8_t i2c_pin_mode;
 	uint8_t i2c_pin_wr_state;
+	/*
+	 * Which device currently has an LPC allocation.
+	 * Temporary as long as we only support 1 LPC alloc per chip.
+	 */
+	struct npu2_dev *lpc_mem_allocated;
 };
 
 static inline struct npu2 *phb_to_npu2_nvlink(struct phb *phb)
diff --git a/include/opal-api.h b/include/opal-api.h
index e461c9d278c2..b0ad435539fb 100644
--- a/include/opal-api.h
+++ b/include/opal-api.h
@@ -227,8 +227,8 @@
 #define OPAL_NPU_SET_RELAXED_ORDER 168
 #define OPAL_NPU_GET_RELAXED_ORDER 169
 #define OPAL_XIVE_GET_VP_STATE 170 /* Get NVT state */
-#define OPAL_NPU_RESERVED1 171 /* LPC Allocate */
-#define OPAL_NPU_RESERVED2 172 /* LPC Release */
+#define OPAL_NPU_MEM_ALLOC 171
+#define OPAL_NPU_MEM_RELEASE 172
 #define OPAL_LAST 172
 
 #define QUIESCE_HOLD 1 /* Spin all calls at entry */
diff --git a/include/phys-map.h b/include/phys-map.h
index 73adda079e23..0cf48b6628af 100644
--- a/include/phys-map.h
+++ b/include/phys-map.h
@@ -28,6 +28,7 @@ enum phys_map_type {
 	SYSTEM_MEM,
 	GPU_MEM_4T_DOWN,
 	GPU_MEM_4T_UP,
+	OCAPI_MEM,
 	PHB4_64BIT_MMIO,
 	PHB4_32BIT_MMIO,
 	PHB4_XIVE_ESB,
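
The size handling in alloc_mem_bar() (1 GB minimum, round up to a power of two) and the size encoding used for the BAR and PA config registers (log2 of the size in GB) can be exercised in isolation. The following is a minimal sketch under those assumptions only. floor_log2, round_bar_size and bar_size_field are hypothetical helper names for this illustration and do not exist in skiboot; note that all shifts are done on 64-bit values, since any rounded size is at least 2^30.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_GB (1ULL << 30)

/* Hypothetical helper: floor(log2(v)) for v > 0. */
static unsigned int floor_log2(uint64_t v)
{
	unsigned int n = 0;

	assert(v != 0);
	while (v >>= 1)
		n++;
	return n;
}

/* Clamp to the 1 GB minimum, then round up to the next power of two. */
static uint64_t round_bar_size(uint64_t size)
{
	if (size < ONE_GB)
		size = ONE_GB;
	if (size & (size - 1)) /* more than one bit set: not a power of two */
		size = 2ULL << floor_log2(size);
	return size;
}

/* Size field value: log2 of the (already rounded) size in GB. */
static unsigned int bar_size_field(uint64_t size)
{
	return floor_log2(size >> 30);
}
```

For example, a 3 GB request rounds to 4 GB and encodes as 2, and the full 4TB range encodes as 12, which fits the 4-bit NPU2_CQ_CTL_MISC_PA_CONFIG_SIZE field.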