[v2] npu2-opencapi: don't fence on masked XSL errors

Message ID 20200108153350.4724-1-fbarrat@linux.ibm.com
State Accepted
Series [v2] npu2-opencapi: don't fence on masked XSL errors

Checks

Context | Check | Description
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch master (d75e82dbfbb9443efeb3f9a5921ac23605aab469)
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot | success | Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco | success | Signed-off-by present

Commit Message

Frederic Barrat Jan. 8, 2020, 3:33 p.m. UTC
An upcoming change in the initfile is going to modify the default
action and fence behavior of some of the NPU FIR2 bits. We're already
overriding the settings of most of those. The one exception is for
bits 41 and 42, which are XSL errors impacting 2 links that we
mask (instead we rely on the subsequent OTL error, which is per link).

The new initfile will fence-on-error for bits 41 and 42. And even if
the FIRs are masked, the NPU logic could fence the links, which is not
what we want. So this patch makes sure we don't fence on the FIRs we
want to ignore. It has no effect on existing firmware.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
---
Changelog:
v2: add comment and use macro for the xsl bits we ignore (Andrew)

 hw/npu2-opencapi.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Andrew Donnellan Jan. 16, 2020, 8:44 a.m. UTC | #1
On 9/1/20 2:33 am, Frederic Barrat wrote:
> An upcoming change in the initfile is going to modify the default
> action and fence behavior of some of the NPU FIR2 bits. We're already
> overriding the settings of most of those. The one exception is for
> bits 41 and 42, which are XSL errors impacting 2 links that we
> mask (instead we rely on the subsequent OTL error, which is per link).
> 
> The new initfile will fence-on-error for bits 41 and 42. And even if
> the FIRs are masked, the NPU logic could fence the links, which is not
> what we want. So this patch makes sure we don't fence on the FIRs we
> want to ignore. It has no effect on existing firmware.
> 
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

> ---
> Changelog:
> v2: add comment and use macro for the xsl bits we ignore (Andrew)
> 
>   hw/npu2-opencapi.c | 11 +++++++++--
>   1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/npu2-opencapi.c b/hw/npu2-opencapi.c
> index ed6650f4..07e81d23 100644
> --- a/hw/npu2-opencapi.c
> +++ b/hw/npu2-opencapi.c
> @@ -1649,7 +1649,7 @@ static int enable_interrupts(struct npu2 *p)
>   	 *   the systems, since we can just fence the brick and keep
>   	 *   the system alive.
>   	 * - the exception to the above is 2 FIRs for XSL errors
> -	 *   resulting of bad AFU behavior, for which we don't want to
> +	 *   resulting from bad AFU behavior, for which we don't want to
>   	 *   checkstop but can't configure to send an error interrupt
>   	 *   either, as the XSL errors are reported on 2 links (the
>   	 *   XSL is shared between 2 links). Instead, we mask
> @@ -1661,7 +1661,8 @@ static int enable_interrupts(struct npu2 *p)
>   	 */
>   	xsl_fault = PPC_BIT(0) | PPC_BIT(1) | PPC_BIT(2) | PPC_BIT(3);
>   	xstop_override = 0x0FFFEFC00F91B000;
> -	xsl_mask = PPC_BIT(41) | PPC_BIT(42);
> +	xsl_mask = NPU2_CHECKSTOP_REG2_XSL_XLAT_REQ_WHILE_SPAP_INVALID |
> +		   NPU2_CHECKSTOP_REG2_XSL_INVALID_PEE;
>   
>   	xscom_read(p->chip_id, p->xscom_base + NPU2_MISC_FIR2_MASK, &reg);
>   	reg |= xsl_fault | xstop_override | xsl_mask;
> @@ -1677,10 +1678,16 @@ static int enable_interrupts(struct npu2 *p)
>   	 * Make sure the brick is fenced on those errors.
>   	 * Fencing is incompatible with freezing, but there's no
>   	 * freeze defined for FIR2, so we don't have to worry about it
> +	 *
> +	 * For the 2 XSL bits we ignore, we need to make sure they
> +	 * don't fence the link, as the NPU logic could allow it even
> +	 * when masked.
>   	 */
>   	reg = npu2_scom_read(p->chip_id, p->xscom_base, NPU2_MISC_FENCE_ENABLE2,
>   			     NPU2_MISC_DA_LEN_8B);
>   	reg |= xstop_override;
> +	reg &= ~NPU2_CHECKSTOP_REG2_XSL_XLAT_REQ_WHILE_SPAP_INVALID;
> +	reg &= ~NPU2_CHECKSTOP_REG2_XSL_INVALID_PEE;
>   	npu2_scom_write(p->chip_id, p->xscom_base, NPU2_MISC_FENCE_ENABLE2,
>   			NPU2_MISC_DA_LEN_8B, reg);
>   
>
Oliver O'Halloran Feb. 3, 2020, 1:45 a.m. UTC | #2
On Thu, Jan 9, 2020 at 2:34 AM Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>
> An upcoming change in the initfile is going to modify the default
> action and fence behavior of some of the NPU FIR2 bits. We're already
> overriding the settings of most of those. The one exception is for
> bits 41 and 42, which are XSL errors impacting 2 links that we
> mask (instead we rely on the subsequent OTL error, which is per link).
>
> The new initfile will fence-on-error for bits 41 and 42. And even if
> the FIRs are masked, the NPU logic could fence the links, which is not
> what we want. So this patch makes sure we don't fence on the FIRs we
> want to ignore. It has no effect on existing firmware.
>
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> ---
> Changelog:
> v2: add comment and use macro for the xsl bits we ignore (Andrew)

Thanks, merged as 09478eaeef8dc272586a29190d58f47b50ec821b

Patch

diff --git a/hw/npu2-opencapi.c b/hw/npu2-opencapi.c
index ed6650f4..07e81d23 100644
--- a/hw/npu2-opencapi.c
+++ b/hw/npu2-opencapi.c
@@ -1649,7 +1649,7 @@  static int enable_interrupts(struct npu2 *p)
 	 *   the systems, since we can just fence the brick and keep
 	 *   the system alive.
 	 * - the exception to the above is 2 FIRs for XSL errors
-	 *   resulting of bad AFU behavior, for which we don't want to
+	 *   resulting from bad AFU behavior, for which we don't want to
 	 *   checkstop but can't configure to send an error interrupt
 	 *   either, as the XSL errors are reported on 2 links (the
 	 *   XSL is shared between 2 links). Instead, we mask
@@ -1661,7 +1661,8 @@  static int enable_interrupts(struct npu2 *p)
 	 */
 	xsl_fault = PPC_BIT(0) | PPC_BIT(1) | PPC_BIT(2) | PPC_BIT(3);
 	xstop_override = 0x0FFFEFC00F91B000;
-	xsl_mask = PPC_BIT(41) | PPC_BIT(42);
+	xsl_mask = NPU2_CHECKSTOP_REG2_XSL_XLAT_REQ_WHILE_SPAP_INVALID |
+		   NPU2_CHECKSTOP_REG2_XSL_INVALID_PEE;
 
 	xscom_read(p->chip_id, p->xscom_base + NPU2_MISC_FIR2_MASK, &reg);
 	reg |= xsl_fault | xstop_override | xsl_mask;
@@ -1677,10 +1678,16 @@  static int enable_interrupts(struct npu2 *p)
 	 * Make sure the brick is fenced on those errors.
 	 * Fencing is incompatible with freezing, but there's no
 	 * freeze defined for FIR2, so we don't have to worry about it
+	 *
+	 * For the 2 XSL bits we ignore, we need to make sure they
+	 * don't fence the link, as the NPU logic could allow it even
+	 * when masked.
 	 */
 	reg = npu2_scom_read(p->chip_id, p->xscom_base, NPU2_MISC_FENCE_ENABLE2,
 			     NPU2_MISC_DA_LEN_8B);
 	reg |= xstop_override;
+	reg &= ~NPU2_CHECKSTOP_REG2_XSL_XLAT_REQ_WHILE_SPAP_INVALID;
+	reg &= ~NPU2_CHECKSTOP_REG2_XSL_INVALID_PEE;
 	npu2_scom_write(p->chip_id, p->xscom_base, NPU2_MISC_FENCE_ENABLE2,
 			NPU2_MISC_DA_LEN_8B, reg);
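
The register arithmetic in the last hunk can be checked in isolation. The following is a minimal sketch, not the skiboot code: it assumes `PPC_BIT()` uses IBM bit numbering (bit 0 is the most significant bit of a 64-bit word), and `XSL_IGNORED_BITS` and `fence_enable_value()` are hypothetical names introduced only for this illustration. The two ignored XSL bits are written as `PPC_BIT(41) | PPC_BIT(42)`, as in the line the patch replaces, and no SCOM access is performed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: IBM bit numbering, bit 0 is the MSB of a 64-bit word. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

/* Hypothetical name for the two XSL error bits that are masked and ignored. */
#define XSL_IGNORED_BITS (PPC_BIT(41) | PPC_BIT(42))

/* Value of xstop_override, copied from the patch context. */
#define XSTOP_OVERRIDE 0x0FFFEFC00F91B000ULL

/*
 * Hypothetical helper: given the current content of the fence enable
 * register, return the value to write back. Fencing is enabled for the
 * overridden errors and disabled for the two ignored XSL bits, whatever
 * the initfile left in the register.
 */
uint64_t fence_enable_value(uint64_t reg)
{
	/* The override value does not cover the ignored bits. */
	assert((XSTOP_OVERRIDE & XSL_IGNORED_BITS) == 0);

	reg |= XSTOP_OVERRIDE;
	reg &= ~XSL_IGNORED_BITS;
	return reg;
}
```

With a register that starts at zero, the result is just the override value, which matches the claim that the patch has no effect on existing firmware. If a future initfile sets bits 41 and 42, the explicit clear is what keeps them from fencing the links.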