Message ID | 20180827101354.678-1-vaibhav@linux.ibm.com |
---|---|
State | Accepted |
Headers | show |
Series | [v2] opal/hmi: Wakeup the cpu before reading core_fir | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | master/apply_patch Successfully applied |
snowpatch_ozlabs/make_check | success | Test make_check on branch master |
On Mon, Aug 27, 2018 at 03:43:54PM +0530, Vaibhav Jain wrote: > When stop state 5 is enabled, reading the core_fir during an HMI can > result in a xscom read error with xscom_read() returning an > OPAL_XSCOM_PARTIAL_GOOD error code and core_fir value of all FFs. At > present this return error code is not handled in decode_core_fir() > hence the invalid core_fir value is sent to the kernel where it > interprets it as a FATAL hmi causing a system check-stop. > > This can be prevented by forcing the core to wake-up using before > reading the core_fir. Hence this patch wraps the call to > read_core_fir() within calls to dctl_set_special_wakeup() and > dctl_clear_special_wakeup(). > > Suggested-by: Michael Neuling <mikey@neuling.org> > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> > Signed-off-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> Looks ok to me. Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> > > --- > Change-log: > > v2 -> Simplified the flow to just wrap read_core_fir() in > dctl_set/clear_special_wakeup instead. [Mahesh] > --- > core/hmi.c | 21 +++++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/core/hmi.c b/core/hmi.c > index 1f665a2f..d9240cee 100644 > --- a/core/hmi.c > +++ b/core/hmi.c > @@ -379,7 +379,7 @@ static bool decode_core_fir(struct cpu_thread *cpu, > { > uint64_t core_fir; > uint32_t core_id; > - int i; > + int i, swkup_rc; > bool found = false; > int64_t ret; > const char *loc; > @@ -390,14 +390,19 @@ static bool decode_core_fir(struct cpu_thread *cpu, > > core_id = pir_to_core_id(cpu->pir); > > + /* Force the core to wakeup, otherwise reading core_fir is unrealiable > + * if stop-state 5 is enabled. > + */ > + swkup_rc = dctl_set_special_wakeup(cpu); > + > /* Get CORE FIR register value. */ > ret = read_core_fir(cpu->chip_id, core_id, &core_fir); > > - if (ret == OPAL_HARDWARE) { > - prerror("XSCOM error reading CORE FIR\n"); > - /* If the FIR can't be read, we should checkstop. */ > - return true; > - } else if (ret == OPAL_WRONG_STATE) { > + if (!swkup_rc) > + dctl_clear_special_wakeup(cpu); > + > + > + if (ret == OPAL_WRONG_STATE) { > /* > * CPU is asleep, so it probably didn't cause the checkstop. > * If no other HMI cause is found a "catchall" checkstop > @@ -408,6 +413,10 @@ static bool decode_core_fir(struct cpu_thread *cpu, > "FIR read failed, chip %d core %d asleep\n", > cpu->chip_id, core_id); > return false; > + } else if (ret != OPAL_SUCCESS) { > + prerror("XSCOM error reading CORE FIR\n"); > + /* If the FIR can't be read, we should checkstop. */ > + return true; > } > > if (!core_fir) > -- > 2.17.1 >
On 08/27/2018 03:43 PM, Vaibhav Jain wrote: > When stop state 5 is enabled, reading the core_fir during an HMI can > result in a xscom read error with xscom_read() returning an > OPAL_XSCOM_PARTIAL_GOOD error code and core_fir value of all FFs. At > present this return error code is not handled in decode_core_fir() > hence the invalid core_fir value is sent to the kernel where it > interprets it as a FATAL hmi causing a system check-stop. > > This can be prevented by forcing the core to wake-up using before > reading the core_fir. Hence this patch wraps the call to > read_core_fir() within calls to dctl_set_special_wakeup() and > dctl_clear_special_wakeup(). > > Suggested-by: Michael Neuling <mikey@neuling.org> > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> > Signed-off-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> > Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> > --- > Change-log: > > v2 -> Simplified the flow to just wrap read_core_fir() in > dctl_set/clear_special_wakeup instead. [Mahesh] > --- > core/hmi.c | 21 +++++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/core/hmi.c b/core/hmi.c > index 1f665a2f..d9240cee 100644 > --- a/core/hmi.c > +++ b/core/hmi.c > @@ -379,7 +379,7 @@ static bool decode_core_fir(struct cpu_thread *cpu, > { > uint64_t core_fir; > uint32_t core_id; > - int i; > + int i, swkup_rc; > bool found = false; > int64_t ret; > const char *loc; > @@ -390,14 +390,19 @@ static bool decode_core_fir(struct cpu_thread *cpu, > > core_id = pir_to_core_id(cpu->pir); > > + /* Force the core to wakeup, otherwise reading core_fir is unrealiable > + * if stop-state 5 is enabled. > + */ > + swkup_rc = dctl_set_special_wakeup(cpu); > + > /* Get CORE FIR register value. */ > ret = read_core_fir(cpu->chip_id, core_id, &core_fir); > > - if (ret == OPAL_HARDWARE) { > - prerror("XSCOM error reading CORE FIR\n"); > - /* If the FIR can't be read, we should checkstop. */ > - return true; > - } else if (ret == OPAL_WRONG_STATE) { > + if (!swkup_rc) > + dctl_clear_special_wakeup(cpu); > + > + > + if (ret == OPAL_WRONG_STATE) { > /* > * CPU is asleep, so it probably didn't cause the checkstop. > * If no other HMI cause is found a "catchall" checkstop > @@ -408,6 +413,10 @@ static bool decode_core_fir(struct cpu_thread *cpu, > "FIR read failed, chip %d core %d asleep\n", > cpu->chip_id, core_id); > return false; > + } else if (ret != OPAL_SUCCESS) { > + prerror("XSCOM error reading CORE FIR\n"); > + /* If the FIR can't be read, we should checkstop. */ > + return true; > } > > if (!core_fir) >
Vaibhav Jain <vaibhav@linux.ibm.com> writes: > When stop state 5 is enabled, reading the core_fir during an HMI can > result in a xscom read error with xscom_read() returning an > OPAL_XSCOM_PARTIAL_GOOD error code and core_fir value of all FFs. At > present this return error code is not handled in decode_core_fir() > hence the invalid core_fir value is sent to the kernel where it > interprets it as a FATAL hmi causing a system check-stop. > > This can be prevented by forcing the core to wake-up using before > reading the core_fir. Hence this patch wraps the call to > read_core_fir() within calls to dctl_set_special_wakeup() and > dctl_clear_special_wakeup(). > > Suggested-by: Michael Neuling <mikey@neuling.org> > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> > Signed-off-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> Thanks! Merged to master as of 7cef472ed1fea75cc63cb0d8ac86d6a13d08517d
diff --git a/core/hmi.c b/core/hmi.c index 1f665a2f..d9240cee 100644 --- a/core/hmi.c +++ b/core/hmi.c @@ -379,7 +379,7 @@ static bool decode_core_fir(struct cpu_thread *cpu, { uint64_t core_fir; uint32_t core_id; - int i; + int i, swkup_rc; bool found = false; int64_t ret; const char *loc; @@ -390,14 +390,19 @@ static bool decode_core_fir(struct cpu_thread *cpu, core_id = pir_to_core_id(cpu->pir); + /* Force the core to wakeup, otherwise reading core_fir is unrealiable + * if stop-state 5 is enabled. + */ + swkup_rc = dctl_set_special_wakeup(cpu); + /* Get CORE FIR register value. */ ret = read_core_fir(cpu->chip_id, core_id, &core_fir); - if (ret == OPAL_HARDWARE) { - prerror("XSCOM error reading CORE FIR\n"); - /* If the FIR can't be read, we should checkstop. */ - return true; - } else if (ret == OPAL_WRONG_STATE) { + if (!swkup_rc) + dctl_clear_special_wakeup(cpu); + + + if (ret == OPAL_WRONG_STATE) { /* * CPU is asleep, so it probably didn't cause the checkstop. * If no other HMI cause is found a "catchall" checkstop @@ -408,6 +413,10 @@ static bool decode_core_fir(struct cpu_thread *cpu, "FIR read failed, chip %d core %d asleep\n", cpu->chip_id, core_id); return false; + } else if (ret != OPAL_SUCCESS) { + prerror("XSCOM error reading CORE FIR\n"); + /* If the FIR can't be read, we should checkstop. */ + return true; } if (!core_fir)