Message ID | 20180820140605.11846-1-vaibhav@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | opal/hmi: Wakeup the cpu before reading core_fir | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | master/apply_patch Successfully applied |
snowpatch_ozlabs/make_check | success | Test make_check on branch master |
On 08/20/2018 07:36 PM, Vaibhav Jain wrote: > When stop state 5 is enabled, reading the core_fir during an HMI can > result in a xscom read error with xscom_read() returning the > OPAL_XSCOM_PARTIAL_GOOD error code and core_fir value of all FFs. At > present this return error code is not handled in decode_core_fir() > hence the invalid core_fir value is sent to the kernel where it > interprets it as a FATAL hmi causing a system check-stop. > > This can be prevented by forcing the core to wake-up using before > reading the core_fir. Hence this patch wraps the call to > read_core_fir() within calls to dctl_set_special_wakeup() and > dctl_clear_special_wakeup(). > > Suggested-by: Michael Neuling <mikey@neuling.org> > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> > Signed-off-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> > --- > core/hmi.c | 25 +++++++++++++++++-------- > 1 file changed, 17 insertions(+), 8 deletions(-) > > diff --git a/core/hmi.c b/core/hmi.c > index 1f665a2f..67c520a0 100644 > --- a/core/hmi.c > +++ b/core/hmi.c > @@ -379,7 +379,7 @@ static bool decode_core_fir(struct cpu_thread *cpu, > { > uint64_t core_fir; > uint32_t core_id; > - int i; > + int i, swkup_rc = OPAL_UNSUPPORTED; > bool found = false; > int64_t ret; > const char *loc; > @@ -390,14 +390,15 @@ static bool decode_core_fir(struct cpu_thread *cpu, > > core_id = pir_to_core_id(cpu->pir); > > + /* Force the core to wakeup, otherwise reading core_fir is unrealiable > + * if stop-state 5 is enabled. Malfunction Alert is a broadcast HMI and all core should have been already out of stop state. It is strange that with stop 5 the behaviour changes. Not sure whether this is a hardware BUG OR stop 5 never wakeup with HMI. > + */ > + swkup_rc = dctl_set_special_wakeup(cpu); > + > /* Get CORE FIR register value. */ > ret = read_core_fir(cpu->chip_id, core_id, &core_fir); May be we should clear special wakeup here once we are done with reading, instead of doing that in goto out label. Thanks, -Mahesh.
diff --git a/core/hmi.c b/core/hmi.c index 1f665a2f..67c520a0 100644 --- a/core/hmi.c +++ b/core/hmi.c @@ -379,7 +379,7 @@ static bool decode_core_fir(struct cpu_thread *cpu, { uint64_t core_fir; uint32_t core_id; - int i; + int i, swkup_rc = OPAL_UNSUPPORTED; bool found = false; int64_t ret; const char *loc; @@ -390,14 +390,15 @@ static bool decode_core_fir(struct cpu_thread *cpu, core_id = pir_to_core_id(cpu->pir); + /* Force the core to wakeup, otherwise reading core_fir is unrealiable + * if stop-state 5 is enabled. + */ + swkup_rc = dctl_set_special_wakeup(cpu); + /* Get CORE FIR register value. */ ret = read_core_fir(cpu->chip_id, core_id, &core_fir); - if (ret == OPAL_HARDWARE) { - prerror("XSCOM error reading CORE FIR\n"); - /* If the FIR can't be read, we should checkstop. */ - return true; - } else if (ret == OPAL_WRONG_STATE) { + if (ret == OPAL_WRONG_STATE) { /* * CPU is asleep, so it probably didn't cause the checkstop. * If no other HMI cause is found a "catchall" checkstop @@ -407,11 +408,16 @@ static bool decode_core_fir(struct cpu_thread *cpu, prlog(PR_DEBUG, "FIR read failed, chip %d core %d asleep\n", cpu->chip_id, core_id); - return false; + goto out; + } else if (ret != OPAL_SUCCESS) { + prerror("XSCOM error reading CORE FIR\n"); + /* If the FIR can't be read, we should checkstop. */ + found = true; + goto out; } if (!core_fir) - return false; + goto out; loc = chip_loc_code(cpu->chip_id); prlog(PR_INFO, "[Loc: %s]: CHIP ID: %x, CORE ID: %x, FIR: %016llx\n", @@ -426,6 +432,9 @@ static bool decode_core_fir(struct cpu_thread *cpu, |= xstop_bits[i].reason; } } +out: + if (!swkup_rc) + dctl_clear_special_wakeup(cpu); return found; }