Message ID | 1458522006-8846-6-git-send-email-ruscur@russell.cc |
---|---|
State | Accepted |
Headers | show |
On 21/03/16 12:00, Russell Currey wrote: > If reading the FIR with XSCOM failed, the existing code would not raise > a HMI under the assumption that the CPU was asleep and nothing is wrong. > Now that it is possible to check whether or not the CPU was asleep, > raise an unrecoverable HMI if the read failed for other reasons, and > skip the CPU if it was asleep. > > If the CPU is asleep when it's not meant to be and that is the cause of > the HMI, an unrecoverable "catchall" HMI will be raised since no other > components will claim responsibility for it. > > Signed-off-by: Russell Currey <ruscur@russell.cc> > --- > V2: Raise HMI on XSCOM fail for non-sleep reasons, completely reword > --- > core/hmi.c | 19 +++++++++++++++++-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/core/hmi.c b/core/hmi.c > index 127686f..71fdf48 100644 > --- a/core/hmi.c > +++ b/core/hmi.c > @@ -299,6 +299,7 @@ static bool decode_core_fir(struct cpu_thread *cpu, > uint32_t core_id; > int i; > bool found = false; > + int64_t ret; As per comment on patch 1, should this just be "int" to match the declaration of xscom_read()? > > /* Sanity check */ > if (!cpu || !hmi_evt) > @@ -307,9 +308,23 @@ static bool decode_core_fir(struct cpu_thread *cpu, > core_id = pir_to_core_id(cpu->pir); > > /* Get CORE FIR register value. */ > - if (xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, CORE_FIR), > - &core_fir) != 0) { > + ret = xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, CORE_FIR), > + &core_fir); > + > + if (ret == OPAL_HARDWARE) { > prerror("HMI: XSCOM error reading CORE FIR\n"); > + /* If the FIR can't be read, we should checkstop. */ > + return true; > + } else if (ret == OPAL_WRONG_STATE) { > + /* > + * CPU is asleep, so it probably didn't cause the checkstop. > + * If no other HMI cause is found a "catchall" checkstop > + * will be raised, so if this CPU should've been awake the > + * error will be handled appropriately. > + */ > + prlog(PR_DEBUG, > + "HMI: FIR read failed, chip %d core %d asleep\n", > + cpu->chip_id, core_id); > return false; > } > > Rest doesn't look bad.
On Mon, 2016-03-21 at 12:29 +1100, Andrew Donnellan wrote: > On 21/03/16 12:00, Russell Currey wrote: > > > > If reading the FIR with XSCOM failed, the existing code would not raise > > a HMI under the assumption that the CPU was asleep and nothing is > > wrong. > > Now that it is possible to check whether or not the CPU was asleep, > > raise an unrecoverable HMI if the read failed for other reasons, and > > skip the CPU if it was asleep. > > > > If the CPU is asleep when it's not meant to be and that is the cause of > > the HMI, an unrecoverable "catchall" HMI will be raised since no other > > components will claim responsibility for it. > > > > Signed-off-by: Russell Currey <ruscur@russell.cc> > > --- > > V2: Raise HMI on XSCOM fail for non-sleep reasons, completely reword > > --- > > core/hmi.c | 19 +++++++++++++++++-- > > 1 file changed, 17 insertions(+), 2 deletions(-) > > > > diff --git a/core/hmi.c b/core/hmi.c > > index 127686f..71fdf48 100644 > > --- a/core/hmi.c > > +++ b/core/hmi.c > > @@ -299,6 +299,7 @@ static bool decode_core_fir(struct cpu_thread *cpu, > > uint32_t core_id; > > int i; > > bool found = false; > > + int64_t ret; > As per comment on patch 1, should this just be "int" to match the > declaration of xscom_read()? > I used int64_t since xscom_read is an OPAL call, and since it returns OPAL return codes which are defined as int64_t. I do agree since it's being used as an internal API even though it's external too, I can change it if I spin up a v3. I don't think it's a big deal though. > > > > > > /* Sanity check */ > > if (!cpu || !hmi_evt) > > @@ -307,9 +308,23 @@ static bool decode_core_fir(struct cpu_thread > > *cpu, > > core_id = pir_to_core_id(cpu->pir); > > > > /* Get CORE FIR register value. */ > > - if (xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, > > CORE_FIR), > > - &core_fir) != > > 0) { > > + ret = xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, > > CORE_FIR), > > + &core_fir); > > + > > + if (ret == OPAL_HARDWARE) { > > prerror("HMI: XSCOM error reading CORE FIR\n"); > > + /* If the FIR can't be read, we should checkstop. */ > > + return true; > > + } else if (ret == OPAL_WRONG_STATE) { > > + /* > > + * CPU is asleep, so it probably didn't cause the > > checkstop. > > + * If no other HMI cause is found a "catchall" > > checkstop > > + * will be raised, so if this CPU should've been awake > > the > > + * error will be handled appropriately. > > + */ > > + prlog(PR_DEBUG, > > + "HMI: FIR read failed, chip %d core %d > > asleep\n", > > + cpu->chip_id, core_id); > > return false; > > } > > > > > Rest doesn't look bad. >
On Mon, 21 Mar 2016 13:20:04 Russell Currey wrote: > On Mon, 2016-03-21 at 12:29 +1100, Andrew Donnellan wrote: > > On 21/03/16 12:00, Russell Currey wrote: > > > > > > If reading the FIR with XSCOM failed, the existing code would not raise > > > a HMI under the assumption that the CPU was asleep and nothing is > > > wrong. > > > Now that it is possible to check whether or not the CPU was asleep, > > > raise an unrecoverable HMI if the read failed for other reasons, and > > > skip the CPU if it was asleep. > > > > > > If the CPU is asleep when it's not meant to be and that is the cause of > > > the HMI, an unrecoverable "catchall" HMI will be raised since no other > > > components will claim responsibility for it. > > > > > > Signed-off-by: Russell Currey <ruscur@russell.cc> > > > --- > > > V2: Raise HMI on XSCOM fail for non-sleep reasons, completely reword > > > --- > > > core/hmi.c | 19 +++++++++++++++++-- > > > 1 file changed, 17 insertions(+), 2 deletions(-) > > > > > > diff --git a/core/hmi.c b/core/hmi.c > > > index 127686f..71fdf48 100644 > > > --- a/core/hmi.c > > > +++ b/core/hmi.c > > > @@ -299,6 +299,7 @@ static bool decode_core_fir(struct cpu_thread *cpu, > > > uint32_t core_id; > > > int i; > > > bool found = false; > > > + int64_t ret; > > As per comment on patch 1, should this just be "int" to match the > > declaration of xscom_read()? > > > I used int64_t since xscom_read is an OPAL call, and since it returns OPAL > return codes which are defined as int64_t. I do agree since it's being > used as an internal API even though it's external too, I can change it if I > spin up a v3. I don't think it's a big deal though. I tend to agree, however I will leave it up to our benevolent maintainer Stewart to make the final call. Otherwise everything looks good to me. Reviewed-by: Alistair Popple <alistair@popple.id.au> > > > > > > > > > /* Sanity check */ > > > if (!cpu || !hmi_evt) > > > @@ -307,9 +308,23 @@ static bool decode_core_fir(struct cpu_thread > > > *cpu, > > > core_id = pir_to_core_id(cpu->pir); > > > > > > /* Get CORE FIR register value. */ > > > - if (xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, > > > CORE_FIR), > > > - &core_fir) != > > > 0) { > > > + ret = xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, > > > CORE_FIR), > > > + &core_fir); > > > + > > > + if (ret == OPAL_HARDWARE) { > > > prerror("HMI: XSCOM error reading CORE FIR\n"); > > > + /* If the FIR can't be read, we should checkstop. */ > > > + return true; > > > + } else if (ret == OPAL_WRONG_STATE) { > > > + /* > > > + * CPU is asleep, so it probably didn't cause the > > > checkstop. > > > + * If no other HMI cause is found a "catchall" > > > checkstop > > > + * will be raised, so if this CPU should've been awake > > > the > > > + * error will be handled appropriately. > > > + */ > > > + prlog(PR_DEBUG, > > > + "HMI: FIR read failed, chip %d core %d > > > asleep\n", > > > + cpu->chip_id, core_id); > > > return false; > > > } > > > > > > > > Rest doesn't look bad. > > > > _______________________________________________ > Skiboot mailing list > Skiboot@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/skiboot >
diff --git a/core/hmi.c b/core/hmi.c index 127686f..71fdf48 100644 --- a/core/hmi.c +++ b/core/hmi.c @@ -299,6 +299,7 @@ static bool decode_core_fir(struct cpu_thread *cpu, uint32_t core_id; int i; bool found = false; + int64_t ret; /* Sanity check */ if (!cpu || !hmi_evt) @@ -307,9 +308,23 @@ static bool decode_core_fir(struct cpu_thread *cpu, core_id = pir_to_core_id(cpu->pir); /* Get CORE FIR register value. */ - if (xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, CORE_FIR), - &core_fir) != 0) { + ret = xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, CORE_FIR), + &core_fir); + + if (ret == OPAL_HARDWARE) { prerror("HMI: XSCOM error reading CORE FIR\n"); + /* If the FIR can't be read, we should checkstop. */ + return true; + } else if (ret == OPAL_WRONG_STATE) { + /* + * CPU is asleep, so it probably didn't cause the checkstop. + * If no other HMI cause is found a "catchall" checkstop + * will be raised, so if this CPU should've been awake the + * error will be handled appropriately. + */ + prlog(PR_DEBUG, + "HMI: FIR read failed, chip %d core %d asleep\n", + cpu->chip_id, core_id); return false; }
If reading the FIR with XSCOM failed, the existing code would not raise a HMI under the assumption that the CPU was asleep and nothing is wrong. Now that it is possible to check whether or not the CPU was asleep, raise an unrecoverable HMI if the read failed for other reasons, and skip the CPU if it was asleep. If the CPU is asleep when it's not meant to be and that is the cause of the HMI, an unrecoverable "catchall" HMI will be raised since no other components will claim responsibility for it. Signed-off-by: Russell Currey <ruscur@russell.cc> --- V2: Raise HMI on XSCOM fail for non-sleep reasons, completely reword --- core/hmi.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-)