diff mbox

[V2,5/7] hmi: Rework HMI event handling of FIR read failure

Message ID 1458522006-8846-6-git-send-email-ruscur@russell.cc
State Accepted
Headers show

Commit Message

Russell Currey March 21, 2016, 1 a.m. UTC
If reading the FIR with XSCOM failed, the existing code would not raise
a HMI under the assumption that the CPU was asleep and nothing is wrong.
Now that it is possible to check whether or not the CPU was asleep,
raise an unrecoverable HMI if the read failed for other reasons, and
skip the CPU if it was asleep.

If the CPU is asleep when it's not meant to be and that is the cause of
the HMI, an unrecoverable "catchall" HMI will be raised since no other
components will claim responsibility for it.

Signed-off-by: Russell Currey <ruscur@russell.cc>
---
V2: Raise HMI on XSCOM fail for non-sleep reasons, completely reword
---
 core/hmi.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Andrew Donnellan March 21, 2016, 1:29 a.m. UTC | #1
On 21/03/16 12:00, Russell Currey wrote:
> If reading the FIR with XSCOM failed, the existing code would not raise
> a HMI under the assumption that the CPU was asleep and nothing is wrong.
> Now that it is possible to check whether or not the CPU was asleep,
> raise an unrecoverable HMI if the read failed for other reasons, and
> skip the CPU if it was asleep.
>
> If the CPU is asleep when it's not meant to be and that is the cause of
> the HMI, an unrecoverable "catchall" HMI will be raised since no other
> components will claim responsibility for it.
>
> Signed-off-by: Russell Currey <ruscur@russell.cc>
> ---
> V2: Raise HMI on XSCOM fail for non-sleep reasons, completely reword
> ---
>   core/hmi.c | 19 +++++++++++++++++--
>   1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/core/hmi.c b/core/hmi.c
> index 127686f..71fdf48 100644
> --- a/core/hmi.c
> +++ b/core/hmi.c
> @@ -299,6 +299,7 @@ static bool decode_core_fir(struct cpu_thread *cpu,
>   	uint32_t core_id;
>   	int i;
>   	bool found = false;
> +	int64_t ret;

As per comment on patch 1, should this just be "int" to match the 
declaration of xscom_read()?

>
>   	/* Sanity check */
>   	if (!cpu || !hmi_evt)
> @@ -307,9 +308,23 @@ static bool decode_core_fir(struct cpu_thread *cpu,
>   	core_id = pir_to_core_id(cpu->pir);
>
>   	/* Get CORE FIR register value. */
> -	if (xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, CORE_FIR),
> -							&core_fir) != 0) {
> +	ret = xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, CORE_FIR),
> +			 &core_fir);
> +
> +	if (ret == OPAL_HARDWARE) {
>   		prerror("HMI: XSCOM error reading CORE FIR\n");
> +		/* If the FIR can't be read, we should checkstop. */
> +		return true;
> +	} else if (ret == OPAL_WRONG_STATE) {
> +		/*
> +		 * CPU is asleep, so it probably didn't cause the checkstop.
> +		 * If no other HMI cause is found a "catchall" checkstop
> +		 * will be raised, so if this CPU should've been awake the
> +		 * error will be handled appropriately.
> +		 */
> +		prlog(PR_DEBUG,
> +		      "HMI: FIR read failed, chip %d core %d asleep\n",
> +		      cpu->chip_id, core_id);
>   		return false;
>   	}
>
>

Rest doesn't look bad.
Russell Currey March 21, 2016, 2:20 a.m. UTC | #2
On Mon, 2016-03-21 at 12:29 +1100, Andrew Donnellan wrote:
> On 21/03/16 12:00, Russell Currey wrote:
> > 
> > If reading the FIR with XSCOM failed, the existing code would not raise
> > a HMI under the assumption that the CPU was asleep and nothing is
> > wrong.
> > Now that it is possible to check whether or not the CPU was asleep,
> > raise an unrecoverable HMI if the read failed for other reasons, and
> > skip the CPU if it was asleep.
> > 
> > If the CPU is asleep when it's not meant to be and that is the cause of
> > the HMI, an unrecoverable "catchall" HMI will be raised since no other
> > components will claim responsibility for it.
> > 
> > Signed-off-by: Russell Currey <ruscur@russell.cc>
> > ---
> > V2: Raise HMI on XSCOM fail for non-sleep reasons, completely reword
> > ---
> >   core/hmi.c | 19 +++++++++++++++++--
> >   1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/core/hmi.c b/core/hmi.c
> > index 127686f..71fdf48 100644
> > --- a/core/hmi.c
> > +++ b/core/hmi.c
> > @@ -299,6 +299,7 @@ static bool decode_core_fir(struct cpu_thread *cpu,
> >   	uint32_t core_id;
> >   	int i;
> >   	bool found = false;
> > +	int64_t ret;
> As per comment on patch 1, should this just be "int" to match the 
> declaration of xscom_read()?
> 
I used int64_t since xscom_read is an OPAL call, and since it returns OPAL
return codes which are defined as int64_t.  I do agree since it's being
used as an internal API even though it's external too, I can change it if I
spin up a v3.  I don't think it's a big deal though.
> > 
> > 
> >   	/* Sanity check */
> >   	if (!cpu || !hmi_evt)
> > @@ -307,9 +308,23 @@ static bool decode_core_fir(struct cpu_thread
> > *cpu,
> >   	core_id = pir_to_core_id(cpu->pir);
> > 
> >   	/* Get CORE FIR register value. */
> > -	if (xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id,
> > CORE_FIR),
> > -							&core_fir) !=
> > 0) {
> > +	ret = xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id,
> > CORE_FIR),
> > +			 &core_fir);
> > +
> > +	if (ret == OPAL_HARDWARE) {
> >   		prerror("HMI: XSCOM error reading CORE FIR\n");
> > +		/* If the FIR can't be read, we should checkstop. */
> > +		return true;
> > +	} else if (ret == OPAL_WRONG_STATE) {
> > +		/*
> > +		 * CPU is asleep, so it probably didn't cause the
> > checkstop.
> > +		 * If no other HMI cause is found a "catchall"
> > checkstop
> > +		 * will be raised, so if this CPU should've been awake
> > the
> > +		 * error will be handled appropriately.
> > +		 */
> > +		prlog(PR_DEBUG,
> > +		      "HMI: FIR read failed, chip %d core %d
> > asleep\n",
> > +		      cpu->chip_id, core_id);
> >   		return false;
> >   	}
> > 
> > 
> Rest doesn't look bad.
>
Alistair Popple March 24, 2016, 3:16 a.m. UTC | #3
On Mon, 21 Mar 2016 13:20:04 Russell Currey wrote:
> On Mon, 2016-03-21 at 12:29 +1100, Andrew Donnellan wrote:
> > On 21/03/16 12:00, Russell Currey wrote:
> > > 
> > > If reading the FIR with XSCOM failed, the existing code would not raise
> > > a HMI under the assumption that the CPU was asleep and nothing is
> > > wrong.
> > > Now that it is possible to check whether or not the CPU was asleep,
> > > raise an unrecoverable HMI if the read failed for other reasons, and
> > > skip the CPU if it was asleep.
> > > 
> > > If the CPU is asleep when it's not meant to be and that is the cause of
> > > the HMI, an unrecoverable "catchall" HMI will be raised since no other
> > > components will claim responsibility for it.
> > > 
> > > Signed-off-by: Russell Currey <ruscur@russell.cc>
> > > ---
> > > V2: Raise HMI on XSCOM fail for non-sleep reasons, completely reword
> > > ---
> > >   core/hmi.c | 19 +++++++++++++++++--
> > >   1 file changed, 17 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/core/hmi.c b/core/hmi.c
> > > index 127686f..71fdf48 100644
> > > --- a/core/hmi.c
> > > +++ b/core/hmi.c
> > > @@ -299,6 +299,7 @@ static bool decode_core_fir(struct cpu_thread *cpu,
> > >   	uint32_t core_id;
> > >   	int i;
> > >   	bool found = false;
> > > +	int64_t ret;
> > As per comment on patch 1, should this just be "int" to match the 
> > declaration of xscom_read()?
> > 
> I used int64_t since xscom_read is an OPAL call, and since it returns OPAL
> return codes which are defined as int64_t.  I do agree since it's being
> used as an internal API even though it's external too, I can change it if I
> spin up a v3.  I don't think it's a big deal though.

I tend to agree, however I will leave it up to our benevolent maintainer 
Stewart to make the final call. Otherwise everything looks good to me.

Reviewed-by: Alistair Popple <alistair@popple.id.au>

> > > 
> > > 
> > >   	/* Sanity check */
> > >   	if (!cpu || !hmi_evt)
> > > @@ -307,9 +308,23 @@ static bool decode_core_fir(struct cpu_thread
> > > *cpu,
> > >   	core_id = pir_to_core_id(cpu->pir);
> > > 
> > >   	/* Get CORE FIR register value. */
> > > -	if (xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id,
> > > CORE_FIR),
> > > -							&core_fir) !=
> > > 0) {
> > > +	ret = xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id,
> > > CORE_FIR),
> > > +			 &core_fir);
> > > +
> > > +	if (ret == OPAL_HARDWARE) {
> > >   		prerror("HMI: XSCOM error reading CORE FIR\n");
> > > +		/* If the FIR can't be read, we should checkstop. */
> > > +		return true;
> > > +	} else if (ret == OPAL_WRONG_STATE) {
> > > +		/*
> > > +		 * CPU is asleep, so it probably didn't cause the
> > > checkstop.
> > > +		 * If no other HMI cause is found a "catchall"
> > > checkstop
> > > +		 * will be raised, so if this CPU should've been awake
> > > the
> > > +		 * error will be handled appropriately.
> > > +		 */
> > > +		prlog(PR_DEBUG,
> > > +		      "HMI: FIR read failed, chip %d core %d
> > > asleep\n",
> > > +		      cpu->chip_id, core_id);
> > >   		return false;
> > >   	}
> > > 
> > > 
> > Rest doesn't look bad.
> > 
> 
> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
>
diff mbox

Patch

diff --git a/core/hmi.c b/core/hmi.c
index 127686f..71fdf48 100644
--- a/core/hmi.c
+++ b/core/hmi.c
@@ -299,6 +299,7 @@  static bool decode_core_fir(struct cpu_thread *cpu,
 	uint32_t core_id;
 	int i;
 	bool found = false;
+	int64_t ret;
 
 	/* Sanity check */
 	if (!cpu || !hmi_evt)
@@ -307,9 +308,23 @@  static bool decode_core_fir(struct cpu_thread *cpu,
 	core_id = pir_to_core_id(cpu->pir);
 
 	/* Get CORE FIR register value. */
-	if (xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, CORE_FIR),
-							&core_fir) != 0) {
+	ret = xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, CORE_FIR),
+			 &core_fir);
+
+	if (ret == OPAL_HARDWARE) {
 		prerror("HMI: XSCOM error reading CORE FIR\n");
+		/* If the FIR can't be read, we should checkstop. */
+		return true;
+	} else if (ret == OPAL_WRONG_STATE) {
+		/*
+		 * CPU is asleep, so it probably didn't cause the checkstop.
+		 * If no other HMI cause is found a "catchall" checkstop
+		 * will be raised, so if this CPU should've been awake the
+		 * error will be handled appropriately.
+		 */
+		prlog(PR_DEBUG,
+		      "HMI: FIR read failed, chip %d core %d asleep\n",
+		      cpu->chip_id, core_id);
 		return false;
 	}