Message ID | 20230127182943.73073-1-ganeshgr@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] powerpc/mce: log the error for all unrecoverable errors | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 24 jobs. |
Ganesh Goudar <ganeshgr@linux.ibm.com> writes: > For all unrecoverable errors we are missing to log the > error, Since machine_check_log_err() is not getting called > for unrecoverable errors. > > Raise irq work in save_mce_event() for unrecoverable errors, > So that we log the error from MCE event handling block in > timer handler. But the patch also removes the irq work raise from machine_check_ue_event(). That's currently done unconditionally, regardless of the disposition. So doesn't this change also drop logging of recoverable UEs? Maybe that's OK, but the change log should explain it. > Log without this change > > MCE: CPU27: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered] > MCE: CPU27: PID: 10580 Comm: inject-ra-err NIP: [0000000010000df4] > MCE: CPU27: Initiator CPU > MCE: CPU27: Unknown > > Log with this change > > MCE: CPU24: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered] > MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [0000000010000e48] > MCE: CPU24: Initiator CPU > MCE: CPU24: Unknown > RTAS: event: 5, Type: Platform Error (224), Severity: 3 > > Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> > Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> > --- > V2: Rephrasing the commit message. > --- > arch/powerpc/kernel/mce.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c > index 6c5d30fba766..a1cb2172eb7b 100644 > --- a/arch/powerpc/kernel/mce.c > +++ b/arch/powerpc/kernel/mce.c > @@ -131,6 +131,13 @@ void save_mce_event(struct pt_regs *regs, long handled, > if (mce->error_type == MCE_ERROR_TYPE_UE) > mce->u.ue_error.ignore_event = mce_err->ignore_event; > > + /* > + * Raise irq work, So that we don't miss to log the error for > + * unrecoverable errors. > + */ > + if (mce->disposition == MCE_DISPOSITION_NOT_RECOVERED) > + mce_irq_work_queue(); > + > if (!addr) > return; > > @@ -235,7 +242,6 @@ static void machine_check_ue_event(struct machine_check_event *evt) > evt, sizeof(*evt)); > > /* Queue work to process this event later. */ This comment is meaningless without the function call it's commenting about, ie. the comment should be removed too. > - mce_irq_work_queue(); > } > cheers
On 1/31/23 4:59 PM, Michael Ellerman wrote: > Ganesh Goudar<ganeshgr@linux.ibm.com> writes: >> For all unrecoverable errors we are missing to log the >> error, Since machine_check_log_err() is not getting called >> for unrecoverable errors. >> >> Raise irq work in save_mce_event() for unrecoverable errors, >> So that we log the error from MCE event handling block in >> timer handler. > But the patch also removes the irq work raise from machine_check_ue_event(). > > That's currently done unconditionally, regardless of the disposition. So > doesn't this change also drop logging of recoverable UEs? > > Maybe that's OK, but the change log should explain it. Yes, its ok, exception vector code will do that for recoverable errors, ill explain this in commit message. > >> Log without this change >> >> MCE: CPU27: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered] >> MCE: CPU27: PID: 10580 Comm: inject-ra-err NIP: [0000000010000df4] >> MCE: CPU27: Initiator CPU >> MCE: CPU27: Unknown >> >> Log with this change >> >> MCE: CPU24: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered] >> MCE: CPU24: PID: 1589811 Comm: inject-ra-err NIP: [0000000010000e48] >> MCE: CPU24: Initiator CPU >> MCE: CPU24: Unknown >> RTAS: event: 5, Type: Platform Error (224), Severity: 3 >> >> Signed-off-by: Ganesh Goudar<ganeshgr@linux.ibm.com> >> Reviewed-by: Mahesh Salgaonkar<mahesh@linux.ibm.com> >> --- >> V2: Rephrasing the commit message. >> --- >> arch/powerpc/kernel/mce.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c >> index 6c5d30fba766..a1cb2172eb7b 100644 >> --- a/arch/powerpc/kernel/mce.c >> +++ b/arch/powerpc/kernel/mce.c >> @@ -131,6 +131,13 @@ void save_mce_event(struct pt_regs *regs, long handled, >> if (mce->error_type == MCE_ERROR_TYPE_UE) >> mce->u.ue_error.ignore_event = mce_err->ignore_event; >> >> + /* >> + * Raise irq work, So that we don't miss to log the error for >> + * unrecoverable errors. >> + */ >> + if (mce->disposition == MCE_DISPOSITION_NOT_RECOVERED) >> + mce_irq_work_queue(); >> + >> if (!addr) >> return; >> >> @@ -235,7 +242,6 @@ static void machine_check_ue_event(struct machine_check_event *evt) >> evt, sizeof(*evt)); >> >> /* Queue work to process this event later. */ > This comment is meaningless without the function call it's commenting > about, ie. the comment should be removed too. ok. Thanks.
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 6c5d30fba766..a1cb2172eb7b 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -131,6 +131,13 @@ void save_mce_event(struct pt_regs *regs, long handled, if (mce->error_type == MCE_ERROR_TYPE_UE) mce->u.ue_error.ignore_event = mce_err->ignore_event; + /* + * Raise irq work, So that we don't miss to log the error for + * unrecoverable errors. + */ + if (mce->disposition == MCE_DISPOSITION_NOT_RECOVERED) + mce_irq_work_queue(); + if (!addr) return; @@ -235,7 +242,6 @@ static void machine_check_ue_event(struct machine_check_event *evt) evt, sizeof(*evt)); /* Queue work to process this event later. */ - mce_irq_work_queue(); } /*