Message ID | 1432787595-9946-1-git-send-email-kamalesh@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
On Thu, 2015-05-28 at 10:03 +0530, Kamalesh Babulal wrote:
> We print the respective warning after parsing EPOW interrupts,
> prompting the user to take action depending upon the severity of the
> event.
>
> Sometimes the same EPOW event warning, such as the one below, can flood
> the kernel log within a very short duration. So limit the messages by
> using the ratelimited variant of pr_err.
>
> May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared

Looking at the time stamps those are actually all fairly far apart in time,
aren't they? So do we actually see them within a short duration in practice?

It does seem sensible to rate limit them though.

> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 02e4a17..2556bc2 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -145,17 +145,17 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
>
>  	switch (action_code) {
>  	case EPOW_RESET:
> -		pr_err("Non critical power or cooling issue cleared");
> +		pr_err_ratelimited("Non critical power or cooling issue cleared");
>  		break;
>
>  	case EPOW_WARN_COOLING:
> -		pr_err("Non critical cooling issue reported by firmware");
> -		pr_err("Check RTAS error log for details");
> +		pr_err_ratelimited("Non critical cooling issue reported by firmware");
> +		pr_err_ratelimited("Check RTAS error log for details");
>  		break;
>
>  	case EPOW_WARN_POWER:
> -		pr_err("Non critical power issue reported by firmware");
> -		pr_err("Check RTAS error log for details");
> +		pr_err_ratelimited("Non critical power issue reported by firmware");
> +		pr_err_ratelimited("Check RTAS error log for details");
>  		break;

Those last two could be collapsed onto one line which would reduce the spam.

cheers
* Michael Ellerman <mpe@ellerman.id.au> [2015-06-01 21:26:51]:

> On Thu, 2015-05-28 at 10:03 +0530, Kamalesh Babulal wrote:
> > We print the respective warning after parsing EPOW interrupts,
> > prompting the user to take action depending upon the severity of the
> > event.
> >
> > Sometimes the same EPOW event warning, such as the one below, can flood
> > the kernel log within a very short duration. So limit the messages by
> > using the ratelimited variant of pr_err.
> >
> > May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> > May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> > May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> > May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> > May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> > May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> > May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
>
> Looking at the time stamps those are actually all fairly far apart in time,
> aren't they? So do we actually see them within a short duration in practice?

Thanks for the review. Agreed, I should have phrased it better. My intent was
to say that these warnings keep flooding the kernel log over a period of time.

[..]

> >  	case EPOW_WARN_POWER:
> > -		pr_err("Non critical power issue reported by firmware");
> > -		pr_err("Check RTAS error log for details");
> > +		pr_err_ratelimited("Non critical power issue reported by firmware");
> > +		pr_err_ratelimited("Check RTAS error log for details");
> >  		break;
>
> Those last two could be collapsed onto one line which would reduce the spam.

Yes, it could reduce the number of lines printed. I will resend the patch with
the changes.

Thanks,
Kamalesh.
On Tue, 2015-06-02 at 10:33 +0530, Kamalesh Babulal wrote:
> * Michael Ellerman <mpe@ellerman.id.au> [2015-06-01 21:26:51]:
>
> > On Thu, 2015-05-28 at 10:03 +0530, Kamalesh Babulal wrote:
> > > We print the respective warning after parsing EPOW interrupts,
> > > prompting the user to take action depending upon the severity of the
> > > event.
> > >
> > > Sometimes the same EPOW event warning, such as the one below, can flood
> > > the kernel log within a very short duration. So limit the messages by
> > > using the ratelimited variant of pr_err.
> > >
> > > May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
> > > May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
> > > May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
> > > May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
> > > May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
> > > May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
> > > May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared
> >
> > Looking at the time stamps those are actually all fairly far apart in time,
> > aren't they? So do we actually see them within a short duration in practice?
>
> Thanks for the review. Agreed, I should have phrased it better. My intent was
> to say that these warnings keep flooding the kernel log over a period of time.

OK.

By default printk_ratelimited() allows up to 10 messages in five seconds, so it
won't reduce the number of messages in the above example. But I'm still OK with
a patch to ratelimit them.

> [..]
> > >  	case EPOW_WARN_POWER:
> > > -		pr_err("Non critical power issue reported by firmware");
> > > -		pr_err("Check RTAS error log for details");
> > > +		pr_err_ratelimited("Non critical power issue reported by firmware");
> > > +		pr_err_ratelimited("Check RTAS error log for details");
> > >  		break;
> >
> > Those last two could be collapsed onto one line which would reduce the spam.
>
> Yes, it could reduce the number of lines printed. I will resend the patch with
> the changes.

Thanks.

cheers
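The point about the default limit (10 messages per five seconds) can be checked against the log excerpt with a small user-space model. This is only a sketch: `struct rl_model`, `rl_allow()`, `rl_count_allowed()` and `epow_log_stamps` are hypothetical names invented for illustration, not the kernel's ratelimit implementation, and the timestamps are the ones from the commit message converted to seconds since midnight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of an interval/burst rate limiter.
 * It only mimics the "at most `burst` messages per `interval`" idea. */
struct rl_model {
	long interval;	/* window length, in seconds */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
	bool started;	/* false until the first message is seen */
};

/* Return true if a message arriving at time `now` would be emitted. */
static bool rl_allow(struct rl_model *rl, long now)
{
	assert(rl->interval > 0 && rl->burst > 0);

	/* Open a new window on the first message or once the old one expired. */
	if (!rl->started || now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
		rl->started = true;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	return false;
}

/* Count how many of the `n` timestamps would get through the limiter. */
static size_t rl_count_allowed(const long *stamps, size_t n,
			       long interval, int burst)
{
	struct rl_model rl = { .interval = interval, .burst = burst };
	size_t allowed = 0;

	for (size_t i = 0; i < n; i++)
		if (rl_allow(&rl, stamps[i]))
			allowed++;
	return allowed;
}

/* The 13 log timestamps from the commit message, as seconds since midnight. */
static const long epow_log_stamps[] = {
	13594, 13612, 14028, 14146, 14194, 14344, 14521,
	14664, 14838, 15184, 15724, 15746, 15756,
};
#define EPOW_LOG_COUNT (sizeof(epow_log_stamps) / sizeof(epow_log_stamps[0]))
```

Calling `rl_count_allowed(epow_log_stamps, EPOW_LOG_COUNT, 5, 10)` lets every message through, because the closest pair of messages in the excerpt is 10 seconds apart. The limit only starts dropping messages when more than ten arrive inside a single five-second window.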
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 02e4a17..2556bc2 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -145,17 +145,17 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
 
 	switch (action_code) {
 	case EPOW_RESET:
-		pr_err("Non critical power or cooling issue cleared");
+		pr_err_ratelimited("Non critical power or cooling issue cleared");
 		break;
 
 	case EPOW_WARN_COOLING:
-		pr_err("Non critical cooling issue reported by firmware");
-		pr_err("Check RTAS error log for details");
+		pr_err_ratelimited("Non critical cooling issue reported by firmware");
+		pr_err_ratelimited("Check RTAS error log for details");
 		break;
 
 	case EPOW_WARN_POWER:
-		pr_err("Non critical power issue reported by firmware");
-		pr_err("Check RTAS error log for details");
+		pr_err_ratelimited("Non critical power issue reported by firmware");
+		pr_err_ratelimited("Check RTAS error log for details");
 		break;
 
 	case EPOW_SYSTEM_SHUTDOWN:
@@ -177,7 +177,7 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
 		break;
 
 	default:
-		pr_err("Unknown power/cooling event (action code %d)",
+		pr_err_ratelimited("Unknown power/cooling event (action code %d)",
 			action_code);
 	}
 }
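The review suggestion to collapse each pair of warning lines into one could look roughly like the user-space sketch below. Everything prefixed `sketch_` or `SKETCH_` is a hypothetical stand-in: `sketch_log()` takes the place of `pr_err_ratelimited()`, the enum values are illustrative rather than copied from ras.c, and the combined message wording is an assumption, not the text of any resent patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative action codes only; the real definitions live in ras.c. */
enum {
	SKETCH_EPOW_WARN_COOLING = 1,
	SKETCH_EPOW_WARN_POWER = 2,
};

/* Number of log lines emitted so far, to make the reduction visible. */
static int sketch_lines_logged;

/* User-space stand-in for pr_err_ratelimited(): one call, one log line. */
static void sketch_log(const char *msg)
{
	assert(msg != NULL);
	sketch_lines_logged++;
	fprintf(stderr, "%s\n", msg);
}

/* Report a non-critical EPOW warning with a single log call per event. */
static void sketch_report_epow_warning(int action_code)
{
	switch (action_code) {
	case SKETCH_EPOW_WARN_COOLING:
		sketch_log("Non critical cooling issue reported by firmware, "
			   "check RTAS error log for details");
		break;

	case SKETCH_EPOW_WARN_POWER:
		sketch_log("Non critical power issue reported by firmware, "
			   "check RTAS error log for details");
		break;

	default:
		/* Other action codes are outside the scope of this sketch. */
		break;
	}
}
```

With one call per event, each warning costs one log line instead of two. It also keeps the hint about the RTAS error log attached to the warning itself, which matters if the two messages would otherwise be rate limited independently and one of them dropped.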
We print the respective warning after parsing EPOW interrupts,
prompting the user to take action depending upon the severity of the
event.

Sometimes the same EPOW event warning, such as the one below, can flood
the kernel log within a very short duration. So limit the messages by
using the ratelimited variant of pr_err.

May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared
May 25 04:04:24 alp kernel: Non critical power or cooling issue cleared
May 25 04:07:18 alp kernel: Non critical power or cooling issue cleared
May 25 04:13:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:26 alp kernel: Non critical power or cooling issue cleared
May 25 04:22:36 alp kernel: Non critical power or cooling issue cleared

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/platforms/pseries/ras.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)