hw/fsp/rtc: read/write cached rtc tod on fsp hir.

Message ID 1491184641-19738-1-git-send-email-ppaidipe@linux.vnet.ibm.com
State Accepted

Commit Message

ppaidipe April 3, 2017, 1:57 a.m. UTC
Currently fsp-rtc reads/writes the cached RTC TOD on an FSP
reset. Use the newer fsp_in_rr() function so that the cached RTC
value is also read/written when the FSP reset is initiated by the HIR.

Below is the kernel trace seen when setting the hardware clock as the HIR process starts.

[ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688]
[ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc
[ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu
[ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000
[ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70
[ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901   Not tainted  (4.10.0-14-generic)
[ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[ 1727.775889]   CR: 28024442  XER: 20000000
[ 1727.775890] CFAR: c00000000008472c SOFTE: 1
               GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4
               GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000
               GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003
               GPR12: c0000000000846e8 c00000000fba0100
[ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0
[ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48
[ 1727.775899] Call Trace:
[ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable)
[ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0
[ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630
[ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
[ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
[ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0
[ 1727.775908] Instruction dump:
[ 1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020
[ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4

This was found when executing the testcase
https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py

With this fix, the FSP HIR torture testcase in the above test
runs fine.

Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
---
 hw/fsp/fsp-rtc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Ananth N Mavinakayanahalli April 4, 2017, 8:33 a.m. UTC | #1
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>

This will work, but we will also need to audit the other FSP_RESET_START
cases.

A particular sequence of actions (with timeouts) needs to be executed to
put an FSP in a state ready to be reset (HIR case). The actual RESET_START
notification is sent only after the HIR sequence actually triggers the FSP
reset. The window between the start of the HIR sequence and the actual
notification is where this problem can occur.
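
To make that window concrete, here is a minimal, self-contained sketch in C.
It is not skiboot code: the enum, the struct and every model_* name are
hypothetical, and the sketch only models the behaviour described in this
thread. A flag that is set by the RESET_START notification is still false
while the HIR sequence is already under way, whereas a check in the style of
fsp_in_rr() is assumed to cover the whole interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical phases of an FSP reset, as described in the thread. */
enum model_phase {
	MODEL_RUNNING,		/* normal operation */
	MODEL_HIR_STARTED,	/* HIR sequence started, no notification yet */
	MODEL_RESET_NOTIFIED,	/* RESET_START notification received */
};

/* Hypothetical stand-in for the RTC driver state. */
struct model_rtc {
	enum model_phase phase;
	bool in_reset_flag;	/* set only by the RESET_START notification */
	uint64_t cached_tod;	/* last time of day stored in the cache */
	unsigned int sent_to_fsp; /* requests that were sent to the FSP */
};

/* The HIR sequence begins; the notification has not arrived yet. */
void model_start_hir(struct model_rtc *m)
{
	m->phase = MODEL_HIR_STARTED;
}

/* The RESET_START notification arrives after the reset was triggered. */
void model_reset_start(struct model_rtc *m)
{
	m->phase = MODEL_RESET_NOTIFIED;
	m->in_reset_flag = true;
}

/* Assumed analogue of fsp_in_rr(): true for the whole reset interval. */
bool model_in_rr(const struct model_rtc *m)
{
	return m->phase != MODEL_RUNNING;
}

/*
 * Write a time of day. Returns true if the write was served from the
 * cache, false if a request was sent to the (possibly resetting) FSP.
 */
bool model_rtc_write(struct model_rtc *m, uint64_t tod, bool use_rr_check)
{
	assert(m != NULL);

	bool busy = use_rr_check ? model_in_rr(m) : m->in_reset_flag;

	if (busy) {
		m->cached_tod = tod;
		return true;
	}
	m->sent_to_fsp++;
	return false;
}
```

With use_rr_check false, a write issued between model_start_hir() and
model_reset_start() is counted in sent_to_fsp; with use_rr_check true, the
same write only updates cached_tod.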

Ananth N Mavinakayanahalli June 13, 2017, 5:43 a.m. UTC | #2
On Tue, Apr 04, 2017 at 02:03:48PM +0530, Ananth N Mavinakayanahalli wrote:
> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
> 
> This will work, but we will need to audit the other FSP_RESET_START
> cases also.
> 
> A particular sequence of actions (with timeouts) need to be executed to
> put an FSP in a state ready to be reset (HIR case). The actual RESET_START
> notification is sent after the HIR sequence actually triggers the FSP
> reset. The window between the HIR sequence start to the actual
> notification is where this problem can occur.

Stewart,

Can you please pull this patch in? We are auditing any further cases
where we need to plug this interval, but by itself, this patch is useful
and fixes a real bug. This also needs to go into the stable branches.

Regards,
Ananth

Vasant Hegde June 13, 2017, 6:07 a.m. UTC | #3
Patch itself looks good.

Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>

With this change the fsp_in_reset variable becomes redundant. We will have to
clean up that code.
I will send a cleanup patch along with other R/R fixes later.

-Vasant

Stewart Smith June 14, 2017, 6:55 a.m. UTC | #4
Thanks.

Merged to master as of 447ccc4de529f001271fd4dfd78401bc4c90832e

and to 5.4.x as of bec5e69e8d9a8740ecb7842c8c551651a40bde89

Patch

diff --git a/hw/fsp/fsp-rtc.c b/hw/fsp/fsp-rtc.c
index df0f679..b908ce9 100644
--- a/hw/fsp/fsp-rtc.c
+++ b/hw/fsp/fsp-rtc.c
@@ -280,7 +280,7 @@  static int64_t fsp_opal_rtc_read(uint32_t *year_month_day,
 	}
 
 	/* During R/R of FSP, read cached TOD */
-	if (fsp_in_reset) {
+	if (fsp_in_rr()) {
 		if (rtc_tod_state == RTC_TOD_VALID) {
 			rtc_cache_get_datetime(year_month_day,
 					       hour_minute_second_millisecond);
@@ -362,7 +362,7 @@  static int64_t fsp_rtc_send_write_request(uint32_t year_month_day,
 	}
 	prlog(PR_TRACE, " -> req at %p\n", msg);
 
-	if (fsp_in_reset) {
+	if (fsp_in_rr()) {
 		datetime_to_tm(msg->data.words[0],
 			       (u64) msg->data.words[1] << 32,  &tm);
 		rtc_cache_update(&tm);