Message ID | 20200320081849.6651-1-oohall@gmail.com |
---|---|
State | Accepted |
Series | hw/prd: Hold FSP notifications while PRD is inactive |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch master (e19dddc58280e6120459053dfcbf9c026b0ac4f9) |
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot | success | Test snowpatch/job/snowpatch-skiboot on branch master |
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco | success | Signed-off-by present |
On Fri, Mar 20, 2020 at 7:19 PM Oliver O'Halloran <oohall@gmail.com> wrote:
>
> On FSP systems we rely on a service on the FSP to send us a notification
> when the OCCs become active. On systems with NVDIMMs this is especially
> critical because the OCC is responsible for starting the NVDIMM save
> procedure when power fails.
>
> The message sent from the FSP isn't sent to OPAL itself, rather it's
> sent to the PRD service running on the host (via OPAL). If this service
> is not running OPAL will currently send an error response back to the
> FSP and drop the message. This causes problems because the OCCs active
> message is generally sent while OPAL is still booting the system so
> the PRD daemon never gets notified that the OCC is active.
>
> Once the OS is running we rely on PRD to report the protection status
> of the NVDIMMs on the system. However, because it never receives the
> notification from the FSP it will always report the DIMMs as
> un-protected because it thinks the OCCs are inactive.
>
> This patch fixes the issue by allowing a single message to be held in
> OPAL while PRD is inactive. Once OPAL receives a notification that PRD
> has started we deliver the message.
>
> It's worth pointing out that this is kind of janky and brittle and would
> probably break horribly if FSP notify messages were multi-part since
> we could end up in a situation where only a single part of a multi-part
> message is queued, with the rest being dropped. However, the only user
> of the FSP notification message appears to be the OCC, and the OCC team
> says it's not a problem. I'll take their word for it.
>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Merged as d703ad5b8ea93f2bcd98ca2642dcd3c66da82c91
diff --git a/hw/prd.c b/hw/prd.c
index b9d04c79d543..a9c3b34c27ce 100644
--- a/hw/prd.c
+++ b/hw/prd.c
@@ -374,7 +374,7 @@ int prd_hbrt_fsp_msg_notify(void *data, u32 dsize)
 	int size, fw_notify_size;
 	int rc = FSP_STATUS_GENERIC_ERROR;
 
-	if (!prd_enabled || !prd_active) {
+	if (!prd_enabled) {
 		prlog(PR_NOTICE, "PRD: %s: PRD daemon is not ready\n",
 		      __func__);
 		return rc;
@@ -415,6 +415,12 @@ int prd_hbrt_fsp_msg_notify(void *data, u32 dsize)
 	fw_notify->type = cpu_to_be64(PRD_FW_MSG_TYPE_HBRT_FSP);
 	memcpy(&(fw_notify->mbox_msg), data, dsize);
 
+	if (!prd_active) {
+		// save the message, we'll deliver it when prd starts
+		rc = FSP_STATUS_BUSY;
+		goto unlock_events;
+	}
+
 	rc = opal_queue_prd_msg(prd_msg_fsp_notify);
 	if (!rc)
 		prd_msg_inuse = true;
@@ -455,6 +461,11 @@ static int prd_msg_handle_init(struct opal_prd_msg *msg)
 	 * interrupts */
 	lock(&events_lock);
 	prd_active = true;
+
+	if (prd_msg_fsp_notify) {
+		if (!opal_queue_prd_msg(prd_msg_fsp_notify))
+			prd_msg_inuse = true;
+	}
 	if (!prd_msg_inuse)
 		send_next_pending_event();
 	unlock(&events_lock);
On FSP systems we rely on a service on the FSP to send us a notification
when the OCCs become active. On systems with NVDIMMs this is especially
critical because the OCC is responsible for starting the NVDIMM save
procedure when power fails.

The message sent from the FSP isn't sent to OPAL itself, rather it's
sent to the PRD service running on the host (via OPAL). If this service
is not running OPAL will currently send an error response back to the
FSP and drop the message. This causes problems because the OCCs active
message is generally sent while OPAL is still booting the system so
the PRD daemon never gets notified that the OCC is active.

Once the OS is running we rely on PRD to report the protection status
of the NVDIMMs on the system. However, because it never receives the
notification from the FSP it will always report the DIMMs as
un-protected because it thinks the OCCs are inactive.

This patch fixes the issue by allowing a single message to be held in
OPAL while PRD is inactive. Once OPAL receives a notification that PRD
has started we deliver the message.

It's worth pointing out that this is kind of janky and brittle and would
probably break horribly if FSP notify messages were multi-part since
we could end up in a situation where only a single part of a multi-part
message is queued, with the rest being dropped. However, the only user
of the FSP notification message appears to be the OCC, and the OCC team
says it's not a problem. I'll take their word for it.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 hw/prd.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
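
The hold-then-deliver behaviour described in the commit message can be tried in isolation with a small standalone model. The sketch below is not skiboot code: `struct prd_sim`, `sim_fsp_notify()`, `sim_prd_init()` and the `SIM_STATUS_*` values are hypothetical names made up for illustration, and it assumes that a newer notification simply overwrites whatever is already sitting in the single slot, which may not match what hw/prd.c does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for this sketch only (not the FSP_STATUS_* values). */
enum sim_status {
	SIM_STATUS_OK = 0,
	SIM_STATUS_BUSY,
	SIM_STATUS_ERROR,
};

#define SIM_MSG_MAX 64

/* Hypothetical stand-in for the PRD state that hw/prd.c keeps in globals. */
struct prd_sim {
	bool enabled;            /* platform supports PRD at all */
	bool active;             /* daemon has sent its init message */
	bool held;               /* a notification is waiting in the slot */
	char slot[SIM_MSG_MAX];  /* the single message slot */
	char last[SIM_MSG_MAX];  /* last message handed to the daemon */
	int delivered;           /* number of messages handed to the daemon */
};

/* Hand the slot contents to the (simulated) daemon and empty the slot. */
static void sim_deliver(struct prd_sim *s)
{
	memcpy(s->last, s->slot, sizeof(s->last));
	s->delivered++;
	s->held = false;
}

/* Called when the (simulated) FSP sends a notification. */
enum sim_status sim_fsp_notify(struct prd_sim *s, const char *msg)
{
	assert(s != NULL && msg != NULL);

	if (!s->enabled || strlen(msg) >= SIM_MSG_MAX)
		return SIM_STATUS_ERROR;

	/* Assumption of this sketch: a newer message replaces an older one. */
	snprintf(s->slot, sizeof(s->slot), "%s", msg);

	if (!s->active) {
		/* Daemon not running yet: keep the message and report busy. */
		s->held = true;
		return SIM_STATUS_BUSY;
	}

	sim_deliver(s);
	return SIM_STATUS_OK;
}

/* Called when the (simulated) daemon announces that it has started. */
void sim_prd_init(struct prd_sim *s)
{
	assert(s != NULL);

	s->active = true;
	if (s->held)
		sim_deliver(s);
}
```

In this model a notification sent before `sim_prd_init()` returns `SIM_STATUS_BUSY` and only reaches the daemon once it starts. Sending two notifications before the daemon starts leaves only the second one in the slot, which is the multi-part hazard the commit message warns about.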