Message ID | 1517554952-23789-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com |
---|---|
State | Accepted |
Headers | show |
Series | fast-boot: occ: Re-parse the pstate table during fast-boot | expand |
On Fri, 2 Feb 2018 12:32:32 +0530 Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote: > OCC shares the frequency list to host by copying the pstate table to > main memory in HOMER. This table is parsed during boot to create > device-tree properties for frequency and pstate IDs. OCC can update > the pstate table to present a new set of frequencies to the host. But > host will remain oblivious to these changes unless it is re-inited > with the updated device-tree CPU frequency properties. So this patch > allows to re-parse the pstate table and update the device-tree > properties during fast-reboot. > > OCC updates the pstate table when asked to do so using pstate-table > bias command. And this is mainly used by WOF team for > characterization purposes. Would this ever be used in production, I'm guessing not? I don't think that's a bad thing as such -- designing for test is always good. Perhaps a comment though to explain why you're re-parsing it. Without knowing much about OCC, I'll guess you're doing this so you can update the OCC at runtime without having to update firmware before each IPL? I guess we should always keep in mind fast reboot should match IPL as closely as possible and any undetected deviations are a pretty serious flaw. (e.g., you mess up your OCC state and want to return to normal, you would reboot). I'm just wondering, should this be under an nvram option? Thanks, Nick
* Nicholas Piggin <npiggin@gmail.com> [2018-02-03 16:24:25]: > > On Fri, 2 Feb 2018 12:32:32 +0530 > Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote: > > > OCC shares the frequency list to host by copying the pstate table to > > main memory in HOMER. This table is parsed during boot to create > > device-tree properties for frequency and pstate IDs. OCC can update > > the pstate table to present a new set of frequencies to the host. But > > host will remain oblivious to these changes unless it is re-inited > > with the updated device-tree CPU frequency properties. So this patch > > allows to re-parse the pstate table and update the device-tree > > properties during fast-reboot. > > > > OCC updates the pstate table when asked to do so using pstate-table > > bias command. And this is mainly used by WOF team for > > characterization purposes. > > Would this ever be used in production, I'm guessing not? I don't > think that's a bad thing as such -- designing for test is always > good. Perhaps a comment though to explain why you're re-parsing > it. Never say never :) At this time this facility is to enabling tooling to set OCC parameter at runtime and test the system without encoding all parameters and building a PNOR and dependent components. This enables a very efficient workflow with just a OCC reboot and fast-reboot on OPAL+Linux and we are back in about 2 minutes. This can be leveraged in automation/CI also to test various parameters. > Without knowing much about OCC, I'll guess you're doing this so > you can update the OCC at runtime without having to update firmware > before each IPL? Yep, you got the use case right. These OCC and PState parameters and tunings can be tested and later rolled up into the firmware. > I guess we should always keep in mind fast reboot should match IPL > as closely as possible and any undetected deviations are a pretty > serious flaw. (e.g., you mess up your OCC state and want to return > to normal, you would reboot). We expect PState table to change for these tests. If something goes wrong, OCC will crash or pull system to safe mode. No major change in system configuration like cpu, memory, IO. If something really bad happens, we will hang/checkstop and we will have to re-ipl to recover. If there was no OCC PState changes, then parsing it again should get us exact state compared to Power-ON and hence no risk. > I'm just wondering, should this be under an nvram option? The risk actually depends on what we ask OCC to do and hence not a major config change/risk for OPAL. I would like to leave it as default for fast-reboot. We add slight time factor to rebuild the relevant device tree. Given that fast-reboot itself is experimental, this is a acceptable risk and overhead. If we hit new error/fail scenarios in future, we can add settings. I would like to roll this into a single knob for fast-reboot like "safe", "risky" or something that can help us choose what we want to do in reinit path. We need not call out OCC reinit as explicit nvram option at this time. --Vaidy
On Sat, 3 Feb 2018 12:27:02 +0530 Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote: > * Nicholas Piggin <npiggin@gmail.com> [2018-02-03 16:24:25]: > > > > > On Fri, 2 Feb 2018 12:32:32 +0530 > > Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote: > > > > > OCC shares the frequency list to host by copying the pstate table to > > > main memory in HOMER. This table is parsed during boot to create > > > device-tree properties for frequency and pstate IDs. OCC can update > > > the pstate table to present a new set of frequencies to the host. But > > > host will remain oblivious to these changes unless it is re-inited > > > with the updated device-tree CPU frequency properties. So this patch > > > allows to re-parse the pstate table and update the device-tree > > > properties during fast-reboot. > > > > > > OCC updates the pstate table when asked to do so using pstate-table > > > bias command. And this is mainly used by WOF team for > > > characterization purposes. > > > > Would this ever be used in production, I'm guessing not? I don't > > think that's a bad thing as such -- designing for test is always > > good. Perhaps a comment though to explain why you're re-parsing > > it. > > Never say never :) > > At this time this facility is to enabling tooling to set OCC parameter > at runtime and test the system without encoding all parameters and > building a PNOR and dependent components. > > This enables a very efficient workflow with just a OCC reboot and > fast-reboot on OPAL+Linux and we are back in about 2 minutes. This > can be leveraged in automation/CI also to test various parameters. > > > Without knowing much about OCC, I'll guess you're doing this so > > you can update the OCC at runtime without having to update firmware > > before each IPL? > > Yep, you got the use case right. These OCC and PState parameters and > tunings can be tested and later rolled up into the firmware. > > > I guess we should always keep in mind fast reboot should match IPL > > as closely as possible and any undetected deviations are a pretty > > serious flaw. (e.g., you mess up your OCC state and want to return > > to normal, you would reboot). > > We expect PState table to change for these tests. If something goes > wrong, OCC will crash or pull system to safe mode. Is there a way to mark this and disable fast reboot if that happens. I don't mean to single out this one patch. I just think it's a good time to think about exactly how we're going to deal with fast reboot, and how it would be used effectively if we enable it for end users. We need to be very conservative and be sure to mark any possible errors or inconsistencies in hardware or firmware for full IPL. Other modes for debugging and testing like you're doing here are fine too. No problem if we gate them with an nvram parameter. > No major change in > system configuration like cpu, memory, IO. If something really bad > happens, we will hang/checkstop and we will have to re-ipl to recover. > > If there was no OCC PState changes, then parsing it again should get > us exact state compared to Power-ON and hence no risk. > > > I'm just wondering, should this be under an nvram option? > > The risk actually depends on what we ask OCC to do and hence not > a major config change/risk for OPAL. I would like to leave it as > default for fast-reboot. We add slight time factor to rebuild the > relevant device tree. Given that fast-reboot itself is experimental, > this is a acceptable risk and overhead. > > If we hit new error/fail scenarios in future, we can add settings. > I would like to roll this into a single knob for fast-reboot like > "safe", "risky" or something that can help us choose what we want to > do in reinit path. We need not call out OCC reinit as explicit nvram > option at this time. Thanks for the background. The patch is fine in principle from me, I'm not a fast reboot or OCC expert so my ack would not mean much :) But we should be thinking about a coherent way to manage fast reboot behaviour. Thanks, Nick
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes: >> I'm just wondering, should this be under an nvram option? > > The risk actually depends on what we ask OCC to do and hence not > a major config change/risk for OPAL. I would like to leave it as > default for fast-reboot. We add slight time factor to rebuild the > relevant device tree. Given that fast-reboot itself is experimental, > this is a acceptable risk and overhead. > > If we hit new error/fail scenarios in future, we can add settings. > I would like to roll this into a single knob for fast-reboot like > "safe", "risky" or something that can help us choose what we want to > do in reinit path. We need not call out OCC reinit as explicit nvram > option at this time. It's a good question what we should do there... in a secure boot world, we're probably better off as in that case you shouldn't be able to go and poke things into the OCCs to do bad things. I wonder if we could get an OCC command to "reset back to what you'd be on IPL" or something?
On 2018-02-08 10:14, Stewart Smith wrote: > Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes: >>> I'm just wondering, should this be under an nvram option? >> >> The risk actually depends on what we ask OCC to do and hence not >> a major config change/risk for OPAL. I would like to leave it as >> default for fast-reboot. We add slight time factor to rebuild the >> relevant device tree. Given that fast-reboot itself is experimental, >> this is a acceptable risk and overhead. >> >> If we hit new error/fail scenarios in future, we can add settings. >> I would like to roll this into a single knob for fast-reboot like >> "safe", "risky" or something that can help us choose what we want to >> do in reinit path. We need not call out OCC reinit as explicit nvram >> option at this time. > > It's a good question what we should do there... in a secure boot world, > we're probably better off as in that case you shouldn't be able to go > and poke things into the OCCs to do bad things. > > I wonder if we could get an OCC command to "reset back to what you'd be > on IPL" or something? I think this is one of the example failure case where occ is still in disabled state after fast reboot. As shilpa mentioned we should be able to clear max_occ_reset_count and do the occ re-init. https://github.com/open-power/skiboot/issues/49
ppaidipe <ppaidipe@linux.vnet.ibm.com> writes: > On 2018-02-08 10:14, Stewart Smith wrote: >> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes: >>>> I'm just wondering, should this be under an nvram option? >>> >>> The risk actually depends on what we ask OCC to do and hence not >>> a major config change/risk for OPAL. I would like to leave it as >>> default for fast-reboot. We add slight time factor to rebuild the >>> relevant device tree. Given that fast-reboot itself is experimental, >>> this is a acceptable risk and overhead. >>> >>> If we hit new error/fail scenarios in future, we can add settings. >>> I would like to roll this into a single knob for fast-reboot like >>> "safe", "risky" or something that can help us choose what we want to >>> do in reinit path. We need not call out OCC reinit as explicit nvram >>> option at this time. >> >> It's a good question what we should do there... in a secure boot world, >> we're probably better off as in that case you shouldn't be able to go >> and poke things into the OCCs to do bad things. >> >> I wonder if we could get an OCC command to "reset back to what you'd be >> on IPL" or something? > > I think this is one of the example failure case where occ is still in > disabled state after fast reboot. As shilpa mentioned we should be able > to clear max_occ_reset_count and do the occ re-init. > https://github.com/open-power/skiboot/issues/49 Yeah, we should probably go down the route of doing that (perhaps we should do that no matter what? I'm not sure, we'll need to talk to the OCC team to work out what's the best/correct thing to do). I've merged the patch to master as of 85a1de35cbe47618e9d4104880f182121806af4d as it seems to be something decent at least for the time being... although I'm not convinced we shouldn't either disable it or go through a larger re-init cycle with the OCC....
diff --git a/core/init.c b/core/init.c index ec9f329..92c79af 100644 --- a/core/init.c +++ b/core/init.c @@ -486,6 +486,8 @@ void __noreturn load_and_boot_kernel(bool is_reboot) ipmi_set_fw_progress_sensor(IPMI_FW_OS_BOOT); + occ_pstates_init(); + if (!is_reboot) { /* We wait for the nvram read to complete here so we can * grab stuff from there such as the kernel arguments @@ -499,7 +501,6 @@ void __noreturn load_and_boot_kernel(bool is_reboot) * OCC takes few secs to boot. Call this as late as * as possible to avoid delay. */ - occ_pstates_init(); occ_sensors_init(); } else { diff --git a/hw/occ.c b/hw/occ.c index f3f1231..fb7e683 100644 --- a/hw/occ.c +++ b/hw/occ.c @@ -1556,8 +1556,24 @@ void occ_pstates_init(void) if (proc_gen < proc_gen_p8) return; /* Handle fast reboots */ - if (occ_pstates_initialized) - return; + if (occ_pstates_initialized) { + struct dt_node *power_mgt; + int i; + const char *props[] = { + "ibm,pstate-core-max", + "ibm,pstate-frequencies-mhz", + "ibm,pstate-ids", + "ibm,pstate-max", + "ibm,pstate-min", + "ibm,pstate-nominal", + "ibm,pstate-turbo", + "ibm,pstate-ultra-turbo", + }; + + power_mgt = dt_find_by_path(dt_root, "/ibm,opal/power-mgt"); + for (i = 0; i < ARRAY_SIZE(props); i++) + dt_check_del_prop(power_mgt, props[i]); + } switch (proc_gen) { case proc_gen_p8: @@ -1605,6 +1621,9 @@ void occ_pstates_init(void) cpu_pstates_prepare_core(chip, c, pstate_nom); } + if (occ_pstates_initialized) + return; + /* Add opal_poller to poll OCC throttle status of each chip */ for_each_chip(chip) chip->throttle = 0;