fast-boot: occ: Re-parse the pstate table during fast-boot

Message ID	1517554952-23789-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com
State	Accepted
Headers	show Return-Path: <skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org> Gateway: Authorized Use Only! Violators will be prosecuted for <skiboot@lists.ozlabs.org> from <shilpa.bhat@linux.vnet.ibm.com>; Fri, 2 Feb 2018 07:02:39 -0000 Gateway: Authorized Use Only! Violators will be prosecuted; Fri, 2 Feb 2018 07:02:37 -0000 From: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> To: skiboot@lists.ozlabs.org Date: Fri, 2 Feb 2018 12:32:32 +0530 Message-Id: <1517554952-23789-1-git-send-email-shilpa.bhat@linux.vnet.ibm.com> Subject: [Skiboot] [PATCH] fast-boot: occ: Re-parse the pstate table during fast-boot Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org Sender: "Skiboot" <skiboot-bounces+incoming=patchwork.ozlabs.org@lists.ozlabs.org>
Series	fast-boot: occ: Re-parse the pstate table during fast-boot \| expand fast-boot: occ: Re-parse the pstate table during fast-boot

Shilpasri G Bhat Feb. 2, 2018, 7:02 a.m. UTC

OCC shares the frequency list to host by copying the pstate table to
main memory in HOMER. This table is parsed during boot to create
device-tree properties for frequency and pstate IDs. OCC can update
the pstate table to present a new set of frequencies to the host. But
host will remain oblivious to these changes unless it is re-inited
with the updated device-tree CPU frequency properties. So this patch
allows to re-parse the pstate table and update the device-tree
properties during fast-reboot.

OCC updates the pstate table when asked to do so using pstate-table
bias command. And this is mainly used by WOF team for
characterization purposes.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
---
 core/init.c |  3 ++-
 hw/occ.c    | 23 +++++++++++++++++++++--
 2 files changed, 23 insertions(+), 3 deletions(-)

Nicholas Piggin Feb. 3, 2018, 6:24 a.m. UTC | #1

On Fri,  2 Feb 2018 12:32:32 +0530
Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:

> OCC shares the frequency list to host by copying the pstate table to
> main memory in HOMER. This table is parsed during boot to create
> device-tree properties for frequency and pstate IDs. OCC can update
> the pstate table to present a new set of frequencies to the host. But
> host will remain oblivious to these changes unless it is re-inited
> with the updated device-tree CPU frequency properties. So this patch
> allows to re-parse the pstate table and update the device-tree
> properties during fast-reboot.
> 
> OCC updates the pstate table when asked to do so using pstate-table
> bias command. And this is mainly used by WOF team for
> characterization purposes.

Would this ever be used in production, I'm guessing not? I don't
think that's a bad thing as such -- designing for test is always
good. Perhaps a comment though to explain why you're re-parsing
it.

Without knowing much about OCC, I'll guess you're doing this so
you can update the OCC at runtime without having to update firmware
before each IPL?

I guess we should always keep in mind fast reboot should match IPL
as closely as possible and any undetected deviations are a pretty
serious flaw. (e.g., you mess up your OCC state and want to return
to normal, you would reboot).

I'm just wondering, should this be under an nvram option?

Thanks,
Nick

Vaidyanathan Srinivasan Feb. 3, 2018, 6:57 a.m. UTC | #2

* Nicholas Piggin <npiggin@gmail.com> [2018-02-03 16:24:25]:

> 
> On Fri,  2 Feb 2018 12:32:32 +0530
> Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:
> 
> > OCC shares the frequency list to host by copying the pstate table to
> > main memory in HOMER. This table is parsed during boot to create
> > device-tree properties for frequency and pstate IDs. OCC can update
> > the pstate table to present a new set of frequencies to the host. But
> > host will remain oblivious to these changes unless it is re-inited
> > with the updated device-tree CPU frequency properties. So this patch
> > allows to re-parse the pstate table and update the device-tree
> > properties during fast-reboot.
> > 
> > OCC updates the pstate table when asked to do so using pstate-table
> > bias command. And this is mainly used by WOF team for
> > characterization purposes.
> 
> Would this ever be used in production, I'm guessing not? I don't
> think that's a bad thing as such -- designing for test is always
> good. Perhaps a comment though to explain why you're re-parsing
> it.

Never say never :)

At this time this facility is to enabling tooling to set OCC parameter
at runtime and test the system without encoding all parameters and
building a PNOR and dependent components.

This enables a very efficient workflow with just a OCC reboot and
fast-reboot on OPAL+Linux and we are back in about 2 minutes.  This
can be leveraged in automation/CI also to test various parameters.

> Without knowing much about OCC, I'll guess you're doing this so
> you can update the OCC at runtime without having to update firmware
> before each IPL?

Yep, you got the use case right.  These OCC and PState parameters and
tunings can be tested and later rolled up into the firmware.

> I guess we should always keep in mind fast reboot should match IPL
> as closely as possible and any undetected deviations are a pretty
> serious flaw. (e.g., you mess up your OCC state and want to return
> to normal, you would reboot).

We expect PState table to change for these tests.  If something goes
wrong, OCC will crash or pull system to safe mode.  No major change in
system configuration like cpu, memory, IO.  If something really bad
happens, we will hang/checkstop and we will have to re-ipl to recover.

If there was no OCC PState changes, then parsing it again should get
us exact state compared to Power-ON and hence no risk.

> I'm just wondering, should this be under an nvram option?

The risk actually depends on what we ask OCC to do and hence not
a major config change/risk for OPAL.  I would like to leave it as
default for fast-reboot.  We add slight time factor to rebuild the
relevant device tree.  Given that fast-reboot itself is experimental,
this is a acceptable risk and overhead.

If we hit new error/fail scenarios in future, we can add settings.
I would like to roll this into a single knob for fast-reboot like
"safe", "risky" or something that can help us choose what we want to
do in reinit path.  We need not call out OCC reinit as explicit nvram
option at this time.

--Vaidy

Nicholas Piggin Feb. 3, 2018, 7:59 a.m. UTC | #3

On Sat, 3 Feb 2018 12:27:02 +0530
Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> wrote:

> * Nicholas Piggin <npiggin@gmail.com> [2018-02-03 16:24:25]:
> 
> > 
> > On Fri,  2 Feb 2018 12:32:32 +0530
> > Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> wrote:
> >   
> > > OCC shares the frequency list to host by copying the pstate table to
> > > main memory in HOMER. This table is parsed during boot to create
> > > device-tree properties for frequency and pstate IDs. OCC can update
> > > the pstate table to present a new set of frequencies to the host. But
> > > host will remain oblivious to these changes unless it is re-inited
> > > with the updated device-tree CPU frequency properties. So this patch
> > > allows to re-parse the pstate table and update the device-tree
> > > properties during fast-reboot.
> > > 
> > > OCC updates the pstate table when asked to do so using pstate-table
> > > bias command. And this is mainly used by WOF team for
> > > characterization purposes.  
> > 
> > Would this ever be used in production, I'm guessing not? I don't
> > think that's a bad thing as such -- designing for test is always
> > good. Perhaps a comment though to explain why you're re-parsing
> > it.  
> 
> Never say never :)
> 
> At this time this facility is to enabling tooling to set OCC parameter
> at runtime and test the system without encoding all parameters and
> building a PNOR and dependent components.
> 
> This enables a very efficient workflow with just a OCC reboot and
> fast-reboot on OPAL+Linux and we are back in about 2 minutes.  This
> can be leveraged in automation/CI also to test various parameters.
> 
> > Without knowing much about OCC, I'll guess you're doing this so
> > you can update the OCC at runtime without having to update firmware
> > before each IPL?  
> 
> Yep, you got the use case right.  These OCC and PState parameters and
> tunings can be tested and later rolled up into the firmware.
>  
> > I guess we should always keep in mind fast reboot should match IPL
> > as closely as possible and any undetected deviations are a pretty
> > serious flaw. (e.g., you mess up your OCC state and want to return
> > to normal, you would reboot).  
> 
> We expect PState table to change for these tests.  If something goes
> wrong, OCC will crash or pull system to safe mode.

Is there a way to mark this and disable fast reboot if that happens.

I don't mean to single out this one patch. I just think it's a good
time to think about exactly how we're going to deal with fast reboot,
and how it would be used effectively if we enable it for end users.

We need to be very conservative and be sure to mark any possible errors
or inconsistencies in hardware or firmware for full IPL.

Other modes for debugging and testing like you're doing here are fine
too. No problem if we gate them with an nvram parameter.

>  No major change in
> system configuration like cpu, memory, IO.  If something really bad
> happens, we will hang/checkstop and we will have to re-ipl to recover.
> 
> If there was no OCC PState changes, then parsing it again should get
> us exact state compared to Power-ON and hence no risk.
> 
> > I'm just wondering, should this be under an nvram option?  
> 
> The risk actually depends on what we ask OCC to do and hence not
> a major config change/risk for OPAL.  I would like to leave it as
> default for fast-reboot.  We add slight time factor to rebuild the
> relevant device tree.  Given that fast-reboot itself is experimental,
> this is a acceptable risk and overhead.
> 
> If we hit new error/fail scenarios in future, we can add settings.
> I would like to roll this into a single knob for fast-reboot like
> "safe", "risky" or something that can help us choose what we want to
> do in reinit path.  We need not call out OCC reinit as explicit nvram
> option at this time.

Thanks for the background. The patch is fine in principle from me, I'm
not a fast reboot or OCC expert so my ack would not mean much :) But we
should be thinking about a coherent way to manage fast reboot behaviour.

Thanks,
Nick

Stewart Smith Feb. 8, 2018, 4:44 a.m. UTC | #4

Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
>> I'm just wondering, should this be under an nvram option?
>
> The risk actually depends on what we ask OCC to do and hence not
> a major config change/risk for OPAL.  I would like to leave it as
> default for fast-reboot.  We add slight time factor to rebuild the
> relevant device tree.  Given that fast-reboot itself is experimental,
> this is a acceptable risk and overhead.
>
> If we hit new error/fail scenarios in future, we can add settings.
> I would like to roll this into a single knob for fast-reboot like
> "safe", "risky" or something that can help us choose what we want to
> do in reinit path.  We need not call out OCC reinit as explicit nvram
> option at this time.

It's a good question what we should do there... in a secure boot world,
we're probably better off as in that case you shouldn't be able to go
and poke things into the OCCs to do bad things.

I wonder if we could get an OCC command to "reset back to what you'd be
on IPL" or something?

ppaidipe Feb. 8, 2018, 5:15 a.m. UTC | #5

On 2018-02-08 10:14, Stewart Smith wrote:
> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
>>> I'm just wondering, should this be under an nvram option?
>> 
>> The risk actually depends on what we ask OCC to do and hence not
>> a major config change/risk for OPAL.  I would like to leave it as
>> default for fast-reboot.  We add slight time factor to rebuild the
>> relevant device tree.  Given that fast-reboot itself is experimental,
>> this is a acceptable risk and overhead.
>> 
>> If we hit new error/fail scenarios in future, we can add settings.
>> I would like to roll this into a single knob for fast-reboot like
>> "safe", "risky" or something that can help us choose what we want to
>> do in reinit path.  We need not call out OCC reinit as explicit nvram
>> option at this time.
> 
> It's a good question what we should do there... in a secure boot world,
> we're probably better off as in that case you shouldn't be able to go
> and poke things into the OCCs to do bad things.
> 
> I wonder if we could get an OCC command to "reset back to what you'd be
> on IPL" or something?

I think this is one of the example failure case where occ is still in
disabled state after fast reboot. As shilpa mentioned we should be able
to clear max_occ_reset_count and do the occ re-init.
https://github.com/open-power/skiboot/issues/49

Stewart Smith Feb. 9, 2018, 5:09 a.m. UTC | #6

ppaidipe <ppaidipe@linux.vnet.ibm.com> writes:
> On 2018-02-08 10:14, Stewart Smith wrote:
>> Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> writes:
>>>> I'm just wondering, should this be under an nvram option?
>>> 
>>> The risk actually depends on what we ask OCC to do and hence not
>>> a major config change/risk for OPAL.  I would like to leave it as
>>> default for fast-reboot.  We add slight time factor to rebuild the
>>> relevant device tree.  Given that fast-reboot itself is experimental,
>>> this is a acceptable risk and overhead.
>>> 
>>> If we hit new error/fail scenarios in future, we can add settings.
>>> I would like to roll this into a single knob for fast-reboot like
>>> "safe", "risky" or something that can help us choose what we want to
>>> do in reinit path.  We need not call out OCC reinit as explicit nvram
>>> option at this time.
>> 
>> It's a good question what we should do there... in a secure boot world,
>> we're probably better off as in that case you shouldn't be able to go
>> and poke things into the OCCs to do bad things.
>> 
>> I wonder if we could get an OCC command to "reset back to what you'd be
>> on IPL" or something?
>
> I think this is one of the example failure case where occ is still in
> disabled state after fast reboot. As shilpa mentioned we should be able
> to clear max_occ_reset_count and do the occ re-init.
> https://github.com/open-power/skiboot/issues/49

Yeah, we should probably go down the route of doing that (perhaps we
should do that no matter what? I'm not sure, we'll need to talk to the
OCC team to work out what's the best/correct thing to do).

I've merged the patch to master as of
85a1de35cbe47618e9d4104880f182121806af4d as it seems to be something
decent at least for the time being... although I'm not convinced we
shouldn't either disable it or go through a larger re-init cycle with
the OCC....

fast-boot: occ: Re-parse the pstate table during fast-boot

Commit Message

Comments

Patch