Message ID | 543316e7a0efa5d60fe6196d4aa1ed6a5cbef9e5.1535359753.git-series.andrew.donnellan@au1.ibm.com |
---|---|
State | Superseded |
Series | OpenCAPI support for Witherspoon |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | master/apply_patch Successfully applied |
On 27/08/2018 at 10:55, Andrew Donnellan wrote:
> It takes a few seconds for the OCC to set everything up in order to read
> GPU presence. At present, we try to kick off OCC initialisation as early as
> possible to maximise the time it has to read GPU presence.
>
> Unfortunately sometimes that's not enough, so add a loop in
> occ_get_gpu_presence() so that on the first time we try to get GPU presence
> we keep trying for up to 2 seconds. Experimentally this seems to be
> adequate.
>
> Fixes: 9b394a32c8ea ("occ: Add support for GPU presence detection")
> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> ---
>  hw/occ.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/hw/occ.c b/hw/occ.c
> index a55bf8ed4f54..9fcac3f9581c 100644
> --- a/hw/occ.c
> +++ b/hw/occ.c
> @@ -1238,14 +1238,26 @@ exit:
>  bool occ_get_gpu_presence(struct proc_chip *chip, int gpu_num)
>  {
>  	struct occ_dynamic_data *ddata;
> +	static int max_retries = 20;
> +	static bool found = false;
>
>  	assert(gpu_num <= 2);
>
>  	ddata = get_occ_dynamic_data(chip);
> -
> -	if (ddata->major_version != 0 || ddata->minor_version < 1) {
> +	while (!found && max_retries) {
> +		if (ddata->major_version == 0 && ddata->minor_version >= 1) {
> +			found = true;
> +			break;
> +		}
>  		prlog(PR_INFO, "OCC: OCC not reporting GPU slot presence, "
> -		      "assuming device is present\n");
> +		      "waiting\n");

Do we really want to print the same message up to 20 times?

Other than that:
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>

> +		time_wait_ms(100);
> +		max_retries--;
> +		ddata = get_occ_dynamic_data(chip);
> +	}
> +
> +	if (!found) {
> +		prlog(PR_INFO, "OCC: No GPU slot presence, assuming GPU present\n");
>  		return true;
>  	}
>
On 29/08/18 22:44, Frederic Barrat wrote:
>>  		prlog(PR_INFO, "OCC: OCC not reporting GPU slot presence, "
>> -		      "assuming device is present\n");
>> +		      "waiting\n");
>
> Do we really want to print up to 20 times the same message?

Argh, Rashmica had pointed that out to me even before I sent v1 and I forgot to fix it :)
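For illustration only, one way the repeated-message concern could be handled is to log the "waiting" message on the first failed poll and stay silent on later retries. The sketch below is not the follow-up patch; it is a standalone approximation of the loop from this patch. The `_stub` functions, the simplified `struct occ_dynamic_data`, and the counters are hypothetical stand-ins for skiboot internals (`get_occ_dynamic_data()`, `time_wait_ms()`, `prlog()`), added only so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the skiboot structure. */
struct occ_dynamic_data {
	int major_version;
	int minor_version;
};

static int polls_until_ready = 3; /* simulation: first 3 reads are "not ready" */
static int waiting_msgs;          /* how many times "waiting" was logged */
static int total_wait_ms;         /* total simulated wait time */

/* Hypothetical stub replacing get_occ_dynamic_data(chip). */
static struct occ_dynamic_data *get_occ_dynamic_data_stub(void)
{
	static struct occ_dynamic_data d;

	d.major_version = 0;
	if (polls_until_ready > 0) {
		polls_until_ready--;
		d.minor_version = 0; /* OCC has not published GPU presence yet */
	} else {
		d.minor_version = 1; /* OCC now reports GPU presence */
	}
	return &d;
}

/* Hypothetical stub replacing time_wait_ms(); only records the delay. */
static void time_wait_ms_stub(int ms)
{
	total_wait_ms += ms;
}

/* Poll for up to 20 x 100 ms = 2 s, logging "waiting" at most once per call. */
static bool occ_gpu_data_ready(void)
{
	static int max_retries = 20;
	static bool found = false;
	bool logged = false;
	struct occ_dynamic_data *ddata = get_occ_dynamic_data_stub();

	assert(ddata != NULL);

	while (!found && max_retries) {
		if (ddata->major_version == 0 && ddata->minor_version >= 1) {
			found = true;
			break;
		}
		if (!logged) {
			/* printf stands in for prlog(PR_INFO, ...) */
			printf("OCC: OCC not reporting GPU slot presence, waiting\n");
			waiting_msgs++;
			logged = true;
		}
		time_wait_ms_stub(100);
		max_retries--;
		ddata = get_occ_dynamic_data_stub();
	}
	return found;
}
```

As in the patch, `max_retries` and `found` are static, so the 2-second budget is spent only once across all calls. The `logged` flag is local, so a later call that still finds the data missing and has retries left would log once more.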
diff --git a/hw/occ.c b/hw/occ.c
index a55bf8ed4f54..9fcac3f9581c 100644
--- a/hw/occ.c
+++ b/hw/occ.c
@@ -1238,14 +1238,26 @@ exit:
 bool occ_get_gpu_presence(struct proc_chip *chip, int gpu_num)
 {
 	struct occ_dynamic_data *ddata;
+	static int max_retries = 20;
+	static bool found = false;
 
 	assert(gpu_num <= 2);
 
 	ddata = get_occ_dynamic_data(chip);
-
-	if (ddata->major_version != 0 || ddata->minor_version < 1) {
+	while (!found && max_retries) {
+		if (ddata->major_version == 0 && ddata->minor_version >= 1) {
+			found = true;
+			break;
+		}
 		prlog(PR_INFO, "OCC: OCC not reporting GPU slot presence, "
-		      "assuming device is present\n");
+		      "waiting\n");
+		time_wait_ms(100);
+		max_retries--;
+		ddata = get_occ_dynamic_data(chip);
+	}
+
+	if (!found) {
+		prlog(PR_INFO, "OCC: No GPU slot presence, assuming GPU present\n");
 		return true;
 	}
 
It takes a few seconds for the OCC to set everything up in order to read
GPU presence. At present, we try to kick off OCC initialisation as early as
possible to maximise the time it has to read GPU presence.

Unfortunately sometimes that's not enough, so add a loop in
occ_get_gpu_presence() so that on the first time we try to get GPU presence
we keep trying for up to 2 seconds. Experimentally this seems to be
adequate.

Fixes: 9b394a32c8ea ("occ: Add support for GPU presence detection")
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
---
 hw/occ.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)