Message ID | 20191127044758.32596-1-oohall@gmail.com |
---|---|
State | Accepted |
Headers | show |
Series | witherspoon: Squash spurious I2C errors | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch master (d75e82dbfbb9443efeb3f9a5921ac23605aab469) |
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot | success | Test snowpatch/job/snowpatch-skiboot on branch master |
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco | success | Signed-off-by present |
On 27/11/19 3:47 pm, Oliver O'Halloran wrote: > On witherspoon there's an I2C bus connecting the each chip to the GPUs > connected to that chip via NVLink. That bus has a 750kOhm pullup on the > system planar with prevents the bus from operating correctly. > > Each GPU has a smaller pullup which makes the bus usable when a GPU is > plugged in, but on systems without GPUs we get a lot of spurious I2C > master errors. Specificly, because of the oversized pullup the SDA and > SCL for that bus cannot return to '1' fast enough, so the master assumes > that another master is driving the bus and that it should stop. > > Work around this by disabling the affected port when there's no GPUs > detected in the system. > > Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> > Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Looks reasonable. Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> > --- > Tested on a system with a GPU on each socket and on a system with none. > Couldn't find a mixed system. > --- > platforms/astbmc/witherspoon.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/platforms/astbmc/witherspoon.c b/platforms/astbmc/witherspoon.c > index edf84fb89601..081996684603 100644 > --- a/platforms/astbmc/witherspoon.c > +++ b/platforms/astbmc/witherspoon.c > @@ -465,6 +465,7 @@ static void npu2_phb_nvlink_dt(struct phb *npuphb) > static void witherspoon_finalise_dt(bool is_reboot) > { > struct dt_node *np; > + struct proc_chip *c; > > if (is_reboot) > return; > @@ -479,6 +480,30 @@ static void witherspoon_finalise_dt(bool is_reboot) > continue; > npu2_phb_nvlink_dt(npphb); > } > + > + /* > + * The I2C bus on used to talk to the GPUs has a 750K pullup > + * which is way too big. If there's no GPUs connected to the > + * chip all I2C transactions fail with an Arb loss error since > + * SCL/SDA don't return to the idle state fast enough. Disable > + * the port to squash the errors. > + */ > + for (c = next_chip(0); c; c = next_chip(c)) { > + bool detected = false; > + int i; > + > + np = dt_find_by_path(c->devnode, "i2cm@a1000/i2c-bus@4"); > + if (!np) > + continue; > + > + for (i = 0; i < 3; i++) > + detected |= occ_get_gpu_presence(c, i); > + > + if (!detected) { > + dt_check_del_prop(np, "status"); > + dt_add_property_string(np, "status", "disabled"); > + } > + } > } > > /* The only difference between these is the PCI slot handling */ >
On Wed, Nov 27, 2019 at 3:48 PM Oliver O'Halloran <oohall@gmail.com> wrote: > > On witherspoon there's an I2C bus connecting the each chip to the GPUs > connected to that chip via NVLink. That bus has a 750kOhm pullup on the > system planar with prevents the bus from operating correctly. > > Each GPU has a smaller pullup which makes the bus usable when a GPU is > plugged in, but on systems without GPUs we get a lot of spurious I2C > master errors. Specificly, because of the oversized pullup the SDA and > SCL for that bus cannot return to '1' fast enough, so the master assumes > that another master is driving the bus and that it should stop. > > Work around this by disabling the affected port when there's no GPUs > detected in the system. > > Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> > Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Merged as dc85bd46d20db4d878216283f818573cb0c8c05b
diff --git a/platforms/astbmc/witherspoon.c b/platforms/astbmc/witherspoon.c index edf84fb89601..081996684603 100644 --- a/platforms/astbmc/witherspoon.c +++ b/platforms/astbmc/witherspoon.c @@ -465,6 +465,7 @@ static void npu2_phb_nvlink_dt(struct phb *npuphb) static void witherspoon_finalise_dt(bool is_reboot) { struct dt_node *np; + struct proc_chip *c; if (is_reboot) return; @@ -479,6 +480,30 @@ static void witherspoon_finalise_dt(bool is_reboot) continue; npu2_phb_nvlink_dt(npphb); } + + /* + * The I2C bus on used to talk to the GPUs has a 750K pullup + * which is way too big. If there's no GPUs connected to the + * chip all I2C transactions fail with an Arb loss error since + * SCL/SDA don't return to the idle state fast enough. Disable + * the port to squash the errors. + */ + for (c = next_chip(0); c; c = next_chip(c)) { + bool detected = false; + int i; + + np = dt_find_by_path(c->devnode, "i2cm@a1000/i2c-bus@4"); + if (!np) + continue; + + for (i = 0; i < 3; i++) + detected |= occ_get_gpu_presence(c, i); + + if (!detected) { + dt_check_del_prop(np, "status"); + dt_add_property_string(np, "status", "disabled"); + } + } } /* The only difference between these is the PCI slot handling */
On witherspoon there's an I2C bus connecting the each chip to the GPUs connected to that chip via NVLink. That bus has a 750kOhm pullup on the system planar with prevents the bus from operating correctly. Each GPU has a smaller pullup which makes the bus usable when a GPU is plugged in, but on systems without GPUs we get a lot of spurious I2C master errors. Specificly, because of the oversized pullup the SDA and SCL for that bus cannot return to '1' fast enough, so the master assumes that another master is driving the bus and that it should stop. Work around this by disabling the affected port when there's no GPUs detected in the system. Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> --- Tested on a system with a GPU on each socket and on a system with none. Couldn't find a mixed system. --- platforms/astbmc/witherspoon.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)