Message ID | 20160927005152.17676-1-ruscur@russell.cc |
---|---|
State | Superseded |
Headers | show |
On Tue, Sep 27, 2016 at 10:51:52AM +1000, Russell Currey wrote: >Recent releases of skiboot contain a regression, causing certain adapters >behind certain switches to timeout waiting for links to come up. This is >due to a hot reset at boot time: > Russell, the hot reset issued to root port caused this issue if I understand everything here. I have two questions as below: - Is it always reproducible? What kind of adapter we have this issue? - There are usually multiple downstream ports (in the PCIe switch). We see just one link is down or two (possible multiple) links are down? >a) having to permeate down through all devices and >b) being potentially unsafe given the links aren't up yet. > >Avoid the hot reset at boot in phb3 and phb4. > >Cc: stable # 5.3.x >Signed-off-by: Russell Currey <ruscur@russell.cc> >--- > hw/phb3.c | 2 +- > hw/phb4.c | 10 +--------- > 2 files changed, 2 insertions(+), 10 deletions(-) > >diff --git a/hw/phb3.c b/hw/phb3.c >index d0b5010..b8e3439 100644 >--- a/hw/phb3.c >+++ b/hw/phb3.c >@@ -2240,7 +2240,7 @@ static int64_t phb3_pfreset(struct pci_slot *slot) > /* CAPP FPGA requires 1s to flash before polling link */ > return pci_slot_set_sm_timeout(slot, secs_to_tb(1)); > case PHB3_SLOT_PFRESET_DEASSERT_DELAY: >- pci_slot_set_state(slot, PHB3_SLOT_HRESET_START); >+ pci_slot_set_state(slot, PHB3_SLOT_HRESET_DELAY); > return slot->ops.hreset(slot); Yeah, I think we should avoid the hot reset during fundamental reset, but the code change as below might be more reasonable: pci_slot_set_state(slot, PHB3_SLOT_LINK_START); return slot->ops.poll_link(slot); Also, we have similar issues in firenze-pci.c::firenze_pci_slot_freset() and p7ioc-phb.c::p7ioc_freset(). It might be reasonable to remove all the hot resets in one patch or series of patches. > default: > PHBERR(p, "Unexpected slot state %08x\n", slot->state); >diff --git a/hw/phb4.c b/hw/phb4.c >index 385ce8c..746bfb7 100644 >--- a/hw/phb4.c >+++ b/hw/phb4.c >@@ -1951,16 +1951,8 @@ static int64_t phb4_pfreset(struct pci_slot *slot) > /* CAPP FPGA requires 1s to flash before polling link */ > return pci_slot_set_sm_timeout(slot, secs_to_tb(1)); > case PHB4_SLOT_PFRESET_DEASSERT_DELAY: >-#if 0 /* PHB3 does a Hreset here. It's unnecessary I think and it's >- * causing problems with the simulator croc model so don't do >- * it until I figure out Gavin's reasons >- */ >- pci_slot_set_state(slot, PHB4_SLOT_HRESET_START); >+ pci_slot_set_state(slot, PHB4_SLOT_HRESET_DELAY); > return slot->ops.hreset(slot); >-#else >- pci_slot_set_state(slot, PHB4_SLOT_LINK_START); >- return slot->ops.poll_link(slot); >-#endif > default: > PHBERR(p, "Unexpected slot state %08x\n", slot->state); > } >-- >2.10.0 > >_______________________________________________ >Skiboot mailing list >Skiboot@lists.ozlabs.org >https://lists.ozlabs.org/listinfo/skiboot
On Tue, 2016-09-27 at 12:29 +1000, Gavin Shan wrote: > On Tue, Sep 27, 2016 at 10:51:52AM +1000, Russell Currey wrote: > > > > Recent releases of skiboot contain a regression, causing certain adapters > > behind certain switches to timeout waiting for links to come up. This is > > due to a hot reset at boot time: > > > > Russell, the hot reset issued to root port caused this issue if I understand > everything here. I have two questions as below: > > - Is it always reproducible? What kind of adapter we have this issue? It is always reproducible on some (but not all) machines, it seems. > - There are usually multiple downstream ports (in the PCIe switch). We see > just one link is down or two (possible multiple) links are down? I think there are multiple links down but I'm not completely sure. > > > > a) having to permeate down through all devices and > > b) being potentially unsafe given the links aren't up yet. > > > > Avoid the hot reset at boot in phb3 and phb4. > > > > Cc: stable # 5.3.x > > Signed-off-by: Russell Currey <ruscur@russell.cc> > > --- > > hw/phb3.c | 2 +- > > hw/phb4.c | 10 +--------- > > 2 files changed, 2 insertions(+), 10 deletions(-) > > > > diff --git a/hw/phb3.c b/hw/phb3.c > > index d0b5010..b8e3439 100644 > > --- a/hw/phb3.c > > +++ b/hw/phb3.c > > @@ -2240,7 +2240,7 @@ static int64_t phb3_pfreset(struct pci_slot *slot) > > /* CAPP FPGA requires 1s to flash before polling link */ > > return pci_slot_set_sm_timeout(slot, secs_to_tb(1)); > > case PHB3_SLOT_PFRESET_DEASSERT_DELAY: > > - pci_slot_set_state(slot, PHB3_SLOT_HRESET_START); > > + pci_slot_set_state(slot, PHB3_SLOT_HRESET_DELAY); > > return slot->ops.hreset(slot); > > Yeah, I think we should avoid the hot reset during fundamental reset, but > the code change as below might be more reasonable: > > pci_slot_set_state(slot, PHB3_SLOT_LINK_START); > return slot->ops.poll_link(slot); I thought so as well but figured there wasn't any harm in deasserting any existing hreset. If that code isn't going to do anything then skipping to LINK_START makes more sense. > > Also, we have similar issues in firenze-pci.c::firenze_pci_slot_freset() and > p7ioc-phb.c::p7ioc_freset(). It might be reasonable to remove all the hot > resets in one patch or series of patches. Good catch. I tested this patch on a FSP and it didn't break anything, but I didn't have access to a FSP machine that had the bug. Without the firenze-pci.c change this patch might not fix the issue for those machines. > > > > > default: > > PHBERR(p, "Unexpected slot state %08x\n", slot->state); > > diff --git a/hw/phb4.c b/hw/phb4.c > > index 385ce8c..746bfb7 100644 > > --- a/hw/phb4.c > > +++ b/hw/phb4.c > > @@ -1951,16 +1951,8 @@ static int64_t phb4_pfreset(struct pci_slot *slot) > > /* CAPP FPGA requires 1s to flash before polling link */ > > return pci_slot_set_sm_timeout(slot, secs_to_tb(1)); > > case PHB4_SLOT_PFRESET_DEASSERT_DELAY: > > -#if 0 /* PHB3 does a Hreset here. It's unnecessary I think and it's > > - * causing problems with the simulator croc model so don't do > > - * it until I figure out Gavin's reasons > > - */ > > - pci_slot_set_state(slot, PHB4_SLOT_HRESET_START); > > + pci_slot_set_state(slot, PHB4_SLOT_HRESET_DELAY); > > return slot->ops.hreset(slot); > > -#else > > - pci_slot_set_state(slot, PHB4_SLOT_LINK_START); > > - return slot->ops.poll_link(slot); > > -#endif > > default: > > PHBERR(p, "Unexpected slot state %08x\n", slot->state); > > } > > -- > > 2.10.0 > > > > _______________________________________________ > > Skiboot mailing list > > Skiboot@lists.ozlabs.org > > https://lists.ozlabs.org/listinfo/skiboot
diff --git a/hw/phb3.c b/hw/phb3.c index d0b5010..b8e3439 100644 --- a/hw/phb3.c +++ b/hw/phb3.c @@ -2240,7 +2240,7 @@ static int64_t phb3_pfreset(struct pci_slot *slot) /* CAPP FPGA requires 1s to flash before polling link */ return pci_slot_set_sm_timeout(slot, secs_to_tb(1)); case PHB3_SLOT_PFRESET_DEASSERT_DELAY: - pci_slot_set_state(slot, PHB3_SLOT_HRESET_START); + pci_slot_set_state(slot, PHB3_SLOT_HRESET_DELAY); return slot->ops.hreset(slot); default: PHBERR(p, "Unexpected slot state %08x\n", slot->state); diff --git a/hw/phb4.c b/hw/phb4.c index 385ce8c..746bfb7 100644 --- a/hw/phb4.c +++ b/hw/phb4.c @@ -1951,16 +1951,8 @@ static int64_t phb4_pfreset(struct pci_slot *slot) /* CAPP FPGA requires 1s to flash before polling link */ return pci_slot_set_sm_timeout(slot, secs_to_tb(1)); case PHB4_SLOT_PFRESET_DEASSERT_DELAY: -#if 0 /* PHB3 does a Hreset here. It's unnecessary I think and it's - * causing problems with the simulator croc model so don't do - * it until I figure out Gavin's reasons - */ - pci_slot_set_state(slot, PHB4_SLOT_HRESET_START); + pci_slot_set_state(slot, PHB4_SLOT_HRESET_DELAY); return slot->ops.hreset(slot); -#else - pci_slot_set_state(slot, PHB4_SLOT_LINK_START); - return slot->ops.poll_link(slot); -#endif default: PHBERR(p, "Unexpected slot state %08x\n", slot->state); }
Recent releases of skiboot contain a regression, causing certain adapters behind certain switches to timeout waiting for links to come up. This is due to a hot reset at boot time: a) having to permeate down through all devices and b) being potentially unsafe given the links aren't up yet. Avoid the hot reset at boot in phb3 and phb4. Cc: stable # 5.3.x Signed-off-by: Russell Currey <ruscur@russell.cc> --- hw/phb3.c | 2 +- hw/phb4.c | 10 +--------- 2 files changed, 2 insertions(+), 10 deletions(-)