diff mbox

phb3/4: Don't perform a hot reset at boot

Message ID 20160927005152.17676-1-ruscur@russell.cc
State Superseded
Headers show

Commit Message

Russell Currey Sept. 27, 2016, 12:51 a.m. UTC
Recent releases of skiboot contain a regression, causing certain adapters
behind certain switches to timeout waiting for links to come up.  This is
due to a hot reset at boot time:

a) having to permeate down through all devices and
b) being potentially unsafe given the links aren't up yet.

Avoid the hot reset at boot in phb3 and phb4.

Cc: stable # 5.3.x
Signed-off-by: Russell Currey <ruscur@russell.cc>
---
 hw/phb3.c |  2 +-
 hw/phb4.c | 10 +---------
 2 files changed, 2 insertions(+), 10 deletions(-)

Comments

Gavin Shan Sept. 27, 2016, 2:29 a.m. UTC | #1
On Tue, Sep 27, 2016 at 10:51:52AM +1000, Russell Currey wrote:
>Recent releases of skiboot contain a regression, causing certain adapters
>behind certain switches to timeout waiting for links to come up.  This is
>due to a hot reset at boot time:
>

Russell, the hot reset issued to root port caused this issue if I understand
everything here. I have two questions as below:

- Is it always reproducible? What kind of adapter we have this issue?
- There are usually multiple downstream ports (in the PCIe switch). We see
  just one link is down or two (possible multiple) links are down?

>a) having to permeate down through all devices and
>b) being potentially unsafe given the links aren't up yet.
>
>Avoid the hot reset at boot in phb3 and phb4.
>
>Cc: stable # 5.3.x
>Signed-off-by: Russell Currey <ruscur@russell.cc>
>---
> hw/phb3.c |  2 +-
> hw/phb4.c | 10 +---------
> 2 files changed, 2 insertions(+), 10 deletions(-)
>
>diff --git a/hw/phb3.c b/hw/phb3.c
>index d0b5010..b8e3439 100644
>--- a/hw/phb3.c
>+++ b/hw/phb3.c
>@@ -2240,7 +2240,7 @@ static int64_t phb3_pfreset(struct pci_slot *slot)
> 		/* CAPP FPGA requires 1s to flash before polling link */
> 		return pci_slot_set_sm_timeout(slot, secs_to_tb(1));
> 	case PHB3_SLOT_PFRESET_DEASSERT_DELAY:
>-		pci_slot_set_state(slot, PHB3_SLOT_HRESET_START);
>+		pci_slot_set_state(slot, PHB3_SLOT_HRESET_DELAY);
> 		return slot->ops.hreset(slot);

Yeah, I think we should avoid the hot reset during fundamental reset, but
the code change as below might be more reasonable:

		pci_slot_set_state(slot, PHB3_SLOT_LINK_START);
                return slot->ops.poll_link(slot);

Also, we have similar issues in firenze-pci.c::firenze_pci_slot_freset() and
p7ioc-phb.c::p7ioc_freset(). It might be reasonable to remove all the hot
resets in one patch or series of patches.

> 	default:
> 		PHBERR(p, "Unexpected slot state %08x\n", slot->state);
>diff --git a/hw/phb4.c b/hw/phb4.c
>index 385ce8c..746bfb7 100644
>--- a/hw/phb4.c
>+++ b/hw/phb4.c
>@@ -1951,16 +1951,8 @@ static int64_t phb4_pfreset(struct pci_slot *slot)
> 		/* CAPP FPGA requires 1s to flash before polling link */
> 		return pci_slot_set_sm_timeout(slot, secs_to_tb(1));
> 	case PHB4_SLOT_PFRESET_DEASSERT_DELAY:
>-#if 0 	/* PHB3 does a Hreset here. It's unnecessary I think and it's
>-	 * causing problems with the simulator croc model so don't do
>-	 * it until I figure out Gavin's reasons
>-	 */
>-		pci_slot_set_state(slot, PHB4_SLOT_HRESET_START);
>+		pci_slot_set_state(slot, PHB4_SLOT_HRESET_DELAY);
> 		return slot->ops.hreset(slot);
>-#else
>-		pci_slot_set_state(slot, PHB4_SLOT_LINK_START);
>-		return slot->ops.poll_link(slot);
>-#endif
> 	default:
> 		PHBERR(p, "Unexpected slot state %08x\n", slot->state);
> 	}
>-- 
>2.10.0
>
>_______________________________________________
>Skiboot mailing list
>Skiboot@lists.ozlabs.org
>https://lists.ozlabs.org/listinfo/skiboot
Russell Currey Sept. 27, 2016, 3:28 a.m. UTC | #2
On Tue, 2016-09-27 at 12:29 +1000, Gavin Shan wrote:
> On Tue, Sep 27, 2016 at 10:51:52AM +1000, Russell Currey wrote:
> > 
> > Recent releases of skiboot contain a regression, causing certain adapters
> > behind certain switches to timeout waiting for links to come up.  This is
> > due to a hot reset at boot time:
> > 
> 
> Russell, the hot reset issued to root port caused this issue if I understand
> everything here. I have two questions as below:
> 
> - Is it always reproducible? What kind of adapter we have this issue?

It is always reproducible on some (but not all) machines, it seems.

> - There are usually multiple downstream ports (in the PCIe switch). We see
>   just one link is down or two (possible multiple) links are down?

I think there are multiple links down but I'm not completely sure.

> > 
> > a) having to permeate down through all devices and
> > b) being potentially unsafe given the links aren't up yet.
> > 
> > Avoid the hot reset at boot in phb3 and phb4.
> > 
> > Cc: stable # 5.3.x
> > Signed-off-by: Russell Currey <ruscur@russell.cc>
> > ---
> > hw/phb3.c |  2 +-
> > hw/phb4.c | 10 +---------
> > 2 files changed, 2 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/phb3.c b/hw/phb3.c
> > index d0b5010..b8e3439 100644
> > --- a/hw/phb3.c
> > +++ b/hw/phb3.c
> > @@ -2240,7 +2240,7 @@ static int64_t phb3_pfreset(struct pci_slot *slot)
> > 		/* CAPP FPGA requires 1s to flash before polling link */
> > 		return pci_slot_set_sm_timeout(slot, secs_to_tb(1));
> > 	case PHB3_SLOT_PFRESET_DEASSERT_DELAY:
> > -		pci_slot_set_state(slot, PHB3_SLOT_HRESET_START);
> > +		pci_slot_set_state(slot, PHB3_SLOT_HRESET_DELAY);
> > 		return slot->ops.hreset(slot);
> 
> Yeah, I think we should avoid the hot reset during fundamental reset, but
> the code change as below might be more reasonable:
> 
> 		pci_slot_set_state(slot, PHB3_SLOT_LINK_START);
>                 return slot->ops.poll_link(slot);

I thought so as well but figured there wasn't any harm in deasserting any
existing hreset.  If that code isn't going to do anything then skipping to
LINK_START makes more sense.

> 
> Also, we have similar issues in firenze-pci.c::firenze_pci_slot_freset() and
> p7ioc-phb.c::p7ioc_freset(). It might be reasonable to remove all the hot
> resets in one patch or series of patches.

Good catch.  I tested this patch on a FSP and it didn't break anything, but I
didn't have access to a FSP machine that had the bug.  Without the firenze-pci.c 
change this patch might not fix the issue for those machines.

> 
> > 
> > 	default:
> > 		PHBERR(p, "Unexpected slot state %08x\n", slot->state);
> > diff --git a/hw/phb4.c b/hw/phb4.c
> > index 385ce8c..746bfb7 100644
> > --- a/hw/phb4.c
> > +++ b/hw/phb4.c
> > @@ -1951,16 +1951,8 @@ static int64_t phb4_pfreset(struct pci_slot *slot)
> > 		/* CAPP FPGA requires 1s to flash before polling link */
> > 		return pci_slot_set_sm_timeout(slot, secs_to_tb(1));
> > 	case PHB4_SLOT_PFRESET_DEASSERT_DELAY:
> > -#if 0 	/* PHB3 does a Hreset here. It's unnecessary I think and it's
> > -	 * causing problems with the simulator croc model so don't do
> > -	 * it until I figure out Gavin's reasons
> > -	 */
> > -		pci_slot_set_state(slot, PHB4_SLOT_HRESET_START);
> > +		pci_slot_set_state(slot, PHB4_SLOT_HRESET_DELAY);
> > 		return slot->ops.hreset(slot);
> > -#else
> > -		pci_slot_set_state(slot, PHB4_SLOT_LINK_START);
> > -		return slot->ops.poll_link(slot);
> > -#endif
> > 	default:
> > 		PHBERR(p, "Unexpected slot state %08x\n", slot->state);
> > 	}
> > -- 
> > 2.10.0
> > 
> > _______________________________________________
> > Skiboot mailing list
> > Skiboot@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/skiboot
diff mbox

Patch

diff --git a/hw/phb3.c b/hw/phb3.c
index d0b5010..b8e3439 100644
--- a/hw/phb3.c
+++ b/hw/phb3.c
@@ -2240,7 +2240,7 @@  static int64_t phb3_pfreset(struct pci_slot *slot)
 		/* CAPP FPGA requires 1s to flash before polling link */
 		return pci_slot_set_sm_timeout(slot, secs_to_tb(1));
 	case PHB3_SLOT_PFRESET_DEASSERT_DELAY:
-		pci_slot_set_state(slot, PHB3_SLOT_HRESET_START);
+		pci_slot_set_state(slot, PHB3_SLOT_HRESET_DELAY);
 		return slot->ops.hreset(slot);
 	default:
 		PHBERR(p, "Unexpected slot state %08x\n", slot->state);
diff --git a/hw/phb4.c b/hw/phb4.c
index 385ce8c..746bfb7 100644
--- a/hw/phb4.c
+++ b/hw/phb4.c
@@ -1951,16 +1951,8 @@  static int64_t phb4_pfreset(struct pci_slot *slot)
 		/* CAPP FPGA requires 1s to flash before polling link */
 		return pci_slot_set_sm_timeout(slot, secs_to_tb(1));
 	case PHB4_SLOT_PFRESET_DEASSERT_DELAY:
-#if 0 	/* PHB3 does a Hreset here. It's unnecessary I think and it's
-	 * causing problems with the simulator croc model so don't do
-	 * it until I figure out Gavin's reasons
-	 */
-		pci_slot_set_state(slot, PHB4_SLOT_HRESET_START);
+		pci_slot_set_state(slot, PHB4_SLOT_HRESET_DELAY);
 		return slot->ops.hreset(slot);
-#else
-		pci_slot_set_state(slot, PHB4_SLOT_LINK_START);
-		return slot->ops.poll_link(slot);
-#endif
 	default:
 		PHBERR(p, "Unexpected slot state %08x\n", slot->state);
 	}