Message ID | 20170810065843.13893-1-ruscur@russell.cc |
---|---|
State | Accepted |
Headers | show |
On 10/08/17 16:58, Russell Currey wrote: > This fixes an unlikely but possible assert() fail on kdump. > > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Signed-off-by: Russell Currey <ruscur@russell.cc> Where does the assert() happen? Regardless this looks like a pretty reasonable thing to check, so... Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> > --- > hw/xive.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/hw/xive.c b/hw/xive.c > index 03b9478e..18636723 100644 > --- a/hw/xive.c > +++ b/hw/xive.c > @@ -601,6 +601,8 @@ static bool xive_decode_vp(uint32_t vp, uint32_t *blk, uint32_t *idx, > > /* PIR case */ > if (((vp >> 30) & 1) == 0) { > + if (find_cpu_by_pir(index) == NULL) > + return false; > if (blk) > *blk = PIR2VP_BLK(index); > if (idx) > @@ -656,6 +658,8 @@ static bool xive_decode_vp(uint32_t vp, uint32_t *blk, uint32_t *idx, > > /* PIR case */ > if (((vp >> 30) & 1) == 0) { > + if (find_cpu_by_pir(index) == NULL) > + return false; > if (blk) > *blk = PIR2VP_BLK(index); > if (idx) >
On 10/08/17 16:58, Russell Currey wrote: > If a PHB is marked broken it didn't work on boot, and if it didn't work > on boot then there's no point trying to recover it later Can't a PHB also end up in BROKEN when a creset fails? Per the error: label in phb4_creset()... > > Signed-off-by: Russell Currey <ruscur@russell.cc> In any case this seems sensible. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Would phb3 benefit from a similar change? > --- > hw/phb4.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/hw/phb4.c b/hw/phb4.c > index 7e8f68bf..012a8cdc 100644 > --- a/hw/phb4.c > +++ b/hw/phb4.c > @@ -2608,6 +2608,10 @@ static int64_t phb4_creset(struct pci_slot *slot) > struct phb4 *p = phb_to_phb4(slot->phb); > uint64_t pbcq_status, reg; > > + /* Don't even try fixing a broken PHB */ > + if (p->state == PHB4_STATE_BROKEN) > + return OPAL_HARDWARE; > + > switch (slot->state) { > case PHB4_SLOT_NORMAL: > case PHB4_SLOT_CRESET_START: >
On 10/08/17 16:58, Russell Currey wrote: > phb4_creset() is typically called by functions that prepare the link > to go down. In cases where creset() is called directly by the kernel, > this isn't the case and it can cause issues. Prepare for link down in > creset, just like we do in freset and hreset. > > Signed-off-by: Russell Currey <ruscur@russell.cc> Looks reasonable Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> > --- > hw/phb4.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/hw/phb4.c b/hw/phb4.c > index 012a8cdc..b467e369 100644 > --- a/hw/phb4.c > +++ b/hw/phb4.c > @@ -2623,6 +2623,7 @@ static int64_t phb4_creset(struct pci_slot *slot) > do_capp_recovery_scoms(p); > #endif > > + phb4_prepare_link_change(slot, false); > /* Clear error inject register, preventing recursive errors */ > xscom_write(p->chip_id, p->pe_xscom + 0x2, 0x0); > >
On 10/08/17 16:58, Russell Currey wrote: > If a PHB is being completely reset, its state is about to be blown away > anyway, so if it's not in an appropriate state, creset it regardless. > > Signed-off-by: Russell Currey <ruscur@russell.cc> Though hitting this path is probably still a bug somewhere? Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> > --- > hw/phb4.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/hw/phb4.c b/hw/phb4.c > index b467e369..d13ab8c3 100644 > --- a/hw/phb4.c > +++ b/hw/phb4.c > @@ -2700,8 +2700,11 @@ static int64_t phb4_creset(struct pci_slot *slot) > pci_slot_set_state(slot, PHB4_SLOT_NORMAL); > return slot->ops.freset(slot); > default: > - PHBERR(p, "CRESET: Unexpected slot state %08x\n", > + PHBERR(p, "CRESET: Unexpected slot state %08x, resetting...\n", > slot->state); > + pci_slot_set_state(slot, PHB4_SLOT_NORMAL); > + return slot->ops.creset(slot); > + > } > > error: >
On 10/08/17 16:58, Russell Currey wrote: > These registers are supposed to be 16bit, and it makes part of the > register dump misleading. > > Signed-off-by: Russell Currey <ruscur@russell.cc> > --- > hw/phb4.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/hw/phb4.c b/hw/phb4.c > index d13ab8c3..4c7bfe3b 100644 > --- a/hw/phb4.c > +++ b/hw/phb4.c > @@ -1906,7 +1906,7 @@ static void phb4_read_phb_status(struct phb4 *p, > static void phb4_eeh_dump_regs(struct phb4 *p) > { > struct OpalIoPhb4ErrorData *s; > - uint32_t reg; > + uint16_t reg; > unsigned int i; > > if (!verbose_eeh) > @@ -1929,9 +1929,9 @@ static void phb4_eeh_dump_regs(struct phb4 *p) > PHBERR(p, " uncorrErrorStatus = %08x\n", s->uncorrErrorStatus); > > /* Two non OPAL API registers that are useful */ > - phb4_pcicfg_read32(&p->phb, 0, p->ecap + PCICAP_EXP_DEVCTL, ®); > + phb4_pcicfg_read16(&p->phb, 0, p->ecap + PCICAP_EXP_DEVCTL, ®); > PHBERR(p, " devctl = %08x\n", reg); > - phb4_pcicfg_read32(&p->phb, 0, p->ecap + PCICAP_EXP_DEVSTAT, > + phb4_pcicfg_read16(&p->phb, 0, p->ecap + PCICAP_EXP_DEVSTAT, > ®); > PHBERR(p, " devStat = %08x\n", reg); Do these need to be changed to %04x?
Russell Currey <ruscur@russell.cc> writes: > This fixes an unlikely but possible assert() fail on kdump. > > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Signed-off-by: Russell Currey <ruscur@russell.cc> Series merged to master as of 8bdfaa24f88bd8800385a6997bb178dfdd970648
diff --git a/hw/xive.c b/hw/xive.c index 03b9478e..18636723 100644 --- a/hw/xive.c +++ b/hw/xive.c @@ -601,6 +601,8 @@ static bool xive_decode_vp(uint32_t vp, uint32_t *blk, uint32_t *idx, /* PIR case */ if (((vp >> 30) & 1) == 0) { + if (find_cpu_by_pir(index) == NULL) + return false; if (blk) *blk = PIR2VP_BLK(index); if (idx) @@ -656,6 +658,8 @@ static bool xive_decode_vp(uint32_t vp, uint32_t *blk, uint32_t *idx, /* PIR case */ if (((vp >> 30) & 1) == 0) { + if (find_cpu_by_pir(index) == NULL) + return false; if (blk) *blk = PIR2VP_BLK(index); if (idx)