Message ID | 4A5E23D0.9020906@us.ibm.com (mailing list archive) |
---|---|
State | Superseded, archived |
2009/7/15 Mike Mason <mmlnx@us.ibm.com>:
> By default, EEH does what's known as a "hot reset" during error recovery of
> a PCI Express device. We've found a case where the device needs a
> "fundamental reset" to recover properly. The current PCI error recovery and
> EEH frameworks do not support this distinction.
>
> The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev
> that indicates whether the device requires a fundamental reset during error
> recovery. This bit can be checked by EEH to determine which reset type is
> required.
>
> This patch supersedes the previously submitted patch that implemented a
> reset type callback.
>
> Please review and let me know of any concerns.

I like this patch a *lot* better .. it is vastly simpler, more direct.

> diff -uNrp a/include/linux/pci.h b/include/linux/pci.h
> --- a/include/linux/pci.h	2009-07-13 14:25:37.000000000 -0700
> +++ b/include/linux/pci.h	2009-07-15 10:25:37.000000000 -0700
> @@ -273,6 +273,7 @@ struct pci_dev {
>  	unsigned int	ari_enabled:1;	/* ARI forwarding */
>  	unsigned int	is_managed:1;
>  	unsigned int	is_pcie:1;
> +	unsigned int	fndmntl_rst_rqd:1; /* Dev requires fundamental reset */
>  	unsigned int	state_saved:1;
>  	unsigned int	is_physfn:1;
>  	unsigned int	is_virtfn:1;

As Ben points out, the name is awkward. How about needs_freset ?

Since this affects the entire pci subsystem, it should be documented
properly. The "pci error recovery" subsystem was designed to be
usable in other architectures, and so the error recovery docs should
take at least a paragraph to describe what this flag means, and when
its supposed to be used.

Providing the docs patch together with the pci.h patch *only* would
probably simplify acceptance by the PCI community.

--linas

Linas Vepstas <linasvepstas@gmail.com> wrote on 07/23/2009 07:44:33 AM:

> 2009/7/15 Mike Mason <mmlnx@us.ibm.com>:
> > By default, EEH does what's known as a "hot reset" during error recovery of
> > a PCI Express device. We've found a case where the device needs a
> > "fundamental reset" to recover properly. The current PCI error recovery and
> > EEH frameworks do not support this distinction.
> >
> > The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev
> > that indicates whether the device requires a fundamental reset during error
> > recovery. This bit can be checked by EEH to determine which reset type is
> > required.
> >
> > This patch supersedes the previously submitted patch that implemented a
> > reset type callback.
> >
> > Please review and let me know of any concerns.
>
> I like this patch a *lot* better .. it is vastly simpler, more direct.
>
> > diff -uNrp a/include/linux/pci.h b/include/linux/pci.h
> > --- a/include/linux/pci.h	2009-07-13 14:25:37.000000000 -0700
> > +++ b/include/linux/pci.h	2009-07-15 10:25:37.000000000 -0700
> > @@ -273,6 +273,7 @@ struct pci_dev {
> >  	unsigned int	ari_enabled:1;	/* ARI forwarding */
> >  	unsigned int	is_managed:1;
> >  	unsigned int	is_pcie:1;
> > +	unsigned int	fndmntl_rst_rqd:1; /* Dev requires fundamental reset */
> >  	unsigned int	state_saved:1;
> >  	unsigned int	is_physfn:1;
> >  	unsigned int	is_virtfn:1;
>
> As Ben points out, the name is awkward. How about needs_freset ?

I have no problem changing the name.

> Since this affects the entire pci subsystem, it should be documented
> properly. The "pci error recovery" subsystem was designed to be
> usable in other architectures, and so the error recovery docs should
> take at least a paragraph to describe what this flag means, and when
> its supposed to be used.

I will take a stab at updating the docs and post here for comment.

> Providing the docs patch together with the pci.h patch *only* would
> probably simplify acceptance by the PCI community.
>
> --linas

Linas Vepstas <linasvepstas@gmail.com> wrote on 07/23/2009 07:44:33 AM:

> 2009/7/15 Mike Mason <mmlnx@us.ibm.com>:
> > By default, EEH does what's known as a "hot reset" during error recovery of
> > a PCI Express device. We've found a case where the device needs a
> > "fundamental reset" to recover properly. The current PCI error recovery and
> > EEH frameworks do not support this distinction.
> >
> > The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev
> > that indicates whether the device requires a fundamental reset during error
> > recovery. This bit can be checked by EEH to determine which reset type is
> > required.
> >
> > This patch supersedes the previously submitted patch that implemented a
> > reset type callback.
> >
> > Please review and let me know of any concerns.
>
> I like this patch a *lot* better .. it is vastly simpler, more direct.
>
> > diff -uNrp a/include/linux/pci.h b/include/linux/pci.h
> > --- a/include/linux/pci.h	2009-07-13 14:25:37.000000000 -0700
> > +++ b/include/linux/pci.h	2009-07-15 10:25:37.000000000 -0700
> > @@ -273,6 +273,7 @@ struct pci_dev {
> >  	unsigned int	ari_enabled:1;	/* ARI forwarding */
> >  	unsigned int	is_managed:1;
> >  	unsigned int	is_pcie:1;
> > +	unsigned int	fndmntl_rst_rqd:1; /* Dev requires fundamental reset */
> >  	unsigned int	state_saved:1;
> >  	unsigned int	is_physfn:1;
> >  	unsigned int	is_virtfn:1;
>
> As Ben points out, the name is awkward. How about needs_freset ?

I am OK with name change.

> Since this affects the entire pci subsystem, it should be documented
> properly. The "pci error recovery" subsystem was designed to be
> usable in other architectures, and so the error recovery docs should
> take at least a paragraph to describe what this flag means, and when
> its supposed to be used.

I will update the documentation, are you referring to
Documentation/powerpc/eeh-pci-error-recovery.txt
or some other documentation?

> Providing the docs patch together with the pci.h patch *only* would
> probably simplify acceptance by the PCI community.
>
> --linas

2009/7/24 Richard Lary <rlary@us.ibm.com>:
> Linas Vepstas <linasvepstas@gmail.com> wrote on 07/23/2009 07:44:33 AM:
>
>> 2009/7/15 Mike Mason <mmlnx@us.ibm.com>:
>> > By default, EEH does what's known as a "hot reset" during error recovery of
>> > a PCI Express device. We've found a case where the device needs a
>> > "fundamental reset" to recover properly. The current PCI error recovery and
>> > EEH frameworks do not support this distinction.
>> >
>> > The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev
>> > that indicates whether the device requires a fundamental reset during error
>> > recovery. This bit can be checked by EEH to determine which reset type is
>> > required.
>> >
>> > This patch supersedes the previously submitted patch that implemented a
>> > reset type callback.
>> >
>> > Please review and let me know of any concerns.
>>
>> I like this patch a *lot* better .. it is vastly simpler, more direct.
>>
>> > diff -uNrp a/include/linux/pci.h b/include/linux/pci.h
>> > --- a/include/linux/pci.h	2009-07-13 14:25:37.000000000 -0700
>> > +++ b/include/linux/pci.h	2009-07-15 10:25:37.000000000 -0700
>> > @@ -273,6 +273,7 @@ struct pci_dev {
>> >  	unsigned int	ari_enabled:1;	/* ARI forwarding */
>> >  	unsigned int	is_managed:1;
>> >  	unsigned int	is_pcie:1;
>> > +	unsigned int	fndmntl_rst_rqd:1; /* Dev requires fundamental reset */
>> >  	unsigned int	state_saved:1;
>> >  	unsigned int	is_physfn:1;
>> >  	unsigned int	is_virtfn:1;
>>
>> As Ben points out, the name is awkward. How about needs_freset ?
>
> I am OK with name change.
>
>> Since this affects the entire pci subsystem, it should be documented
>> properly. The "pci error recovery" subsystem was designed to be
>> usable in other architectures, and so the error recovery docs should
>> take at least a paragraph to describe what this flag means, and when
>> its supposed to be used.
>
> I will update the documentation, are you referring to
> Documentation/powerpc/eeh-pci-error-recovery.txt
> or some other documentation?

No, I'm thinking
Documentation/PCI/pci-error-recovery.txt

because the flag is not powerpc-specific.

--linas

>> Providing the docs patch together with the pci.h patch *only* would
>> probably simplify acceptance by the PCI community.
>>
>> --linas

Linas Vepstas <linasvepstas@gmail.com> wrote on 07/24/2009 05:30:09 PM:

> 2009/7/24 Richard Lary <rlary@us.ibm.com>:
> > Linas Vepstas <linasvepstas@gmail.com> wrote on 07/23/2009 07:44:33 AM:
> >
> >> 2009/7/15 Mike Mason <mmlnx@us.ibm.com>:
> >> > By default, EEH does what's known as a "hot reset" during error recovery of
> >> > a PCI Express device. We've found a case where the device needs a
> >> > "fundamental reset" to recover properly. The current PCI error recovery and
> >> > EEH frameworks do not support this distinction.
> >> >
> >> > The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev
> >> > that indicates whether the device requires a fundamental reset during error
> >> > recovery. This bit can be checked by EEH to determine which reset type is
> >> > required.
> >> >
> >> > This patch supersedes the previously submitted patch that implemented a
> >> > reset type callback.
> >> >
> >> > Please review and let me know of any concerns.
> >>
> >> I like this patch a *lot* better .. it is vastly simpler, more direct.
> >>
> >> > diff -uNrp a/include/linux/pci.h b/include/linux/pci.h
> >> > --- a/include/linux/pci.h	2009-07-13 14:25:37.000000000 -0700
> >> > +++ b/include/linux/pci.h	2009-07-15 10:25:37.000000000 -0700
> >> > @@ -273,6 +273,7 @@ struct pci_dev {
> >> >  	unsigned int	ari_enabled:1;	/* ARI forwarding */
> >> >  	unsigned int	is_managed:1;
> >> >  	unsigned int	is_pcie:1;
> >> > +	unsigned int	fndmntl_rst_rqd:1; /* Dev requires fundamental reset */
> >> >  	unsigned int	state_saved:1;
> >> >  	unsigned int	is_physfn:1;
> >> >  	unsigned int	is_virtfn:1;
> >>
> >> As Ben points out, the name is awkward. How about needs_freset ?
> >
> > I am OK with name change.
> >
> >> Since this affects the entire pci subsystem, it should be documented
> >> properly. The "pci error recovery" subsystem was designed to be
> >> usable in other architectures, and so the error recovery docs should
> >> take at least a paragraph to describe what this flag means, and when
> >> its supposed to be used.
> >
> > I will update the documentation, are you referring to
> > Documentation/powerpc/eeh-pci-error-recovery.txt
> > or some other documentation?
>
> No, I'm thinking
> Documentation/PCI/pci-error-recovery.txt
>
> because the flag is not powerpc-specific.

Got it, glad I asked...

-rich

> >> Providing the docs patch together with the pci.h patch *only* would
> >> probably simplify acceptance by the PCI community.
> >>
> >> --linas

diff -uNrp a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
--- a/arch/powerpc/kernel/pci_64.c	2009-07-13 14:25:24.000000000 -0700
+++ b/arch/powerpc/kernel/pci_64.c	2009-07-15 10:26:26.000000000 -0700
@@ -143,6 +143,7 @@ struct pci_dev *of_create_pci_dev(struct
 	dev->dev.bus = &pci_bus_type;
 	dev->devfn = devfn;
 	dev->multifunction = 0;		/* maybe a lie? */
+	dev->fndmntl_rst_rqd = 0;	/* pcie fundamental reset required */
 
 	dev->vendor = get_int_prop(node, "vendor-id", 0xffff);
 	dev->device = get_int_prop(node, "device-id", 0xffff);
diff -uNrp a/arch/powerpc/platforms/pseries/eeh.c b/arch/powerpc/platforms/pseries/eeh.c
--- a/arch/powerpc/platforms/pseries/eeh.c	2009-06-09 20:05:27.000000000 -0700
+++ b/arch/powerpc/platforms/pseries/eeh.c	2009-07-15 10:29:04.000000000 -0700
@@ -744,7 +744,15 @@ int pcibios_set_pcie_reset_state(struct
 
 static void __rtas_set_slot_reset(struct pci_dn *pdn)
 {
-	rtas_pci_slot_reset (pdn, 1);
+	struct pci_dev *dev = pdn->pcidev;
+
+	/* Determine type of EEH reset required by device,
+	 * default hot reset or fundamental reset
+	 */
+	if (dev->fndmntl_rst_rqd)
+		rtas_pci_slot_reset(pdn, 3);
+	else
+		rtas_pci_slot_reset(pdn, 1);
 
 	/* The PCI bus requires that the reset be held high for at least
 	 * a 100 milliseconds. We wait a bit longer 'just in case'. */
diff -uNrp a/include/linux/pci.h b/include/linux/pci.h
--- a/include/linux/pci.h	2009-07-13 14:25:37.000000000 -0700
+++ b/include/linux/pci.h	2009-07-15 10:25:37.000000000 -0700
@@ -273,6 +273,7 @@ struct pci_dev {
 	unsigned int	ari_enabled:1;	/* ARI forwarding */
 	unsigned int	is_managed:1;
 	unsigned int	is_pcie:1;
+	unsigned int	fndmntl_rst_rqd:1; /* Dev requires fundamental reset */
 	unsigned int	state_saved:1;
 	unsigned int	is_physfn:1;
 	unsigned int	is_virtfn:1;

By default, EEH does what's known as a "hot reset" during error recovery of
a PCI Express device. We've found a case where the device needs a
"fundamental reset" to recover properly. The current PCI error recovery and
EEH frameworks do not support this distinction.

The attached patch (courtesy of Richard Lary) adds a bit field to pci_dev
that indicates whether the device requires a fundamental reset during error
recovery. This bit can be checked by EEH to determine which reset type is
required.

This patch supersedes the previously submitted patch that implemented a
reset type callback.

Please review and let me know of any concerns.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
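
For readers who want to see the selection logic in isolation, here is a minimal user-space sketch of what the description above says the patch does. It is not kernel code: `struct demo_pci_dev`, `demo_pick_reset()` and `demo_probe()` are hypothetical names made up for illustration. The only details taken from the patch are the `fndmntl_rst_rqd` bit (reviewers in this thread suggested renaming it `needs_freset`) and the values 1 (hot reset) and 3 (fundamental reset) that it passes to `rtas_pci_slot_reset()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two pci_dev bits that matter here. */
struct demo_pci_dev {
	unsigned int is_pcie:1;
	unsigned int fndmntl_rst_rqd:1;	/* field name as posted in this patch */
};

/* Values the patch passes as the second argument of rtas_pci_slot_reset(). */
enum demo_reset {
	DEMO_RESET_HOT = 1,		/* default */
	DEMO_RESET_FUNDAMENTAL = 3,
};

/* Mirrors the if/else the patch adds to __rtas_set_slot_reset(). */
enum demo_reset demo_pick_reset(const struct demo_pci_dev *dev)
{
	assert(dev != NULL);
	return dev->fndmntl_rst_rqd ? DEMO_RESET_FUNDAMENTAL : DEMO_RESET_HOT;
}

/* Assumption: a driver for affected hardware would set the bit once,
 * for example from its probe routine. The patch does not show this. */
void demo_probe(struct demo_pci_dev *dev)
{
	assert(dev != NULL);
	dev->fndmntl_rst_rqd = 1;
}
```

A device whose bit is left at 0 keeps getting the default hot reset, and only a device that has been flagged gets reset type 3. Where the bit is actually set is an assumption in this sketch, since the patch itself only initializes it to 0 in `of_create_pci_dev()` and checks it in EEH.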