Message ID | 72ae8ae9c54097158894a52de23690448de38ea9.1565930772.git.sbobroff@linux.ibm.com (mailing list archive) |
---|---|
State | Accepted |
Commit | b905f8cdca7725e750a84f7188ea6821750124c3 |
Series | [v5,01/12] powerpc/64: Adjust order in pcibios_init() |

Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (c9633332103e55bc73d80d07ead28b95a22a85a3) |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 163 lines checked |
Sam Bobroff <sbobroff@linux.ibm.com> writes:
> diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
> index 427fc22f72b6..11c807468ab5 100644
> --- a/arch/powerpc/kernel/of_platform.c
> +++ b/arch/powerpc/kernel/of_platform.c
> @@ -81,7 +81,8 @@ static int of_pci_phb_probe(struct platform_device *dev)
>  	pcibios_claim_one_bus(phb->bus);
>
>  	/* Finish EEH setup */
> -	eeh_add_device_tree_late(phb->bus);
> +	if (!eeh_has_flag(EEH_FORCE_DISABLED))
> +		eeh_add_device_tree_late(phb->bus);

This breaks cell_defconfig which has CONFIG_EEH=n.

That's because while eeh_add_device_tree_late() has an empty definition
in that case, eeh_has_flag() and EEH_FORCE_DISABLED do not.

Let me know how you want to fix it, if it's small just send me an
incremental diff.

cheers
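
The build break described above is the usual "symbol only exists when the option is on" problem. The following is a minimal userspace sketch of one common way to handle it, assuming the flag constant is made visible in both configurations and a stub eeh_has_flag() is provided when EEH is compiled out. The names pci_bus_model and phb_probe_model are hypothetical stand-ins, the flag value is illustrative, and this is not necessarily the incremental fix that was eventually sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. Uncomment to model a CONFIG_EEH=y build. */
/* #define CONFIG_EEH 1 */

/* Value picked for illustration; defined outside the #ifdef on purpose. */
#define EEH_FORCE_DISABLED 0x02

/* Hypothetical stand-in for struct pci_bus. */
struct pci_bus_model { int number; };

#ifdef CONFIG_EEH
static int eeh_subsystem_flags;
static int late_add_calls;

static inline bool eeh_has_flag(int flag)
{
	return (eeh_subsystem_flags & flag) == flag;
}

static inline void eeh_add_device_tree_late(struct pci_bus_model *bus)
{
	(void)bus;
	late_add_calls++;
}
#else
/* EEH compiled out: no flag can ever be set, and the late add is a no-op. */
static inline bool eeh_has_flag(int flag)
{
	(void)flag;
	return false;
}

static inline void eeh_add_device_tree_late(struct pci_bus_model *bus)
{
	(void)bus;
}
#endif

/* Mirrors the hunk in of_pci_phb_probe(); compiles in either configuration. */
static int phb_probe_model(struct pci_bus_model *bus)
{
	assert(bus != NULL);
	if (!eeh_has_flag(EEH_FORCE_DISABLED))
		eeh_add_device_tree_late(bus);
	return 0;
}
```

With CONFIG_EEH left undefined, calling phb_probe_model() on any bus compiles and simply falls through to the empty stub, which is the behaviour cell_defconfig needs.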
Hello Sam,

Sam Bobroff <sbobroff@linux.ibm.com> writes:
> On PowerNV and pSeries, devices currently acquire EEH support from
> several different places: Boot-time devices from eeh_probe_devices()
> and eeh_addr_cache_build(), Virtual Function devices from the pcibios
> bus add device hooks and hot plugged devices from pci_hp_add_devices()
> (with other platforms using other methods as well). Unfortunately,
> pSeries machines currently discover hot plugged devices using
> pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do
> not receive EEH support.
>
> Rather than adding another case for pci_rescan_bus(), this change
> widens the scope of the pcibios bus add device hooks so that they can
> handle all devices. As a side effect this also supports devices
> discovered after manually rescanning via /sys/bus/pci/rescan.
>
> Note that on PowerNV, this change allows the EEH subsystem to become
> enabled after boot as long as it has not been forced off, which was
> not previously possible (it was already possible on pSeries).

With this change, I get a crash (use after free by the looks of it) when
I remove and then add a pci device in qemu:

$ qemu-system-ppc64 -M pseries -append 'debug console=hvc0' \
  -nographic -vga none -m 1G,slots=32,maxmem=1024G -smp 2 \
  -kernel vmlinux -initrd ~/b/br/ppc64le-initramfs/images/rootfs.cpio \
  -nic model=e1000

...

# echo 1 > /sys/devices/pci0000:00/0000:00:00.0/remove ; \
  echo 1 > /sys/devices/pci0000:00/pci_bus/0000:00/rescan

pci 0000:00:00.0: Removing from iommu group 0
pci 0000:00:00.0: [8086:100e] type 00 class 0x020000
pci 0000:00:00.0: reg 0x10: [mem 0x200080000000-0x20008001ffff]
pci 0000:00:00.0: reg 0x14: [io 0x10040-0x1007f]
pci 0000:00:00.0: reg 0x30: [mem 0x200080040000-0x20008007ffff pref]
pci 0000:00:00.0: Adding to iommu group 0
pci 0000:00:00.0: BAR 6: assigned [mem 0x200080000000-0x20008003ffff pref]
pci 0000:00:00.0: BAR 0: assigned [mem 0x200080040000-0x20008005ffff]
pci 0000:00:00.0: BAR 1: assigned [io 0x10000-0x1003f]
e1000 0000:00:00.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
e1000 0000:00:00.0 eth0: Intel(R) PRO/1000 Network Connection
pci 0000:00:00.0: Removing from iommu group 0
pci 0000:00:00.0: [8086:100e] type 00 class 0x020000
pci 0000:00:00.0: reg 0x10: [mem 0x200080040000-0x20008005ffff]
pci 0000:00:00.0: reg 0x14: [io 0x10000-0x1003f]
pci 0000:00:00.0: reg 0x30: [mem 0x200080040000-0x20008007ffff pref]
pci 0000:00:00.0: BAR 6: assigned [mem 0x200080000000-0x20008003ffff pref]
pci 0000:00:00.0: BAR 0: assigned [mem 0x200080040000-0x20008005ffff]
pci 0000:00:00.0: BAR 1: assigned [io 0x10000-0x1003f]
BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6bfb
Faulting instruction address: 0xc000000000597270
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 2464 Comm: pci-probe-vs-cp Not tainted 5.3.0-rc2-00092-gf381d5711f09 #76
NIP: c000000000597270 LR: c000000000599470 CTR: c0000000002030b0
REGS: c00000003ee4f650 TRAP: 0380 Not tainted (5.3.0-rc2-00092-gf381d5711f09)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24002442 XER: 00000000
CFAR: c00000000059946c IRQMASK: 0
GPR00: c000000000599470 c00000003ee4f8e0 c000000003317a00 6b6b6b6b6b6b6b6b
GPR04: c000000001d0fa38 0000000000000000 0000000000000000 221a64979a66f870
GPR08: c00000000347b398 0000000000000000 c00000000336e070 ffffffffffffffff
GPR12: 0000000000002000 c000000004060000 0000000000000000 0000000000000000
GPR16: 00000000100a78d8 00007fffe9fdff96 00000000100a7898 0000000000000000
GPR20: 0000000000000000 00000000100e0ff0 0000000000000000 00000000100e0fe8
GPR24: 0000000000000000 000001002ae50260 c000000001d0fa38 6b6b6b6b6b6b6b6b
GPR28: fffffffffffffff2 c000000001d0fa38 0000000000000000 c000000003118c18
NIP [c000000000597270] kernfs_find_ns+0x50/0x3d0
LR [c000000000599470] kernfs_remove_by_name_ns+0x60/0xe0
Call Trace:
[c00000003ee4f8e0] [c00000000020950c] lockdep_hardirqs_on+0x10c/0x210 (unreliable)
[c00000003ee4f970] [c000000000599470] kernfs_remove_by_name_ns+0x60/0xe0
[c00000003ee4fa00] [c00000000059ca08] sysfs_remove_file_ns+0x28/0x40
[c00000003ee4fa20] [c000000000cbd70c] device_remove_file+0x2c/0x40
[c00000003ee4fa40] [c000000000051480] eeh_sysfs_remove_device+0x50/0xf0
[c00000003ee4fa80] [c00000000004a594] eeh_add_device_late.part.7+0x84/0x220
[c00000003ee4fb00] [c0000000000e94f0] pseries_pcibios_bus_add_device+0x60/0xb0
[c00000003ee4fb70] [c00000000006fc40] pcibios_bus_add_device+0x40/0x60
[c00000003ee4fb90] [c000000000bc5220] pci_bus_add_device+0x30/0x100
[c00000003ee4fc00] [c000000000bc5344] pci_bus_add_devices+0x54/0xb0
[c00000003ee4fc40] [c000000000bca058] pci_rescan_bus+0x48/0x70
[c00000003ee4fc70] [c000000000bd9adc] dev_bus_rescan_store+0xcc/0x100
[c00000003ee4fcb0] [c000000000cbc9d8] dev_attr_store+0x38/0x60
[c00000003ee4fcd0] [c00000000059c460] sysfs_kf_write+0x70/0xb0
[c00000003ee4fd10] [c00000000059aa98] kernfs_fop_write+0xf8/0x280
[c00000003ee4fd60] [c0000000004b3e5c] __vfs_write+0x3c/0x70
[c00000003ee4fd80] [c0000000004b81f0] vfs_write+0xd0/0x220
[c00000003ee4fdd0] [c0000000004b85ac] ksys_write+0x7c/0x140
[c00000003ee4fe20] [c00000000000bc6c] system_call+0x5c/0x70

FWIW during boot the EEH core reports:

EEH: No capable adapters found: recovery disabled.

> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index ca8b0c58a6a7..87edac6f2fd9 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1272,7 +1272,7 @@ void eeh_add_device_late(struct pci_dev *dev)
>  	struct pci_dn *pdn;
>  	struct eeh_dev *edev;
>
> -	if (!dev || !eeh_enabled())
> +	if (!dev)
>  		return;
>
>  	pr_debug("EEH: Adding device %s\n", pci_name(dev));

Reverting this hunk works around (fixes?) it.
On Fri, Sep 20, 2019 at 6:28 AM Nathan Lynch <nathanl@linux.ibm.com> wrote:
>
> Hello Sam,
>
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
>
> With this change, I get a crash (use after free by the looks of it) when
> I remove and then add a pci device in qemu:
>
> $ qemu-system-ppc64 -M pseries -append 'debug console=hvc0' \
>   -nographic -vga none -m 1G,slots=32,maxmem=1024G -smp 2 \
>   -kernel vmlinux -initrd ~/b/br/ppc64le-initramfs/images/rootfs.cpio \
>   -nic model=e1000

Is there anything special in your kernel config? I tested this with
pseries_le_defconfig and couldn't hit the crash.

> [...]
"Oliver O'Halloran" <oohall@gmail.com> writes: > On Fri, Sep 20, 2019 at 6:28 AM Nathan Lynch <nathanl@linux.ibm.com> wrote: >> >> Hello Sam, >> >> Sam Bobroff <sbobroff@linux.ibm.com> writes: >> >> With this change, I get a crash (use after free by the looks of it) when >> I remove and then add a pci device in qemu: >> >> $ qemu-system-ppc64 -M pseries -append 'debug console=hvc0' \ >> -nographic -vga none -m 1G,slots=32,maxmem=1024G -smp 2 \ >> -kernel vmlinux -initrd ~/b/br/ppc64le-initramfs/images/rootfs.cpio \ >> -nic model=e1000 > > is there anything special in your kernel config? I tested this with > pseries_le_defconfig and couldn't hit the crash. My config is below; CONFIG_SLUB_DEBUG_ON=y probably makes the difference. CONFIG_SYSVIPC=y CONFIG_POSIX_MQUEUE=y CONFIG_AUDIT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=18 CONFIG_LOG_CPU_MAX_BUF_SHIFT=13 CONFIG_NUMA_BALANCING=y CONFIG_CGROUPS=y CONFIG_MEMCG=y CONFIG_MEMCG_SWAP=y CONFIG_CGROUP_SCHED=y CONFIG_CGROUP_FREEZER=y CONFIG_CPUSETS=y CONFIG_CGROUP_DEVICE=y CONFIG_CGROUP_CPUACCT=y CONFIG_CGROUP_PERF=y CONFIG_CGROUP_BPF=y CONFIG_USER_NS=y CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="rootfs.cpio" CONFIG_BPF_SYSCALL=y # CONFIG_COMPAT_BRK is not set CONFIG_PROFILING=y CONFIG_PPC64=y CONFIG_NR_CPUS=2048 CONFIG_CPU_LITTLE_ENDIAN=y CONFIG_PPC_SPLPAR=y CONFIG_DTL=y CONFIG_SCANLOG=y CONFIG_PPC_SMLPAR=y CONFIG_RTAS_FLASH=y CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y CONFIG_HZ_100=y CONFIG_PPC_TRANSACTIONAL_MEM=y CONFIG_KEXEC=y CONFIG_KEXEC_FILE=y CONFIG_IRQ_ALL_CPUS=y CONFIG_PPC_64K_PAGES=y CONFIG_PPC_SUBPAGE_PROT=y CONFIG_SCHED_SMT=y CONFIG_PM_DEBUG=y CONFIG_VIRTUALIZATION=y CONFIG_KVM_BOOK3S_64=y CONFIG_KVM_BOOK3S_64_HV=y CONFIG_VHOST_NET=y CONFIG_OPROFILE=y CONFIG_KPROBES=y CONFIG_JUMP_LABEL=y CONFIG_REFCOUNT_FULL=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y 
CONFIG_MODVERSIONS=y CONFIG_MODULE_SRCVERSION_ALL=y CONFIG_PARTITION_ADVANCED=y CONFIG_BINFMT_MISC=y CONFIG_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTREMOVE=y CONFIG_KSM=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_NET=y CONFIG_PACKET=y CONFIG_UNIX=y CONFIG_XFRM_USER=y CONFIG_NET_KEY=y CONFIG_INET=y CONFIG_IP_MULTICAST=y CONFIG_NET_IPIP=y CONFIG_SYN_COOKIES=y CONFIG_INET_AH=y CONFIG_INET_ESP=y CONFIG_INET_IPCOMP=y # CONFIG_IPV6 is not set CONFIG_NETFILTER=y # CONFIG_NETFILTER_ADVANCED is not set CONFIG_NF_CONNTRACK=y CONFIG_NF_CONNTRACK_FTP=y CONFIG_NF_CONNTRACK_IRC=y CONFIG_NF_CONNTRACK_SIP=y CONFIG_NF_CT_NETLINK=y CONFIG_NETFILTER_XT_MARK=y CONFIG_NETFILTER_XT_TARGET_LOG=y CONFIG_NETFILTER_XT_TARGET_NFLOG=y CONFIG_NETFILTER_XT_TARGET_TCPMSS=y CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=y CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y CONFIG_NETFILTER_XT_MATCH_POLICY=y CONFIG_NETFILTER_XT_MATCH_STATE=y CONFIG_NF_LOG_ARP=y CONFIG_IP_NF_IPTABLES=y CONFIG_IP_NF_FILTER=y CONFIG_IP_NF_TARGET_REJECT=y CONFIG_IP_NF_NAT=y CONFIG_IP_NF_TARGET_MASQUERADE=y CONFIG_IP_NF_MANGLE=y CONFIG_BRIDGE=y CONFIG_VLAN_8021Q=y CONFIG_NET_SCHED=y CONFIG_NET_CLS_BPF=y CONFIG_NET_CLS_ACT=y CONFIG_NET_ACT_BPF=y CONFIG_BPF_JIT=y CONFIG_HOTPLUG_PCI=y CONFIG_HOTPLUG_PCI_RPA=y CONFIG_HOTPLUG_PCI_RPA_DLPAR=y CONFIG_UEVENT_HELPER=y CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" CONFIG_DEVTMPFS=y CONFIG_DEVTMPFS_MOUNT=y CONFIG_OF_UNITTEST=y CONFIG_PARPORT=y CONFIG_PARPORT_PC=y CONFIG_BLK_DEV_FD=y CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_NBD=y CONFIG_BLK_DEV_RAM=y CONFIG_BLK_DEV_RAM_SIZE=65536 CONFIG_VIRTIO_BLK=y CONFIG_CXL=y CONFIG_OCXL=y CONFIG_BLK_DEV_SD=y CONFIG_CHR_DEV_ST=y CONFIG_BLK_DEV_SR=y CONFIG_BLK_DEV_SR_VENDOR=y CONFIG_CHR_DEV_SG=y CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_FC_ATTRS=y CONFIG_SCSI_CXGB3_ISCSI=y CONFIG_SCSI_CXGB4_ISCSI=y CONFIG_SCSI_BNX2_ISCSI=y CONFIG_BE2ISCSI=y CONFIG_CXLFLASH=y CONFIG_SCSI_MPT2SAS=y CONFIG_SCSI_IBMVSCSI=y CONFIG_SCSI_IBMVFC=y CONFIG_SCSI_SYM53C8XX_2=y CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0 
CONFIG_SCSI_IPR=y CONFIG_SCSI_QLA_FC=y CONFIG_SCSI_QLA_ISCSI=y CONFIG_SCSI_LPFC=y CONFIG_SCSI_VIRTIO=y CONFIG_SCSI_DH=y CONFIG_SCSI_DH_RDAC=y CONFIG_SCSI_DH_ALUA=y CONFIG_ATA=y CONFIG_SATA_AHCI=y CONFIG_PATA_AMD=y CONFIG_ATA_GENERIC=y CONFIG_MD=y CONFIG_BLK_DEV_MD=y CONFIG_MD_LINEAR=y CONFIG_MD_RAID0=y CONFIG_MD_RAID1=y CONFIG_MD_RAID10=y CONFIG_MD_RAID456=y CONFIG_MD_MULTIPATH=y CONFIG_MD_FAULTY=y CONFIG_BLK_DEV_DM=y CONFIG_DM_CRYPT=y CONFIG_DM_SNAPSHOT=y CONFIG_DM_THIN_PROVISIONING=y CONFIG_DM_MIRROR=y CONFIG_DM_ZERO=y CONFIG_DM_MULTIPATH=y CONFIG_DM_MULTIPATH_QL=y CONFIG_DM_MULTIPATH_ST=y CONFIG_DM_UEVENT=y CONFIG_BONDING=y CONFIG_DUMMY=y CONFIG_MACVLAN=y CONFIG_MACVTAP=y CONFIG_VXLAN=y CONFIG_NETCONSOLE=y CONFIG_TUN=y CONFIG_VETH=y CONFIG_VIRTIO_NET=y CONFIG_VORTEX=y CONFIG_ACENIC=y CONFIG_ACENIC_OMIT_TIGON_I=y CONFIG_PCNET32=y CONFIG_TIGON3=y CONFIG_BNX2X=y CONFIG_CHELSIO_T1=y CONFIG_BE2NET=y CONFIG_IBMVETH=y CONFIG_E100=y CONFIG_E1000=y CONFIG_E1000E=y CONFIG_IXGB=y CONFIG_IXGBE=y CONFIG_I40E=y CONFIG_MLX4_EN=y CONFIG_MYRI10GE=y CONFIG_S2IO=y CONFIG_QLGE=y CONFIG_NETXEN_NIC=y CONFIG_PPP=y CONFIG_PPP_BSDCOMP=y CONFIG_PPP_DEFLATE=y CONFIG_PPPOE=y CONFIG_PPP_ASYNC=y CONFIG_PPP_SYNC_TTY=y CONFIG_INPUT_EVDEV=y CONFIG_INPUT_MISC=y CONFIG_INPUT_PCSPKR=y # CONFIG_SERIO_SERPORT is not set CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_ICOM=y CONFIG_SERIAL_JSM=y CONFIG_HVC_CONSOLE=y CONFIG_HVC_RTAS=y CONFIG_HVCS=y CONFIG_VIRTIO_CONSOLE=y CONFIG_IBM_BSR=y CONFIG_POWERNV_OP_PANEL=y CONFIG_HW_RANDOM=y CONFIG_RAW_DRIVER=y CONFIG_MAX_RAW_DEVS=1024 CONFIG_I2C_CHARDEV=y CONFIG_FB=y CONFIG_FIRMWARE_EDID=y CONFIG_FB_OF=y CONFIG_FB_MATROX=y CONFIG_FB_MATROX_MILLENIUM=y CONFIG_FB_MATROX_MYSTIQUE=y CONFIG_FB_MATROX_G=y CONFIG_FB_RADEON=y CONFIG_FB_IBM_GXT4500=y CONFIG_LCD_CLASS_DEVICE=y CONFIG_LCD_PLATFORM=y # CONFIG_VGA_CONSOLE is not set CONFIG_FRAMEBUFFER_CONSOLE=y CONFIG_LOGO=y CONFIG_HID_GYRATION=y CONFIG_HID_PANTHERLORD=y CONFIG_HID_PETALYNX=y 
CONFIG_HID_SAMSUNG=y CONFIG_HID_SUNPLUS=y CONFIG_USB_HIDDEV=y CONFIG_USB=y CONFIG_USB_MON=y CONFIG_USB_XHCI_HCD=y CONFIG_USB_EHCI_HCD=y # CONFIG_USB_EHCI_HCD_PPC_OF is not set CONFIG_USB_OHCI_HCD=y CONFIG_USB_STORAGE=y CONFIG_NEW_LEDS=y CONFIG_LEDS_CLASS=y CONFIG_LEDS_POWERNV=y CONFIG_INFINIBAND=y CONFIG_INFINIBAND_USER_MAD=y CONFIG_INFINIBAND_USER_ACCESS=y CONFIG_INFINIBAND_MTHCA=y CONFIG_INFINIBAND_CXGB3=y CONFIG_INFINIBAND_CXGB4=y CONFIG_MLX4_INFINIBAND=y CONFIG_INFINIBAND_IPOIB=y CONFIG_INFINIBAND_IPOIB_CM=y CONFIG_INFINIBAND_SRP=y CONFIG_INFINIBAND_ISER=y CONFIG_RTC_CLASS=y CONFIG_RTC_DRV_GENERIC=y CONFIG_VIRTIO_PCI=y CONFIG_VIRTIO_BALLOON=y CONFIG_VALIDATE_FS_PARSER=y CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y CONFIG_EXT2_FS_SECURITY=y CONFIG_EXT4_FS=y CONFIG_EXT4_FS_POSIX_ACL=y CONFIG_EXT4_FS_SECURITY=y CONFIG_JFS_FS=y CONFIG_JFS_POSIX_ACL=y CONFIG_JFS_SECURITY=y CONFIG_XFS_FS=y CONFIG_XFS_POSIX_ACL=y CONFIG_BTRFS_FS=y CONFIG_BTRFS_FS_POSIX_ACL=y CONFIG_NILFS2_FS=y CONFIG_FS_DAX=y CONFIG_AUTOFS4_FS=y CONFIG_FUSE_FS=y CONFIG_OVERLAY_FS=y CONFIG_ISO9660_FS=y CONFIG_UDF_FS=y CONFIG_MSDOS_FS=y CONFIG_VFAT_FS=y CONFIG_PROC_KCORE=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y CONFIG_HUGETLBFS=y CONFIG_CRAMFS=y CONFIG_SQUASHFS=y CONFIG_SQUASHFS_XATTR=y CONFIG_SQUASHFS_LZO=y CONFIG_SQUASHFS_XZ=y CONFIG_PSTORE=y CONFIG_NFS_FS=y CONFIG_NFS_V3_ACL=y CONFIG_NFS_V4=y CONFIG_NFSD=y CONFIG_NFSD_V3_ACL=y CONFIG_NFSD_V4=y CONFIG_CIFS=y CONFIG_CIFS_XATTR=y CONFIG_CIFS_POSIX=y CONFIG_NLS_DEFAULT="utf8" CONFIG_NLS_CODEPAGE_437=y CONFIG_NLS_ASCII=y CONFIG_NLS_ISO8859_1=y CONFIG_NLS_UTF8=y CONFIG_CRYPTO_TEST=m CONFIG_CRYPTO_PCBC=y CONFIG_CRYPTO_CRC32C_VPMSUM=y CONFIG_CRYPTO_MD5_PPC=y CONFIG_CRYPTO_MICHAEL_MIC=y CONFIG_CRYPTO_SHA1_PPC=y CONFIG_CRYPTO_TGR192=y CONFIG_CRYPTO_WP512=y CONFIG_CRYPTO_ANUBIS=y CONFIG_CRYPTO_ARC4=y CONFIG_CRYPTO_BLOWFISH=y CONFIG_CRYPTO_CAST6=y CONFIG_CRYPTO_KHAZAD=y CONFIG_CRYPTO_SALSA20=y CONFIG_CRYPTO_SERPENT=y CONFIG_CRYPTO_TEA=y 
CONFIG_CRYPTO_TWOFISH=y CONFIG_CRYPTO_LZO=y CONFIG_CRYPTO_DEV_NX=y CONFIG_CRYPTO_DEV_VMX=y CONFIG_CRYPTO_DEV_VMX_ENCRYPT=y CONFIG_CRYPTO_DEV_VIRTIO=y CONFIG_PRINTK_TIME=y CONFIG_DYNAMIC_DEBUG=y CONFIG_DEBUG_INFO=y CONFIG_DEBUG_INFO_REDUCED=y CONFIG_GDB_SCRIPTS=y CONFIG_MAGIC_SYSRQ=y CONFIG_DEBUG_KERNEL=y CONFIG_PAGE_EXTENSION=y CONFIG_PAGE_POISONING=y CONFIG_SLUB_DEBUG_ON=y CONFIG_DEBUG_STACK_USAGE=y CONFIG_DEBUG_VM=y CONFIG_DEBUG_PER_CPU_MAPS=y CONFIG_DEBUG_STACKOVERFLOW=y CONFIG_DEBUG_SHIRQ=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_HARDLOCKUP_DETECTOR=y CONFIG_WQ_WATCHDOG=y CONFIG_PANIC_ON_OOPS=y CONFIG_SCHED_STACK_END_CHECK=y CONFIG_PROVE_LOCKING=y CONFIG_DEBUG_ATOMIC_SLEEP=y CONFIG_DEBUG_LIST=y CONFIG_DEBUG_SG=y CONFIG_DEBUG_NOTIFIERS=y CONFIG_DEBUG_WQ_FORCE_RR_CPU=y CONFIG_LATENCYTOP=y CONFIG_FUNCTION_TRACER=y CONFIG_SCHED_TRACER=y CONFIG_BLK_DEV_IO_TRACE=y CONFIG_CODE_PATCHING_SELFTEST=y CONFIG_FTR_FIXUP_SELFTEST=y CONFIG_MSI_BITMAP_SELFTEST=y CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y
On Thu, Sep 19, 2019 at 03:28:40PM -0500, Nathan Lynch wrote:
> Hello Sam,
>
> Sam Bobroff <sbobroff@linux.ibm.com> writes:
> [...]
>
> With this change, I get a crash (use after free by the looks of it) when
> I remove and then add a pci device in qemu:
>
> [...]
>
> FWIW during boot the EEH core reports:
>
> EEH: No capable adapters found: recovery disabled.
>
> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index ca8b0c58a6a7..87edac6f2fd9 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -1272,7 +1272,7 @@ void eeh_add_device_late(struct pci_dev *dev)
> >  	struct pci_dn *pdn;
> >  	struct eeh_dev *edev;
> >
> > -	if (!dev || !eeh_enabled())
> > +	if (!dev)
> >  		return;
> >
> >  	pr_debug("EEH: Adding device %s\n", pci_name(dev));
>
> Reverting this hunk works around (fixes?) it.

Hi Nathan,

Thanks, this does look like a bug to me. I couldn't replicate your crash
(even with CONFIG_SLUB_DEBUG_ON) but I think I do see a bug there.

Does the below patch also fix it for you?

Cheers,
Sam.

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 0a91dee51245..f8aa65cb2931 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1207,10 +1207,11 @@ void eeh_add_device_late(struct pci_dev *dev)
 	if (eeh_has_flag(EEH_PROBE_MODE_DEV))
 		eeh_ops->probe(pdn, NULL);
 
-	edev->pdev = dev;
-	dev->dev.archdata.edev = edev;
-
-	eeh_addr_cache_insert_dev(dev);
+	if (eeh_enabled()) {
+		edev->pdev = dev;
+		dev->dev.archdata.edev = edev;
+		eeh_addr_cache_insert_dev(dev);
+	}
 }
 
 /**
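
The thread does not spell out the mechanism, so the following is only one plausible reading, written as a userspace model: if the add path records edev->pdev while EEH is disabled, and the remove path skips its cleanup in that state (an assumption, not confirmed here), the eeh_dev is left pointing at a pci_dev that is later freed. The patch above avoids recording the binding unless eeh_enabled() is true. All type and function names below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_dev and struct eeh_dev. */
struct pci_dev_model { int id; };
struct eeh_dev_model { struct pci_dev_model *pdev; };

static bool eeh_on;	/* stands in for eeh_enabled() */

/* Behaviour of the original v5 hunk: always record the binding. */
static void add_late_unguarded(struct eeh_dev_model *edev,
			       struct pci_dev_model *dev)
{
	assert(edev != NULL && dev != NULL);
	edev->pdev = dev;
}

/* Behaviour after the follow-up patch: bind only when EEH is enabled. */
static void add_late_guarded(struct eeh_dev_model *edev,
			     struct pci_dev_model *dev)
{
	assert(edev != NULL && dev != NULL);
	if (eeh_on)
		edev->pdev = dev;
}

/* ASSUMPTION: the removal path does no cleanup while EEH is disabled. */
static void remove_device(struct eeh_dev_model *edev)
{
	assert(edev != NULL);
	if (!eeh_on)
		return;
	edev->pdev = NULL;
}

/* True if a binding survives an add/remove cycle with EEH disabled. */
static bool stale_binding_after_remove(bool guarded)
{
	struct pci_dev_model dev = { 1 };
	struct eeh_dev_model edev = { NULL };

	eeh_on = false;
	if (guarded)
		add_late_guarded(&edev, &dev);
	else
		add_late_unguarded(&edev, &dev);
	remove_device(&edev);
	return edev.pdev != NULL;
}
```

In this model stale_binding_after_remove(false) is true and stale_binding_after_remove(true) is false, matching the observation that both reverting the hunk and applying the guard make the crash go away.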
Sam Bobroff <sbobroff@linux.ibm.com> writes:
> Thanks, this does look like a bug to me. I couldn't replicate your crash
> (even with CONFIG_SLUB_DEBUG_ON) but I think I do see a bug there.
>
> Does the below patch also fix it for you?

Yes, this works as well, thanks.

> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 0a91dee51245..f8aa65cb2931 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1207,10 +1207,11 @@ void eeh_add_device_late(struct pci_dev *dev)
>  	if (eeh_has_flag(EEH_PROBE_MODE_DEV))
>  		eeh_ops->probe(pdn, NULL);
>
> -	edev->pdev = dev;
> -	dev->dev.archdata.edev = edev;
> -
> -	eeh_addr_cache_insert_dev(dev);
> +	if (eeh_enabled()) {
> +		edev->pdev = dev;
> +		dev->dev.archdata.edev = edev;
> +		eeh_addr_cache_insert_dev(dev);
> +	}
>  }
>
>  /**
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index ca8b0c58a6a7..87edac6f2fd9 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1272,7 +1272,7 @@ void eeh_add_device_late(struct pci_dev *dev)
 	struct pci_dn *pdn;
 	struct eeh_dev *edev;
 
-	if (!dev || !eeh_enabled())
+	if (!dev)
 		return;
 
 	pr_debug("EEH: Adding device %s\n", pci_name(dev));
diff --git a/arch/powerpc/kernel/of_platform.c b/arch/powerpc/kernel/of_platform.c
index 427fc22f72b6..11c807468ab5 100644
--- a/arch/powerpc/kernel/of_platform.c
+++ b/arch/powerpc/kernel/of_platform.c
@@ -81,7 +81,8 @@ static int of_pci_phb_probe(struct platform_device *dev)
 	pcibios_claim_one_bus(phb->bus);
 
 	/* Finish EEH setup */
-	eeh_add_device_tree_late(phb->bus);
+	if (!eeh_has_flag(EEH_FORCE_DISABLED))
+		eeh_add_device_tree_late(phb->bus);
 
 	/* Add probed PCI devices to the device model */
 	pci_bus_add_devices(phb->bus);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 629f9390d9af..77cc2f51c2ea 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -43,7 +43,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
 {
 	struct pci_dn *pdn = pci_get_pdn(pdev);
 
-	if (!pdev->is_virtfn)
+	if (eeh_has_flag(EEH_FORCE_DISABLED))
 		return;
 
 	pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
@@ -222,6 +222,25 @@ static const struct file_operations eeh_tree_state_debugfs_ops = {
 
 #endif /* CONFIG_DEBUG_FS */
 
+void pnv_eeh_enable_phbs(void)
+{
+	struct pci_controller *hose;
+	struct pnv_phb *phb;
+
+	list_for_each_entry(hose, &hose_list, list_node) {
+		phb = hose->private_data;
+		/*
+		 * If EEH is enabled, we're going to rely on that.
+		 * Otherwise, we restore to conventional mechanism
+		 * to clear frozen PE during PCI config access.
+		 */
+		if (eeh_enabled())
+			phb->flags |= PNV_PHB_FLAG_EEH;
+		else
+			phb->flags &= ~PNV_PHB_FLAG_EEH;
+	}
+}
+
 /**
  * pnv_eeh_post_init - EEH platform dependent post initialization
  *
@@ -260,19 +279,11 @@ int pnv_eeh_post_init(void)
 	if (!eeh_enabled())
 		disable_irq(eeh_event_irq);
 
+	pnv_eeh_enable_phbs();
+
 	list_for_each_entry(hose, &hose_list, list_node) {
 		phb = hose->private_data;
 
-		/*
-		 * If EEH is enabled, we're going to rely on that.
-		 * Otherwise, we restore to conventional mechanism
-		 * to clear frozen PE during PCI config access.
-		 */
-		if (eeh_enabled())
-			phb->flags |= PNV_PHB_FLAG_EEH;
-		else
-			phb->flags &= ~PNV_PHB_FLAG_EEH;
-
 		/* Create debugfs entries */
 #ifdef CONFIG_DEBUG_FS
 		if (phb->has_dbgfs || !phb->dbgfs)
@@ -483,7 +494,11 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 	 * Enable EEH explicitly so that we will do EEH check
 	 * while accessing I/O stuff
 	 */
-	eeh_add_flag(EEH_ENABLED);
+	if (!eeh_has_flag(EEH_ENABLED)) {
+		enable_irq(eeh_event_irq);
+		pnv_eeh_enable_phbs();
+		eeh_add_flag(EEH_ENABLED);
+	}
 
 	/* Save memory bars */
 	eeh_save_bars(edev);
diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 31733f6d642c..96ad41fbf96b 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -42,44 +42,44 @@ static int ibm_get_config_addr_info;
 static int ibm_get_config_addr_info2;
 static int ibm_configure_pe;
 
-#ifdef CONFIG_PCI_IOV
 void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
 {
 	struct pci_dn *pdn = pci_get_pdn(pdev);
-	struct pci_dn *physfn_pdn;
-	struct eeh_dev *edev;
 
-	if (!pdev->is_virtfn)
+	if (eeh_has_flag(EEH_FORCE_DISABLED))
 		return;
 
 	pr_debug("%s: EEH: Setting up device %s.\n", __func__, pci_name(pdev));
+#ifdef CONFIG_PCI_IOV
+	if (pdev->is_virtfn) {
+		struct pci_dn *physfn_pdn;
 
-	pdn->device_id = pdev->device;
-	pdn->vendor_id = pdev->vendor;
-	pdn->class_code = pdev->class;
-	/*
-	 * Last allow unfreeze return code used for retrieval
-	 * by user space in eeh-sysfs to show the last command
-	 * completion from platform.
-	 */
-	pdn->last_allow_rc = 0;
-	physfn_pdn = pci_get_pdn(pdev->physfn);
-	pdn->pe_number = physfn_pdn->pe_num_map[pdn->vf_index];
-	edev = pdn_to_eeh_dev(pdn);
-
-	/*
-	 * The following operations will fail if VF's sysfs files
-	 * aren't created or its resources aren't finalized.
-	 */
+		pdn->device_id = pdev->device;
+		pdn->vendor_id = pdev->vendor;
+		pdn->class_code = pdev->class;
+		/*
+		 * Last allow unfreeze return code used for retrieval
+		 * by user space in eeh-sysfs to show the last command
+		 * completion from platform.
+		 */
+		pdn->last_allow_rc = 0;
+		physfn_pdn = pci_get_pdn(pdev->physfn);
+		pdn->pe_number = physfn_pdn->pe_num_map[pdn->vf_index];
+	}
+#endif
 	eeh_add_device_early(pdn);
 	eeh_add_device_late(pdev);
-	edev->pe_config_addr = (pdn->busno << 16) | (pdn->devfn << 8);
-	eeh_rmv_from_parent_pe(edev); /* Remove as it is adding to bus pe */
-	eeh_add_to_parent_pe(edev); /* Add as VF PE type */
-	eeh_sysfs_add_device(pdev);
+#ifdef CONFIG_PCI_IOV
+	if (pdev->is_virtfn) {
+		struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
 
-}
+		edev->pe_config_addr = (pdn->busno << 16) | (pdn->devfn << 8);
+		eeh_rmv_from_parent_pe(edev); /* Remove as it is adding to bus pe */
+		eeh_add_to_parent_pe(edev); /* Add as VF PE type */
+	}
 #endif
+	eeh_sysfs_add_device(pdev);
+}
 
 /*
  * Buffer for reporting slot-error-detail rtas calls. Its here
@@ -146,10 +146,8 @@ static int pseries_eeh_init(void)
 	/* Set EEH probe mode */
 	eeh_add_flag(EEH_PROBE_MODE_DEVTREE | EEH_ENABLE_IO_FOR_LOG);
 
-#ifdef CONFIG_PCI_IOV
 	/* Set EEH machine dependent code */
 	ppc_md.pcibios_bus_add_device = pseries_pcibios_bus_add_device;
-#endif
 
 	return 0;
 }
On PowerNV and pSeries, devices currently acquire EEH support from
several different places: Boot-time devices from eeh_probe_devices()
and eeh_addr_cache_build(), Virtual Function devices from the pcibios
bus add device hooks and hot plugged devices from pci_hp_add_devices()
(with other platforms using other methods as well). Unfortunately,
pSeries machines currently discover hot plugged devices using
pci_rescan_bus(), not pci_hp_add_devices(), and so those devices do
not receive EEH support.

Rather than adding another case for pci_rescan_bus(), this change
widens the scope of the pcibios bus add device hooks so that they can
handle all devices. As a side effect this also supports devices
discovered after manually rescanning via /sys/bus/pci/rescan.

Note that on PowerNV, this change allows the EEH subsystem to become
enabled after boot as long as it has not been forced off, which was
not previously possible (it was already possible on pSeries).

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
---
 arch/powerpc/kernel/eeh.c                    |  2 +-
 arch/powerpc/kernel/of_platform.c            |  3 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 39 +++++++++-----
 arch/powerpc/platforms/pseries/eeh_pseries.c | 54 ++++++++++----------
 4 files changed, 56 insertions(+), 42 deletions(-)
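
To make "widens the scope of the pcibios bus add device hooks" concrete, here is a small userspace model of the control flow of pseries_pcibios_bus_add_device() before and after the patch, read off the diff. It only records which groups of steps would run. The step names and the dev_model type are hypothetical, and the model assumes a build with CONFIG_PCI_IOV enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Groups of work done by the hook; names are illustrative only. */
enum hook_step {
	STEP_VF_PDN_SETUP = 1 << 0,	/* fill pdn ids and PE number for a VF */
	STEP_EEH_ADD      = 1 << 1,	/* eeh_add_device_early() and _late() */
	STEP_VF_PE_FIXUP  = 1 << 2,	/* move the VF's edev to a VF PE */
	STEP_SYSFS_ADD    = 1 << 3,	/* eeh_sysfs_add_device() */
};

/* Hypothetical stand-in for the one field of struct pci_dev used here. */
struct dev_model { bool is_virtfn; };

/* Before the patch: the hook ignores everything except Virtual Functions. */
static unsigned int hook_before(const struct dev_model *pdev)
{
	assert(pdev != NULL);
	if (!pdev->is_virtfn)
		return 0;
	return STEP_VF_PDN_SETUP | STEP_EEH_ADD | STEP_VF_PE_FIXUP |
	       STEP_SYSFS_ADD;
}

/* After the patch: every device is handled unless EEH is forced off. */
static unsigned int hook_after(const struct dev_model *pdev,
			       bool force_disabled)
{
	unsigned int steps = 0;

	assert(pdev != NULL);
	if (force_disabled)
		return 0;
	if (pdev->is_virtfn)
		steps |= STEP_VF_PDN_SETUP;
	steps |= STEP_EEH_ADD;
	if (pdev->is_virtfn)
		steps |= STEP_VF_PE_FIXUP;
	steps |= STEP_SYSFS_ADD;
	return steps;
}
```

For a Virtual Function the two versions perform the same steps. The difference is that an ordinary device found by pci_rescan_bus() now gets the EEH add and sysfs steps instead of being skipped.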