Message ID | 1473408667-7107-1-git-send-email-bhe@redhat.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Hi Joreg, Sorry, forget ccing to you. Recently I tried to fix the kdump failure in amd iommu system again, and now the latest code works, IO_PAGE_FAULT can't be seen any more. But on several amd iommu system with bnx2 NIC, always IO_PAGE_FAULT will be printed out. After investegating I found out bnx2 driver doesn't reset hardware/reg like other pci device, it does the reset job in bnx2_open which is the net device up stage. So with this patch the IO_PAGE_FAULT is away too on the system with bnx2 NIC. I will However when I got a intel system with vt-d and bnx2 NIC, kdump works well, and no any error message can be seen. From code it clearly shows the domain assignment is done in __intel_map_single, at this time bnx2 driver hasn't reset device, the on-flight DMA should still exist. Do you have any idea on this? Or I missed anything? I also deferred the set_dte_entry calling to __map_single calling, the principal should be similar. Thanks Baoquan On 09/09/16 at 04:11pm, Baoquan He wrote: > When system enters into kdump kernel because of kernel panic, it won't > shutdown devices. On-flight DMA will continue transferring data until > device driver initializes. All devices are supposed to reset during > driver initialization. And this property is used to fix the kdump > failure in system with intel iommu. Other systems with hardware iommu > should be similar. Please check commit 091d42e ("iommu/vt-d: Copy > translation tables from old kernel") and those commits around it. > > But bnx2 driver doesn't reset device during driver initialization. The > device resetting is deferred to net device up stage. This will cause > hardware iommu handling failure on bnx2 device. And its resetting relies > on firmware. So in this patch move the firmware requesting code to earlier > bnx2_init_one(), then next call bnx2_reset_chip to reset device. > > Signed-off-by: Baoquan He <bhe@redhat.com> > --- > drivers/net/ethernet/broadcom/bnx2.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c > index 8fc3f3c..d68a487 100644 > --- a/drivers/net/ethernet/broadcom/bnx2.c > +++ b/drivers/net/ethernet/broadcom/bnx2.c > @@ -6356,10 +6356,6 @@ bnx2_open(struct net_device *dev) > struct bnx2 *bp = netdev_priv(dev); > int rc; > > - rc = bnx2_request_firmware(bp); > - if (rc < 0) > - goto out; > - > netif_carrier_off(dev); > > bnx2_disable_int(bp); > @@ -8575,6 +8571,12 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) > > pci_set_drvdata(pdev, dev); > > + rc = bnx2_request_firmware(bp); > + if (rc < 0) > + goto error; > + > + > + bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_RESET); > memcpy(dev->dev_addr, bp->mac_addr, ETH_ALEN); > > dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_SG | > @@ -8607,6 +8609,7 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) > return 0; > > error: > + bnx2_release_firmware(bp); > pci_iounmap(pdev, bp->regview); > pci_release_regions(pdev); > pci_disable_device(pdev); > -- > 2.5.5 >
Hi Baoquan, On Fri, Sep 09, 2016 at 04:22:25PM +0800, Baoquan He wrote: > Recently I tried to fix the kdump failure in amd iommu system again, and > now the latest code works, IO_PAGE_FAULT can't be seen any more. But on > several amd iommu system with bnx2 NIC, always IO_PAGE_FAULT will be > printed out. After investegating I found out bnx2 driver doesn't reset > hardware/reg like other pci device, it does the reset job in bnx2_open > which is the net device up stage. So with this patch the IO_PAGE_FAULT > is away too on the system with bnx2 NIC. I will > > However when I got a intel system with vt-d and bnx2 NIC, kdump works > well, and no any error message can be seen. From code it clearly shows > the domain assignment is done in __intel_map_single, at this time bnx2 > driver hasn't reset device, the on-flight DMA should still exist. Do you > have any idea on this? Or I missed anything? I also deferred the > set_dte_entry calling to __map_single calling, the principal should be > similar. Did you make sure that all unity-mappings are in place in the newly assigned domain for the bnx2 device before domains are switched? Joerg
On 09/09/16 at 10:41am, Joerg Roedel wrote: > > Hi Baoquan, > > On Fri, Sep 09, 2016 at 04:22:25PM +0800, Baoquan He wrote: > > Recently I tried to fix the kdump failure in amd iommu system again, and > > now the latest code works, IO_PAGE_FAULT can't be seen any more. But on > > several amd iommu system with bnx2 NIC, always IO_PAGE_FAULT will be > > printed out. After investegating I found out bnx2 driver doesn't reset > > hardware/reg like other pci device, it does the reset job in bnx2_open > > which is the net device up stage. So with this patch the IO_PAGE_FAULT > > is away too on the system with bnx2 NIC. I will > > > > However when I got a intel system with vt-d and bnx2 NIC, kdump works > > well, and no any error message can be seen. From code it clearly shows > > the domain assignment is done in __intel_map_single, at this time bnx2 > > driver hasn't reset device, the on-flight DMA should still exist. Do you > > have any idea on this? Or I missed anything? I also deferred the > > set_dte_entry calling to __map_single calling, the principal should be > > similar. > > Did you make sure that all unity-mappings are in place in the newly > assigned domain for the bnx2 device before domains are switched? About unity-mappings, are you saying it for intel iommu or amd iommu? For amd iommu, I just skip call set_dte_entry calling during iommu device init stage. Then in device init stage, namely __map_single, judge and call set_dte_entry there to install the new pt_root into the related dev table entry. Thought you mention it, let me look into the unity-mappings of amd iommu again. Thanks a lot!
Hi Joerg, On 09/09/16 at 10:41am, Joerg Roedel wrote: > > Hi Baoquan, > > On Fri, Sep 09, 2016 at 04:22:25PM +0800, Baoquan He wrote: > > Recently I tried to fix the kdump failure in amd iommu system again, and > > now the latest code works, IO_PAGE_FAULT can't be seen any more. But on > > several amd iommu system with bnx2 NIC, always IO_PAGE_FAULT will be > > printed out. After investegating I found out bnx2 driver doesn't reset > > hardware/reg like other pci device, it does the reset job in bnx2_open > > which is the net device up stage. So with this patch the IO_PAGE_FAULT > > is away too on the system with bnx2 NIC. I will > > > > However when I got a intel system with vt-d and bnx2 NIC, kdump works > > well, and no any error message can be seen. From code it clearly shows > > the domain assignment is done in __intel_map_single, at this time bnx2 > > driver hasn't reset device, the on-flight DMA should still exist. Do you > > have any idea on this? Or I missed anything? I also deferred the > > set_dte_entry calling to __map_single calling, the principal should be > > similar. > > Did you make sure that all unity-mappings are in place in the newly > assigned domain for the bnx2 device before domains are switched? What I am doing is that in iommu driver init stage anything will keep going forware as it does in normal kernel except for set_dte_entry calling. If in kdump kernel and in iommu init stage, just return directly at the beginning of set_dte_entry. For identity mapping, the pass through handling will do everything but return from the beginning of set_dte_entry too. Since it need install pte_root into dev table entry too though its pte_root is NULL. The unity mapping range only does iova reservation. It doesn't do dev table entry handling before device driver init. So unity mappings should be OK. Thanks a lot! Baoquan
On 09/09/16 at 04:11pm, Baoquan He wrote: > When system enters into kdump kernel because of kernel panic, it won't > shutdown devices. On-flight DMA will continue transferring data until > device driver initializes. All devices are supposed to reset during > driver initialization. And this property is used to fix the kdump > failure in system with intel iommu. Other systems with hardware iommu > should be similar. Please check commit 091d42e ("iommu/vt-d: Copy > translation tables from old kernel") and those commits around it. > > But bnx2 driver doesn't reset device during driver initialization. The > device resetting is deferred to net device up stage. This will cause > hardware iommu handling failure on bnx2 device. And its resetting relies > on firmware. So in this patch move the firmware requesting code to earlier > bnx2_init_one(), then next call bnx2_reset_chip to reset device. > > Signed-off-by: Baoquan He <bhe@redhat.com> > --- > drivers/net/ethernet/broadcom/bnx2.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c > index 8fc3f3c..d68a487 100644 > --- a/drivers/net/ethernet/broadcom/bnx2.c > +++ b/drivers/net/ethernet/broadcom/bnx2.c > @@ -6356,10 +6356,6 @@ bnx2_open(struct net_device *dev) > struct bnx2 *bp = netdev_priv(dev); > int rc; > > - rc = bnx2_request_firmware(bp); > - if (rc < 0) > - goto out; Sorry, here the corresponding bnx2_release_firmware need be removed too. Will post v2 to update this. > - > netif_carrier_off(dev); > > bnx2_disable_int(bp); > @@ -8575,6 +8571,12 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) > > pci_set_drvdata(pdev, dev); > > + rc = bnx2_request_firmware(bp); > + if (rc < 0) > + goto error; > + > + > + bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_RESET); > memcpy(dev->dev_addr, bp->mac_addr, ETH_ALEN); > > dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_SG | > @@ -8607,6 +8609,7 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) > return 0; > > error: > + bnx2_release_firmware(bp); > pci_iounmap(pdev, bp->regview); > pci_release_regions(pdev); > pci_disable_device(pdev); > -- > 2.5.5 >
From: Baoquan He <bhe@redhat.com> Date: Fri, 9 Sep 2016 16:11:07 +0800 > When system enters into kdump kernel because of kernel panic, it won't > shutdown devices. On-flight DMA will continue transferring data until > device driver initializes. All devices are supposed to reset during > driver initialization. And this property is used to fix the kdump > failure in system with intel iommu. Other systems with hardware iommu > should be similar. Please check commit 091d42e ("iommu/vt-d: Copy > translation tables from old kernel") and those commits around it. > > But bnx2 driver doesn't reset device during driver initialization. The > device resetting is deferred to net device up stage. This will cause > hardware iommu handling failure on bnx2 device. And its resetting relies > on firmware. So in this patch move the firmware requesting code to earlier > bnx2_init_one(), then next call bnx2_reset_chip to reset device. > > Signed-off-by: Baoquan He <bhe@redhat.com> Applied, thanks.
diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c index 8fc3f3c..d68a487 100644 --- a/drivers/net/ethernet/broadcom/bnx2.c +++ b/drivers/net/ethernet/broadcom/bnx2.c @@ -6356,10 +6356,6 @@ bnx2_open(struct net_device *dev) struct bnx2 *bp = netdev_priv(dev); int rc; - rc = bnx2_request_firmware(bp); - if (rc < 0) - goto out; - netif_carrier_off(dev); bnx2_disable_int(bp); @@ -8575,6 +8571,12 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) pci_set_drvdata(pdev, dev); + rc = bnx2_request_firmware(bp); + if (rc < 0) + goto error; + + + bnx2_reset_chip(bp, BNX2_DRV_MSG_CODE_RESET); memcpy(dev->dev_addr, bp->mac_addr, ETH_ALEN); dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_SG | @@ -8607,6 +8609,7 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) return 0; error: + bnx2_release_firmware(bp); pci_iounmap(pdev, bp->regview); pci_release_regions(pdev); pci_disable_device(pdev);
When system enters into kdump kernel because of kernel panic, it won't shutdown devices. On-flight DMA will continue transferring data until device driver initializes. All devices are supposed to reset during driver initialization. And this property is used to fix the kdump failure in system with intel iommu. Other systems with hardware iommu should be similar. Please check commit 091d42e ("iommu/vt-d: Copy translation tables from old kernel") and those commits around it. But bnx2 driver doesn't reset device during driver initialization. The device resetting is deferred to net device up stage. This will cause hardware iommu handling failure on bnx2 device. And its resetting relies on firmware. So in this patch move the firmware requesting code to earlier bnx2_init_one(), then next call bnx2_reset_chip to reset device. Signed-off-by: Baoquan He <bhe@redhat.com> --- drivers/net/ethernet/broadcom/bnx2.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)