Message ID | 20201113050632.74124-1-aik@ozlabs.ru (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | [kernel] vfio_pci_nvlink2: Do not attempt NPU2 setup on old P8's NPU | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (80ecbe16c827714ce3741ed1f1d34488b903e717) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 25 lines checked |
snowpatch_ozlabs/needsstable | warning | Please consider tagging this patch for stable! |
On 13/11/20 4:06 pm, Alexey Kardashevskiy wrote: > We execute certain NPU2 setup code (such as mapping an LPID to a device > in NPU2) unconditionally if an Nvlink bridge is detected. However this > cannot succeed on P8+ machines as the init helpers return an error other > than ENODEV which means the device is there is and setup failed so > vfio_pci_enable() fails and pass through is not possible. > > This changes the two NPU2 related init helpers to return -ENODEV if > there is no "memory-region" device tree property as this is > the distinction between NPU and NPU2. > > Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Should this be Cc: stable? Andrew
On 13/11/2020 16:30, Andrew Donnellan wrote: > On 13/11/20 4:06 pm, Alexey Kardashevskiy wrote: >> We execute certain NPU2 setup code (such as mapping an LPID to a device >> in NPU2) unconditionally if an Nvlink bridge is detected. However this >> cannot succeed on P8+ machines as the init helpers return an error other >> than ENODEV which means the device is there is and setup failed so >> vfio_pci_enable() fails and pass through is not possible. >> >> This changes the two NPU2 related init helpers to return -ENODEV if >> there is no "memory-region" device tree property as this is >> the distinction between NPU and NPU2. >> >> Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] >> subdriver") >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > > Should this be Cc: stable? This depends on whether P8+ + NVLink was ever a product (hi Leonardo) and had actual customers who still rely on upstream kernels to work as after many years only the last week I heard form some Redhat test engineer that it does not work. May be cc: stable...
Alexey Kardashevskiy <aik@ozlabs.ru> writes: > On 13/11/2020 16:30, Andrew Donnellan wrote: >> On 13/11/20 4:06 pm, Alexey Kardashevskiy wrote: >>> We execute certain NPU2 setup code (such as mapping an LPID to a device >>> in NPU2) unconditionally if an Nvlink bridge is detected. However this >>> cannot succeed on P8+ machines as the init helpers return an error other >>> than ENODEV which means the device is there is and setup failed so >>> vfio_pci_enable() fails and pass through is not possible. >>> >>> This changes the two NPU2 related init helpers to return -ENODEV if >>> there is no "memory-region" device tree property as this is >>> the distinction between NPU and NPU2. >>> >>> Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] >>> subdriver") >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> >> Should this be Cc: stable? > > This depends on whether P8+ + NVLink was ever a product (hi Leonardo) > and had actual customers who still rely on upstream kernels to work as > after many years only the last week I heard form some Redhat test > engineer that it does not work. May be cc: stable... I don't think it really matters if it was a product or not. Upstream is never a product anyway. If the fix is simple and unlikely to introduce a regression, and would potentially save someone having to debug the problem again, then it should get backported to stable. You should also clarify what you mean by "P8+", it won't be clear to most readers if you mean "Power 8 and/or later" or specifically Naples / Power8 NVL. cheers
diff --git a/drivers/vfio/pci/vfio_pci_nvlink2.c b/drivers/vfio/pci/vfio_pci_nvlink2.c index 65c61710c0e9..9adcf6a8f888 100644 --- a/drivers/vfio/pci/vfio_pci_nvlink2.c +++ b/drivers/vfio/pci/vfio_pci_nvlink2.c @@ -231,7 +231,7 @@ int vfio_pci_nvdia_v100_nvlink2_init(struct vfio_pci_device *vdev) return -EINVAL; if (of_property_read_u32(npu_node, "memory-region", &mem_phandle)) - return -EINVAL; + return -ENODEV; mem_node = of_find_node_by_phandle(mem_phandle); if (!mem_node) @@ -393,7 +393,7 @@ int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev) int ret; struct vfio_pci_npu2_data *data; struct device_node *nvlink_dn; - u32 nvlink_index = 0; + u32 nvlink_index = 0, mem_phandle = 0; struct pci_dev *npdev = vdev->pdev; struct device_node *npu_node = pci_device_to_OF_node(npdev); struct pci_controller *hose = pci_bus_to_host(npdev->bus); @@ -408,6 +408,9 @@ int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev) if (!pnv_pci_get_gpu_dev(vdev->pdev)) return -ENODEV; + if (of_property_read_u32(npu_node, "memory-region", &mem_phandle)) + return -ENODEV; + /* * NPU2 normally has 8 ATSD registers (for concurrency) and 6 links * so we can allocate one register per link, using nvlink index as
We execute certain NPU2 setup code (such as mapping an LPID to a device in NPU2) unconditionally if an Nvlink bridge is detected. However this cannot succeed on P8+ machines as the init helpers return an error other than ENODEV which means the device is there is and setup failed so vfio_pci_enable() fails and pass through is not possible. This changes the two NPU2 related init helpers to return -ENODEV if there is no "memory-region" device tree property as this is the distinction between NPU and NPU2. Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- drivers/vfio/pci/vfio_pci_nvlink2.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)