diff mbox series

[2/2] npu2: Purge cache when resetting a GPU

Message ID 1559923710-11923-2-git-send-email-arbab@linux.ibm.com
State Accepted
Headers show
Series [1/2] npu2: Fix typo | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (db3929ee4f0a98596938f05da2789686908ebfd4)
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot fail Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present

Commit Message

Reza Arbab June 7, 2019, 4:08 p.m. UTC
After putting all a GPU's links in reset, do a cache purge in case we
have CPU cache lines belonging to the now-unaccessible GPU memory.

Fixes: 68d11e4460ec ("npu2: Reset NVLinks when resetting a GPU")
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
---
 hw/npu2.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Alexey Kardashevskiy June 11, 2019, 2:55 a.m. UTC | #1
On 08/06/2019 02:08, Reza Arbab wrote:
> After putting all a GPU's links in reset, do a cache purge in case we
> have CPU cache lines belonging to the now-unaccessible GPU memory.
> 
> Fixes: 68d11e4460ec ("npu2: Reset NVLinks when resetting a GPU")
> Cc: skiboot-stable@lists.ozlabs.org
> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
> ---
>  hw/npu2.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/npu2.c b/hw/npu2.c
> index 774592911110..3a2808d7133c 100644
> --- a/hw/npu2.c
> +++ b/hw/npu2.c
> @@ -561,6 +561,8 @@ static int64_t npu2_gpu_bridge_sec_bus_reset(void *dev,
>  
>  	gpu = list_top(&pd->children, struct pci_device, link);
>  	if (gpu && (*data & PCI_CFG_BRCTL_SECONDARY_RESET)) {
> +		int64_t rc;
> +
>  		dt_for_each_compatible(dt_root, np, "ibm,power9-npu-pciex") {
>  			npphb = pci_get_phb(dt_prop_get_cell(np,
>  					"ibm,opal-phbid", 1));
> @@ -574,6 +576,10 @@ static int64_t npu2_gpu_bridge_sec_bus_reset(void *dev,
>  					npu2_dev_procedure_reset(ndev);
>  			}
>  		}
> +
> +		rc = purge_l2_l3_caches();
> +		if (rc)
> +			return rc;


Ah, missed that :( Probably would make more sense to have this in
npu2_dev_procedure_reset() as the other caller also purges caches.

Either way

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>



>  	}
>  
>  	return OPAL_PARTIAL;
>
Reza Arbab June 11, 2019, 3:38 p.m. UTC | #2
On Tue, Jun 11, 2019 at 12:55:48PM +1000, Alexey Kardashevskiy wrote:
>On 08/06/2019 02:08, Reza Arbab wrote:
>> After putting all a GPU's links in reset, do a cache purge in case we
>> have CPU cache lines belonging to the now-unaccessible GPU memory.
>>
>> Fixes: 68d11e4460ec ("npu2: Reset NVLinks when resetting a GPU")
>> Cc: skiboot-stable@lists.ozlabs.org
>> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
>> ---
>>  hw/npu2.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/hw/npu2.c b/hw/npu2.c
>> index 774592911110..3a2808d7133c 100644
>> --- a/hw/npu2.c
>> +++ b/hw/npu2.c
>> @@ -561,6 +561,8 @@ static int64_t npu2_gpu_bridge_sec_bus_reset(void *dev,
>>
>>  	gpu = list_top(&pd->children, struct pci_device, link);
>>  	if (gpu && (*data & PCI_CFG_BRCTL_SECONDARY_RESET)) {
>> +		int64_t rc;
>> +
>>  		dt_for_each_compatible(dt_root, np, "ibm,power9-npu-pciex") {
>>  			npphb = pci_get_phb(dt_prop_get_cell(np,
>>  					"ibm,opal-phbid", 1));
>> @@ -574,6 +576,10 @@ static int64_t npu2_gpu_bridge_sec_bus_reset(void *dev,
>>  					npu2_dev_procedure_reset(ndev);
>>  			}
>>  		}
>> +
>> +		rc = purge_l2_l3_caches();
>> +		if (rc)
>> +			return rc;
>
>
>Ah, missed that :( Probably would make more sense to have this in
>npu2_dev_procedure_reset() as the other caller also purges caches.

That is tempting but we'd end up purging the cache a bunch of times, 
once for each link. Probably better to reset each link, and then just do 
one purge afterwards.
Stewart Smith June 13, 2019, 4:57 a.m. UTC | #3
Reza Arbab <arbab@linux.ibm.com> writes:
> After putting all a GPU's links in reset, do a cache purge in case we
> have CPU cache lines belonging to the now-unaccessible GPU memory.
>
> Fixes: 68d11e4460ec ("npu2: Reset NVLinks when resetting a GPU")
> Cc: skiboot-stable@lists.ozlabs.org
> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>

Thanks, merged to master as of d4f2f77377dab27eaf792aa089090bcdd953a5cc
Vasant Hegde July 1, 2019, 10:18 a.m. UTC | #4
On 06/07/2019 09:38 PM, Reza Arbab wrote:
> After putting all a GPU's links in reset, do a cache purge in case we
> have CPU cache lines belonging to the now-unaccessible GPU memory.
> 
> Fixes: 68d11e4460ec ("npu2: Reset NVLinks when resetting a GPU")
> Cc: skiboot-stable@lists.ozlabs.org

It will be part of v6.3.2.

Thanks!
-Vasant
diff mbox series

Patch

diff --git a/hw/npu2.c b/hw/npu2.c
index 774592911110..3a2808d7133c 100644
--- a/hw/npu2.c
+++ b/hw/npu2.c
@@ -561,6 +561,8 @@  static int64_t npu2_gpu_bridge_sec_bus_reset(void *dev,
 
 	gpu = list_top(&pd->children, struct pci_device, link);
 	if (gpu && (*data & PCI_CFG_BRCTL_SECONDARY_RESET)) {
+		int64_t rc;
+
 		dt_for_each_compatible(dt_root, np, "ibm,power9-npu-pciex") {
 			npphb = pci_get_phb(dt_prop_get_cell(np,
 					"ibm,opal-phbid", 1));
@@ -574,6 +576,10 @@  static int64_t npu2_gpu_bridge_sec_bus_reset(void *dev,
 					npu2_dev_procedure_reset(ndev);
 			}
 		}
+
+		rc = purge_l2_l3_caches();
+		if (rc)
+			return rc;
 	}
 
 	return OPAL_PARTIAL;