Message ID | 20190722082821.37310-1-aik@ozlabs.ru (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | [kernel] powerpc/pseries/iommu: Add cond_resched() for huge updates | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (f3365d1a959d5c6527efe3d38276acc9b58e3f3f) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 7 lines checked |
Alexey Kardashevskiy <aik@ozlabs.ru> writes: > Mapping ~5.000.000 TCEs currently takes about 40s; this is the amount > required for a 300GB VM with 64k IOMMU page size. Anything bigger than > this produces RCU stall warnings. OK. Are we sure we're not doing anything stupid in that code to make it go that slowly? > This adds cond_resched() to allow the scheduler to do context switching > when it decides to. > > This loop is called from dma_set_mask() which is a sleepable context. > > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> > --- > arch/powerpc/platforms/pseries/iommu.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c > index 889dc2e44b89..2b8de822272f 100644 > --- a/arch/powerpc/platforms/pseries/iommu.c > +++ b/arch/powerpc/platforms/pseries/iommu.c > @@ -459,6 +459,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn, > static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn, > unsigned long num_pfn, void *arg) > { > + cond_resched(); > return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg); > } Why there and not in tce_setrange_multi_pSeriesLP() ? I'm not sure what the maximum granularity walk_system_ram_range() will ever call us with is. cheers
On 23/07/2019 21:46, Michael Ellerman wrote: > Alexey Kardashevskiy <aik@ozlabs.ru> writes: >> Mapping ~5.000.000 TCEs currently takes about 40s; this is the amount >> required for a 300GB VM with 64k IOMMU page size. Anything bigger than >> this produces RCU stall warnings. > > OK. Are we sure we're not doing anything stupid in that code to make it > go that slowly? Each tce_setrange_multi_pSeriesLP() is a hypercall and KVM (if it is KVM and the call was not bounced to QEMU which I believe does not happen) walks through all 512 TCEs in the request and does TCE Kill for each of those TCEs which in turn are OPAL calls, and not just one per TCE but two - one for PHB's TCE and one for NPU's TCE. And I have a test patch to do TCE kills in powernv (not in OPAL), 40s figure was taken with this patch. So I agree it should be faster but it won't be 1-2s and for longer operations we will need this resched. > >> This adds cond_resched() to allow the scheduler to do context switching >> when it decides to. >> >> This loop is called from dma_set_mask() which is a sleepable context. >> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >> --- >> arch/powerpc/platforms/pseries/iommu.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c >> index 889dc2e44b89..2b8de822272f 100644 >> --- a/arch/powerpc/platforms/pseries/iommu.c >> +++ b/arch/powerpc/platforms/pseries/iommu.c >> @@ -459,6 +459,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn, >> static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn, >> unsigned long num_pfn, void *arg) >> { >> + cond_resched(); >> return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg); >> } > > Why there and not in tce_setrange_multi_pSeriesLP() ? The other caller is iommu_mem_notifier and I am unsure about locking there, there may be nasty surprises. > I'm not sure what the maximum granularity walk_system_ram_range() will > ever call us with is. Contiguous blocks, which are too big. You're right, this resched better go to tce_setrange_multi_pSeriesLP.
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 889dc2e44b89..2b8de822272f 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -459,6 +459,7 @@ static int tce_setrange_multi_pSeriesLP(unsigned long start_pfn, static int tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn, unsigned long num_pfn, void *arg) { + cond_resched(); return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg); }
Mapping ~5.000.000 TCEs currently takes about 40s; this is the amount required for a 300GB VM with 64k IOMMU page size. Anything bigger than this produces RCU stall warnings. This adds cond_resched() to allow the scheduler to do context switching when it decides to. This loop is called from dma_set_mask() which is a sleepable context. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> --- arch/powerpc/platforms/pseries/iommu.c | 1 + 1 file changed, 1 insertion(+)