Message ID | 1575681159-30356-3-git-send-email-linuxram@us.ibm.com (mailing list archive)
---|---
State | Superseded
Series | Enable IOMMU support for pseries Secure VMs
Context | Check | Description
---|---|---
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (567dea0e848944848650d7fd27699e2de5d49353) |
snowpatch_ozlabs/build-ppc64le | warning | Build succeeded but added 1 new sparse warnings |
snowpatch_ozlabs/build-ppc64be | warning | Build succeeded but added 1 new sparse warnings |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 23 lines checked |
snowpatch_ozlabs/needsstable | success | Patch has no Fixes tags |
On 07/12/2019 12:12, Ram Pai wrote:
> Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> secure guests")
> disabled dma_iommu_ops path, for secure VMs. Disabling dma_iommu_ops
> path for secure VMs, helped enable dma_direct path. This enabled
> support for bounce-buffering through SWIOTLB. However it fails to
> operate when IOMMU is enabled, since I/O pages are not TCE mapped.
>
> Renable dma_iommu_ops path for pseries Secure VMs. It handles all
> cases including, TCE mapping I/O pages, in the presence of a
> IOMMU.
>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>

Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Although I still do not totally understand the mechanics of this (swiotlb
on top of huge DDW at 0x800.0000.0000.0000), this change looks reasonable
anyway.

> ---
>  arch/powerpc/platforms/pseries/iommu.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 67b5009..4e27d66 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -36,7 +36,6 @@
>  #include <asm/udbg.h>
>  #include <asm/mmzone.h>
>  #include <asm/plpar_wrappers.h>
> -#include <asm/svm.h>
>  #include <asm/ultravisor.h>
>
>  #include "pseries.h"
> @@ -1346,15 +1345,7 @@ void iommu_init_early_pSeries(void)
>  	of_reconfig_notifier_register(&iommu_reconfig_nb);
>  	register_memory_notifier(&iommu_mem_nb);
>
> -	/*
> -	 * Secure guest memory is inacessible to devices so regular DMA isn't
> -	 * possible.
> -	 *
> -	 * In that case keep devices' dma_map_ops as NULL so that the generic
> -	 * DMA code path will use SWIOTLB to bounce buffers for DMA.
> -	 */
> -	if (!is_secure_guest())
> -		set_pci_dma_ops(&dma_iommu_ops);
> +	set_pci_dma_ops(&dma_iommu_ops);
>  }
>
>  static int __init disable_multitce(char *str)
>
Hello Ram,

Ram Pai <linuxram@us.ibm.com> writes:

> Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> secure guests")
> disabled dma_iommu_ops path, for secure VMs. Disabling dma_iommu_ops
> path for secure VMs, helped enable dma_direct path. This enabled
> support for bounce-buffering through SWIOTLB. However it fails to
> operate when IOMMU is enabled, since I/O pages are not TCE mapped.
>
> Renable dma_iommu_ops path for pseries Secure VMs. It handles all
> cases including, TCE mapping I/O pages, in the presence of a
> IOMMU.
>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 67b5009..4e27d66 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -36,7 +36,6 @@
>  #include <asm/udbg.h>
>  #include <asm/mmzone.h>
>  #include <asm/plpar_wrappers.h>
> -#include <asm/svm.h>
>  #include <asm/ultravisor.h>
>
>  #include "pseries.h"

You still need to keep <asm/svm.h>, otherwise there won't be a definition
of is_secure_guest() when CONFIG_PPC_SVM=n.

--
Thiago Jung Bauermann
IBM Linux Technology Center
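For context on Thiago's point, is_secure_guest() comes from asm/svm.h, roughly as sketched below (paraphrased from memory, not a verbatim copy of the header). If later patches in the series keep calling is_secure_guest() from iommu.c, dropping the include breaks the CONFIG_PPC_SVM=n build, which is the failure being pointed at.

```c
/* Hedged paraphrase of arch/powerpc/include/asm/svm.h; details may differ. */
#include <asm/reg.h>	/* mfmsr(), MSR_S */

#ifdef CONFIG_PPC_SVM
static inline bool is_secure_guest(void)
{
	/* Secure (SVM) guests run with the MSR secure-mode bit set. */
	return mfmsr() & MSR_S;
}
#else
static inline bool is_secure_guest(void)
{
	/* Stub so callers still compile when SVM support is configured out. */
	return false;
}
#endif
```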
Quoting Ram Pai (2019-12-06 19:12:39)
> Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> secure guests")
> disabled dma_iommu_ops path, for secure VMs. Disabling dma_iommu_ops
> path for secure VMs, helped enable dma_direct path. This enabled
> support for bounce-buffering through SWIOTLB. However it fails to
> operate when IOMMU is enabled, since I/O pages are not TCE mapped.
>
> Renable dma_iommu_ops path for pseries Secure VMs. It handles all
> cases including, TCE mapping I/O pages, in the presence of a
> IOMMU.

Wasn't clear to me at first, but I guess the main gist of this series is
that we want to continue to use SWIOTLB, but also need to create mappings
of its bounce buffers in the IOMMU, so we revert to using dma_iommu_ops
and rely on the various dma_iommu_{map,alloc}_bypass() hooks throughout
to call into dma_direct_* ops rather than relying on the dma_is_direct(ops)
checks in DMA API functions to do the same.

That makes sense, but one issue I see with that is that
dma_iommu_map_bypass() only tests true if all the following are true:

  1) the device requests a 64-bit DMA mask via
     dma_set_mask/dma_set_coherent_mask
  2) DDW is enabled (i.e. we don't pass disable_ddw on the command-line)

dma_is_direct() checks don't have this limitation, so I think for some
cases, such as devices that use a smaller DMA mask, we'll end up falling
back to the non-bypass functions in dma_iommu_ops, which will likely break
for things like dma_alloc_coherent/dma_map_single since they won't use
SWIOTLB pages and won't do the necessary calls to set_memory_unencrypted()
to share those non-SWIOTLB buffers with the hypervisor.

Maybe that's ok, but I think we should be clearer about how to fail/handle
these cases.

Though I also agree with some concerns Alexey stated earlier: it seems
wasteful to map the entire DDW window just so these bounce buffers can be
mapped. Especially if you consider the lack of a mapping to be an
additional safeguard against things like buggy device implementations on
the QEMU side. E.g. if we leaked pages to the hypervisor by accident,
those pages wouldn't be immediately accessible to a device, and would
still require additional work to get past the IOMMU.

What would it look like if we try to make all this work with disable_ddw
passed to the kernel command-line (or forced for is_secure_guest())?

  1) dma_iommu_{alloc,map}_bypass() would no longer get us to dma_direct_*
     ops, but an additional case or hook that considers is_secure_guest()
     might do it.

  2) We'd also need to set up an IOMMU mapping for the bounce buffers via
     io_tlb_start/io_tlb_end. We could do it once, on-demand via
     dma_iommu_bypass_supported() like we do for the 64-bit DDW window, or
     maybe in some init function. That also has the benefit of not
     requiring devices to support 64-bit DMA.

Alternatively, we could continue to rely on the 64-bit DDW window, but
modify the call to enable_ddw() to only map the io_tlb_start/end range in
the case of is_secure_guest(). This is a little cleaner implementation-wise
since we can rely on the existing dma_iommu_{alloc,map}_bypass() hooks, but
devices that don't support 64-bit will fall back to not using dma_direct_*
ops and fail miserably. We'd probably want to handle that more gracefully.

Or we handle both cases gracefully. To me it makes more sense to enable the
non-DDW case, then consider adding the DDW case later if there's some
reason why 64-bit DMA is needed. But it would be good to hear if there are
other opinions.
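To make the bypass mechanism described above concrete, the sketch below shows roughly how the dma_iommu_ops map path dispatched at the time (a paraphrase of arch/powerpc/kernel/dma-iommu.c circa v5.5, not part of this patch; exact signatures may differ slightly):

```c
#include <linux/dma-direct.h>
#include <asm/iommu.h>

/* Rough sketch: if the device negotiated "bypass" (64-bit DMA mask plus a
 * DDW window), hand the request straight to dma-direct, which in turn
 * bounces through SWIOTLB for secure guests; otherwise fall back to TCE
 * mapping through the IOMMU table. */
static dma_addr_t dma_iommu_map_page(struct device *dev, struct page *page,
				     unsigned long offset, size_t size,
				     enum dma_data_direction direction,
				     unsigned long attrs)
{
	if (dma_iommu_map_bypass(dev, attrs))
		return dma_direct_map_page(dev, page, offset, size,
					   direction, attrs);

	return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
			      size, dma_get_mask(dev), direction, attrs);
}
```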
On 11/12/2019 12:43, Michael Roth wrote:
> Quoting Ram Pai (2019-12-06 19:12:39)
>> Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
>> secure guests")
>> disabled dma_iommu_ops path, for secure VMs. Disabling dma_iommu_ops
>> path for secure VMs, helped enable dma_direct path. This enabled
>> support for bounce-buffering through SWIOTLB. However it fails to
>> operate when IOMMU is enabled, since I/O pages are not TCE mapped.
>>
>> Renable dma_iommu_ops path for pseries Secure VMs. It handles all
>> cases including, TCE mapping I/O pages, in the presence of a
>> IOMMU.
>
> Wasn't clear to me at first, but I guess the main gist of this series is
> that we want to continue to use SWIOTLB, but also need to create mappings
> of it's bounce buffers in the IOMMU, so we revert to using dma_iommu_ops
> and rely on the various dma_iommu_{map,alloc}_bypass() hooks throughout
> to call into dma_direct_* ops rather than relying on the dma_is_direct(ops)
> checks in DMA API functions to do the same.

Correct. Took me a bit of time to realize what we got here :) We only
rely on dma_iommu_ops::.dma_supported to write the DMA offset to a
device (when creating a huge window), and after that we know it is
mapped directly and swiotlb gets this 1<<59 offset via __phys_to_dma().

> That makes sense, but one issue I see with that is that
> dma_iommu_map_bypass() only tests true if all the following are true:
>
> 1) the device requests a 64-bit DMA mask via
>    dma_set_mask/dma_set_coherent_mask
> 2) DDW is enabled (i.e. we don't pass disable_ddw on command-line)
>
> dma_is_direct() checks don't have this limitation, so I think for
> anything cases, such as devices that use a smaller DMA mask, we'll
> end up falling back to the non-bypass functions in dma_iommu_ops, which
> will likely break for things like dma_alloc_coherent/dma_map_single
> since they won't use SWIOTLB pages and won't do the necessary calls to
> set_memory_unencrypted() to share those non-SWIOTLB buffers with
> hypervisor.
>
> Maybe that's ok, but I think we should be clearer about how to
> fail/handle these cases.
>
> Though I also agree with some concerns Alexey stated earlier: it seems
> wasteful to map the entire DDW window just so these bounce buffers can be
> mapped. Especially if you consider the lack of a mapping to be an additional
> safe-guard against things like buggy device implementations on the QEMU
> side. E.g. if we leaked pages to the hypervisor on accident, those pages
> wouldn't be immediately accessible to a device, and would still require
> additional work get past the IOMMU.
>
> What would it look like if we try to make all this work with disable_ddw passed
> to kernel command-line (or forced for is_secure_guest())?
>
> 1) dma_iommu_{alloc,map}_bypass() would no longer get us to dma_direct_* ops,
>    but an additional case or hook that considers is_secure_guest() might do
>    it.
>
> 2) We'd also need to set up an IOMMU mapping for the bounce buffers via
>    io_tlb_start/io_tlb_end. We could do it once, on-demand via
>    dma_iommu_bypass_supported() like we do for the 64-bit DDW window, or
>    maybe in some init function.

io_tlb_start/io_tlb_end are only guaranteed to stay within 4GB and our
default DMA window is 1GB (KVM) or 2GB (PowerVM); ok, we can define
ARCH_LOW_ADDRESS_LIMIT as 1GB. But it has also been mentioned that we are
likely to have swiotlb buffers outside of the first 4GB, as they are not
just for crippled devices any more. So we are likely to have a 64-bit
window; I'd just ditch the default window then. I have patches for this,
but every time I thought I had a use case, it turned out that I did not.

> That also has the benefit of not requiring devices to support 64-bit DMA.
>
> Alternatively, we could continue to rely on the 64-bit DDW window, but
> modify call to enable_ddw() to only map the io_tlb_start/end range in
> the case of is_secure_guest(). This is a little cleaner implementation-wise
> since we can rely on the existing dma_iommu_{alloc,map}_bypass() hooks, but
> devices that don't support 64-bit will fail back to not using dma_direct_* ops
> and fail miserably. We'd probably want to handle that more gracefully.
>
> Or we handle both cases gracefully. To me it makes more sense to enable
> non-DDW case, then consider adding DDW case later if there's some reason
> why 64-bit DMA is needed. But would be good to hear if there are other
> opinions.

For now we need to do something with the H_PUT_TCE_INDIRECT's page -
either disable multitce (but boot time increases) or share the page. The
patch does the latter. Thanks,

>
>>
>> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/iommu.c | 11 +----------
>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>> index 67b5009..4e27d66 100644
>> --- a/arch/powerpc/platforms/pseries/iommu.c
>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>> @@ -36,7 +36,6 @@
>>  #include <asm/udbg.h>
>>  #include <asm/mmzone.h>
>>  #include <asm/plpar_wrappers.h>
>> -#include <asm/svm.h>
>>  #include <asm/ultravisor.h>
>>
>>  #include "pseries.h"
>> @@ -1346,15 +1345,7 @@ void iommu_init_early_pSeries(void)
>>  	of_reconfig_notifier_register(&iommu_reconfig_nb);
>>  	register_memory_notifier(&iommu_mem_nb);
>>
>> -	/*
>> -	 * Secure guest memory is inacessible to devices so regular DMA isn't
>> -	 * possible.
>> -	 *
>> -	 * In that case keep devices' dma_map_ops as NULL so that the generic
>> -	 * DMA code path will use SWIOTLB to bounce buffers for DMA.
>> -	 */
>> -	if (!is_secure_guest())
>> -		set_pci_dma_ops(&dma_iommu_ops);
>> +	set_pci_dma_ops(&dma_iommu_ops);
>>  }
>>
>>  static int __init disable_multitce(char *str)
>> --
>> 1.8.3.1
>>
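A back-of-the-envelope illustration of the offset mentioned above (illustrative arithmetic only, not kernel code; the example physical address is made up): the huge DDW window maps guest RAM linearly starting at bus address 1ULL << 59 (the 0x800.0000.0000.0000 from earlier in the thread), so a SWIOTLB bounce page is presented to the device at its physical address plus that offset.

```c
#include <stdio.h>

int main(void)
{
	unsigned long long ddw_base = 1ULL << 59;		/* 0x0800000000000000 */
	unsigned long long swiotlb_phys = 0x3f000000ULL;	/* example bounce-page address */

	/* What __phys_to_dma() effectively yields once the DDW offset is set. */
	printf("bus address = 0x%llx\n", ddw_base + swiotlb_phys);
	/* prints: bus address = 0x80000003f000000 */
	return 0;
}
```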
Quoting Alexey Kardashevskiy (2019-12-11 02:36:29)
>
> On 11/12/2019 12:43, Michael Roth wrote:
> > Quoting Ram Pai (2019-12-06 19:12:39)
> >> Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> >> secure guests")
> >> disabled dma_iommu_ops path, for secure VMs. Disabling dma_iommu_ops
> >> path for secure VMs, helped enable dma_direct path. This enabled
> >> support for bounce-buffering through SWIOTLB. However it fails to
> >> operate when IOMMU is enabled, since I/O pages are not TCE mapped.
> >>
> >> Renable dma_iommu_ops path for pseries Secure VMs. It handles all
> >> cases including, TCE mapping I/O pages, in the presence of a
> >> IOMMU.
> >
> > Wasn't clear to me at first, but I guess the main gist of this series is
> > that we want to continue to use SWIOTLB, but also need to create mappings
> > of it's bounce buffers in the IOMMU, so we revert to using dma_iommu_ops
> > and rely on the various dma_iommu_{map,alloc}_bypass() hooks throughout
> > to call into dma_direct_* ops rather than relying on the dma_is_direct(ops)
> > checks in DMA API functions to do the same.
>
> Correct. Took me a bit of time to realize what we got here :) We only
> rely on dma_iommu_ops::.dma_supported to write the DMA offset to a
> device (when creating a huge window), and after that we know it is
> mapped directly and swiotlb gets this 1<<59 offset via __phys_to_dma().
>
> > That makes sense, but one issue I see with that is that
> > dma_iommu_map_bypass() only tests true if all the following are true:
> >
> > 1) the device requests a 64-bit DMA mask via
> >    dma_set_mask/dma_set_coherent_mask
> > 2) DDW is enabled (i.e. we don't pass disable_ddw on command-line)
> >
> > dma_is_direct() checks don't have this limitation, so I think for
> > anything cases, such as devices that use a smaller DMA mask, we'll
> > end up falling back to the non-bypass functions in dma_iommu_ops, which
> > will likely break for things like dma_alloc_coherent/dma_map_single
> > since they won't use SWIOTLB pages and won't do the necessary calls to
> > set_memory_unencrypted() to share those non-SWIOTLB buffers with
> > hypervisor.
> >
> > Maybe that's ok, but I think we should be clearer about how to
> > fail/handle these cases.
> >
> > Though I also agree with some concerns Alexey stated earlier: it seems
> > wasteful to map the entire DDW window just so these bounce buffers can be
> > mapped. Especially if you consider the lack of a mapping to be an additional
> > safe-guard against things like buggy device implementations on the QEMU
> > side. E.g. if we leaked pages to the hypervisor on accident, those pages
> > wouldn't be immediately accessible to a device, and would still require
> > additional work get past the IOMMU.
> >
> > What would it look like if we try to make all this work with disable_ddw passed
> > to kernel command-line (or forced for is_secure_guest())?
> >
> > 1) dma_iommu_{alloc,map}_bypass() would no longer get us to dma_direct_* ops,
> >    but an additional case or hook that considers is_secure_guest() might do
> >    it.
> >
> > 2) We'd also need to set up an IOMMU mapping for the bounce buffers via
> >    io_tlb_start/io_tlb_end. We could do it once, on-demand via
> >    dma_iommu_bypass_supported() like we do for the 64-bit DDW window, or
> >    maybe in some init function.
>
> io_tlb_start/io_tlb_end are only guaranteed to stay within 4GB and our
> default DMA window is 1GB (KVM) or 2GB (PowerVM), ok, we can define
> ARCH_LOW_ADDRESS_LIMIT as 1GB.

True, and limiting allocations to under 1GB might be brittle (also saw a
patch floating around that increased IO_TLB_DEFAULT_SIZE to 1GB, which
obviously wouldn't work out with this approach, but not sure if that's
still needed or not: "powerpc/svm: Increase SWIOTLB buffer size").

However that's only an issue if we insist on using an identity mapping
in the IOMMU, which would be nice because non-IOMMU virtio would
magically work, but since that's not a goal of this series I think we do
have the option of mapping io_tlb_start at DMA address 0 (or
thereabouts).

We'd probably need to modify __phys_to_dma to treat archdata.dma_offset
as a negative offset in this case, but it seems like it would work about
the same as with the DDW offset.

But yeah, it does make things a bit less appealing than what I was
initially thinking with that approach...

> But it has also been mentioned that we are likely to be having swiotlb
> buffers outside of the first 4GB as they are not just for crippled
> devices any more. So we are likely to have 64bit window, I'd just ditch
> the default window then, I have patches for this but every time I
> thought I have a use case, turned out that I did not.

Not sure I've seen this discussion, maybe it was on slack? By crippled
devices do you mean virtio with IOMMU off? Isn't the swiotlb buffer
limited to under ARCH_LOW_ADDRESS_LIMIT in any case? Just trying to
understand what changes we're anticipating there.

> > That also has the benefit of not requiring devices to support 64-bit DMA.
> >
> > Alternatively, we could continue to rely on the 64-bit DDW window, but
> > modify call to enable_ddw() to only map the io_tlb_start/end range in
> > the case of is_secure_guest(). This is a little cleaner implementation-wise
> > since we can rely on the existing dma_iommu_{alloc,map}_bypass() hooks, but
> > devices that don't support 64-bit will fail back to not using dma_direct_* ops
> > and fail miserably. We'd probably want to handle that more gracefully.
> >
> > Or we handle both cases gracefully. To me it makes more sense to enable
> > non-DDW case, then consider adding DDW case later if there's some reason
> > why 64-bit DMA is needed. But would be good to hear if there are other
> > opinions.
>
> For now we need to do something with the H_PUT_TCE_INDIRECT's page -
> either disable multitce (but boot time increases) or share the page. The
> patch does the latter. Thanks,

I was sort of hoping the option of only mapping the bounce buffer (or
avoiding DDW completely) would help here, but it looks like the issue has
more to do with clearing the default TCE table. Fair enough.

Reverting to dma_iommu_ops does re-introduce some new failure paths for
non-64-bit devices though, so I think it would be good to address that as
part of this series. I think it would be sufficient to have
dma_set_mask/dma_set_coherent_mask/dma_set_mask_and_coherent fail for
non-64-bit masks. I think something like the following in
dma_iommu_dma_supported() might do it:

    /*
     * Secure guests currently rely on 64-bit DMA window, which is only
     * utilized for devices that support 64-bit DMA masks
     */
    if (is_secure_guest() && mask < DMA_BIT_MASK(64)) {
        dev_err(dev, "Warning: 64-bit DMA required when PEF enabled: mask 0x%08llx", mask);
        return 0;
    }

That should let most drivers fail gracefully and make it clear what
devices are broken (rather than silently misbehaving). It also makes
things a bit clearer WRT what we expect to work with this applied.
On Wed, Dec 11, 2019 at 12:07:17PM -0600, Michael Roth wrote:
> > io_tlb_start/io_tlb_end are only guaranteed to stay within 4GB and our
> > default DMA window is 1GB (KVM) or 2GB (PowerVM), ok, we can define
> > ARCH_LOW_ADDRESS_LIMIT as 1GB.
>
> True, and limiting allocations to under 1GB might be brittle (also saw a
> patching floating around that increased IO_TLB_DEFAULT_SIZE size to 1GB,
> which obviously wouldn't work out with this approach, but not sure if
> that's still needed or not: "powerpc/svm: Increase SWIOTLB buffer size")

FYI, there is a patch out there that allocates the powerpc swiotlb from
the bottom of the memblock area instead of the top to fix an 85xx
regression.

Also the AMD folks have been asking about non-GFP_DMA32 swiotlb pools as
they have the same bounce-buffer issue with SEV. I think it is entirely
doable to have multiple swiotlb pools, I just need a volunteer to
implement that.

> However that's only an issue if we insist on using an identity mapping
> in the IOMMU, which would be nice because non-IOMMU virtio would
> magically work, but since that's not a goal of this series I think we do
> have the option of mapping io_tlb_start at DMA address 0 (or
> thereabouts).
>
> We'd probably need to modify __phys_to_dma to treat archdata.dma_offset
> as a negative offset in this case, but it seems like it would work about
> the same as with DDW offset.

Or switch to the generic version of __phys_to_dma that has a negative
offset. We'd just need to look into a signed value for dma_pfn_offset
to allow for the existing platforms that need the current positive
offset.
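A hedged sketch of what that suggestion implies (illustrative only, not the in-tree helper; the generic __phys_to_dma of that era used an unsigned dev->dma_pfn_offset, and treating it as signed is exactly the hypothetical change being discussed):

```c
#include <linux/device.h>
#include <linux/types.h>
#include <asm/page.h>

/* Illustration: with a signed pfn offset, a positive value keeps today's
 * "bus = phys - offset" behaviour for existing platforms, while a negative
 * value would express "bus = phys + |offset|", e.g. a SWIOTLB pool placed
 * at a non-zero bus address such as the 1<<59 DDW base. */
static inline dma_addr_t example_phys_to_dma(struct device *dev, phys_addr_t paddr)
{
	s64 pfn_offset = (s64)dev->dma_pfn_offset;	/* assumed signed here */

	return (dma_addr_t)((s64)paddr - (pfn_offset << PAGE_SHIFT));
}
```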
On Tue, Dec 10, 2019 at 07:43:24PM -0600, Michael Roth wrote:
> Quoting Ram Pai (2019-12-06 19:12:39)
> > Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> > secure guests")
> > disabled dma_iommu_ops path, for secure VMs. Disabling dma_iommu_ops
> > path for secure VMs, helped enable dma_direct path. This enabled
> > support for bounce-buffering through SWIOTLB. However it fails to
> > operate when IOMMU is enabled, since I/O pages are not TCE mapped.
> >
> > Renable dma_iommu_ops path for pseries Secure VMs. It handles all
> > cases including, TCE mapping I/O pages, in the presence of a
> > IOMMU.
>
> Wasn't clear to me at first, but I guess the main gist of this series is
> that we want to continue to use SWIOTLB, but also need to create mappings
> of it's bounce buffers in the IOMMU, so we revert to using dma_iommu_ops
> and rely on the various dma_iommu_{map,alloc}_bypass() hooks throughout
> to call into dma_direct_* ops rather than relying on the dma_is_direct(ops)
> checks in DMA API functions to do the same.
>
> That makes sense, but one issue I see with that is that
> dma_iommu_map_bypass() only tests true if all the following are true:
>
> 1) the device requests a 64-bit DMA mask via
>    dma_set_mask/dma_set_coherent_mask
> 2) DDW is enabled (i.e. we don't pass disable_ddw on command-line)
>
> dma_is_direct() checks don't have this limitation, so I think for
> anything cases, such as devices that use a smaller DMA mask, we'll
> end up falling back to the non-bypass functions in dma_iommu_ops, which
> will likely break for things like dma_alloc_coherent/dma_map_single
> since they won't use SWIOTLB pages and won't do the necessary calls to
> set_memory_unencrypted() to share those non-SWIOTLB buffers with
> hypervisor.
>
> Maybe that's ok, but I think we should be clearer about how to
> fail/handle these cases.

Yes, makes sense. A device that cannot handle a 64-bit DMA mask will not
work.

> Though I also agree with some concerns Alexey stated earlier: it seems
> wasteful to map the entire DDW window just so these bounce buffers can be
> mapped. Especially if you consider the lack of a mapping to be an additional
> safe-guard against things like buggy device implementations on the QEMU
> side. E.g. if we leaked pages to the hypervisor on accident, those pages
> wouldn't be immediately accessible to a device, and would still require
> additional work get past the IOMMU.

Well, an accidental unintended page leak to the hypervisor is a very bad
thing, regardless of any DMA mapping. The device may not be able to access
it, but the hypervisor still can access it.

> What would it look like if we try to make all this work with disable_ddw passed
> to kernel command-line (or forced for is_secure_guest())?
>
> 1) dma_iommu_{alloc,map}_bypass() would no longer get us to dma_direct_* ops,
>    but an additional case or hook that considers is_secure_guest() might do
>    it.
>
> 2) We'd also need to set up an IOMMU mapping for the bounce buffers via
>    io_tlb_start/io_tlb_end. We could do it once, on-demand via
>    dma_iommu_bypass_supported() like we do for the 64-bit DDW window, or
>    maybe in some init function.

Hmm... I am not sure how to accomplish (2). We need to use some DDW window
to set up the mappings, right? If disable_ddw is set, there won't be any
DDW. What am I missing?

> That also has the benefit of not requiring devices to support 64-bit DMA.
>
> Alternatively, we could continue to rely on the 64-bit DDW window, but
> modify call to enable_ddw() to only map the io_tlb_start/end range in
> the case of is_secure_guest(). This is a little cleaner implementation-wise
> since we can rely on the existing dma_iommu_{alloc,map}_bypass() hooks.

I have been experimenting with this. Trying to map only the memory range
from io_tlb_start/io_tlb_end through the 64-bit DDW window. But for some
reason it wants io_tlb_start to be aligned to some boundary. It looks like
a 2^28 boundary. Not sure what dictates that boundary.

> , but
> devices that don't support 64-bit will fail back to not using dma_direct_* ops
> and fail miserably. We'd probably want to handle that more gracefully.

Yes, I will put a warning message to indicate the failure.

> Or we handle both cases gracefully. To me it makes more sense to enable
> non-DDW case, then consider adding DDW case later if there's some reason
> why 64-bit DMA is needed. But would be good to hear if there are other
> opinions.

Educate me a bit here. What is the non-DDW case? Is it possible for a
device to access memory, in the presence of an IOMMU, without a window
mapping?

>
> >
> > Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c | 11 +----------
> >  1 file changed, 1 insertion(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index 67b5009..4e27d66 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -36,7 +36,6 @@
> >  #include <asm/udbg.h>
> >  #include <asm/mmzone.h>
> >  #include <asm/plpar_wrappers.h>
> > -#include <asm/svm.h>
> >  #include <asm/ultravisor.h>
> >
> >  #include "pseries.h"
> > @@ -1346,15 +1345,7 @@ void iommu_init_early_pSeries(void)
> >  	of_reconfig_notifier_register(&iommu_reconfig_nb);
> >  	register_memory_notifier(&iommu_mem_nb);
> >
> > -	/*
> > -	 * Secure guest memory is inacessible to devices so regular DMA isn't
> > -	 * possible.
> > -	 *
> > -	 * In that case keep devices' dma_map_ops as NULL so that the generic
> > -	 * DMA code path will use SWIOTLB to bounce buffers for DMA.
> > -	 */
> > -	if (!is_secure_guest())
> > -		set_pci_dma_ops(&dma_iommu_ops);
> > +	set_pci_dma_ops(&dma_iommu_ops);
> >  }
> >
> >  static int __init disable_multitce(char *str)
> > --
> > 1.8.3.1
> >
Quoting Ram Pai (2019-12-12 00:45:02)
> On Tue, Dec 10, 2019 at 07:43:24PM -0600, Michael Roth wrote:
> > Quoting Ram Pai (2019-12-06 19:12:39)
> > > Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> > > secure guests")
> > > disabled dma_iommu_ops path, for secure VMs. Disabling dma_iommu_ops
> > > path for secure VMs, helped enable dma_direct path. This enabled
> > > support for bounce-buffering through SWIOTLB. However it fails to
> > > operate when IOMMU is enabled, since I/O pages are not TCE mapped.
> > >
> > > Renable dma_iommu_ops path for pseries Secure VMs. It handles all
> > > cases including, TCE mapping I/O pages, in the presence of a
> > > IOMMU.
> >
> > Wasn't clear to me at first, but I guess the main gist of this series is
> > that we want to continue to use SWIOTLB, but also need to create mappings
> > of it's bounce buffers in the IOMMU, so we revert to using dma_iommu_ops
> > and rely on the various dma_iommu_{map,alloc}_bypass() hooks throughout
> > to call into dma_direct_* ops rather than relying on the dma_is_direct(ops)
> > checks in DMA API functions to do the same.
> >
> > That makes sense, but one issue I see with that is that
> > dma_iommu_map_bypass() only tests true if all the following are true:
> >
> > 1) the device requests a 64-bit DMA mask via
> >    dma_set_mask/dma_set_coherent_mask
> > 2) DDW is enabled (i.e. we don't pass disable_ddw on command-line)
> >
> > dma_is_direct() checks don't have this limitation, so I think for
> > anything cases, such as devices that use a smaller DMA mask, we'll
> > end up falling back to the non-bypass functions in dma_iommu_ops, which
> > will likely break for things like dma_alloc_coherent/dma_map_single
> > since they won't use SWIOTLB pages and won't do the necessary calls to
> > set_memory_unencrypted() to share those non-SWIOTLB buffers with
> > hypervisor.
> >
> > Maybe that's ok, but I think we should be clearer about how to
> > fail/handle these cases.
>
> Yes. makes sense. Device that cannot handle 64bit dma mask will not work.
>
> > Though I also agree with some concerns Alexey stated earlier: it seems
> > wasteful to map the entire DDW window just so these bounce buffers can be
> > mapped. Especially if you consider the lack of a mapping to be an additional
> > safe-guard against things like buggy device implementations on the QEMU
> > side. E.g. if we leaked pages to the hypervisor on accident, those pages
> > wouldn't be immediately accessible to a device, and would still require
> > additional work get past the IOMMU.
>
> Well, an accidental unintented page leak to the hypervisor, is a very
> bad thing, regardless of any DMA mapping. The device may not be able to
> access it, but the hypervisor still can access it.

Agreed, but if the IOMMU can provide additional isolation we should make
use of it when reasonable.

> > What would it look like if we try to make all this work with disable_ddw passed
> > to kernel command-line (or forced for is_secure_guest())?
> >
> > 1) dma_iommu_{alloc,map}_bypass() would no longer get us to dma_direct_* ops,
> >    but an additional case or hook that considers is_secure_guest() might do
> >    it.
> >
> > 2) We'd also need to set up an IOMMU mapping for the bounce buffers via
> >    io_tlb_start/io_tlb_end. We could do it once, on-demand via
> >    dma_iommu_bypass_supported() like we do for the 64-bit DDW window, or
> >    maybe in some init function.
>
> Hmm... i not sure how to accomplish (2). we need use some DDW window
> to setup the mappings. right? If disable_ddw is set, there wont be any
> ddw. What am i missing?

We have 2 windows: the default 32-bit window that normally covers DMA
addresses 0..1GB, and an additional DDW window that's created on demand
for 64-bit devices (since they can address a full linear mapping of all
guest memory at DMA address 1<<59). Your current patch uses the latter,
but we could potentially use the 32-bit window since we only need to map
the SWIOTLB pages.

Not saying that's necessarily better, but one upside is it only requires
devices to support 32-bit DMA addressing, which is a larger class of
devices than those that support 64-bit (since 64-bit device drivers
generally have a 32-bit fallback). Required changes are a bit more
pervasive though.

It might make sense to get both scenarios working at some point, but
maybe it's okay to handle 64-bit first. 64-bit does give us more legroom
if we anticipate changes in where the SWIOTLB memory is allocated from,
or expect it to become larger than 1GB.

> > That also has the benefit of not requiring devices to support 64-bit DMA.
> >
> > Alternatively, we could continue to rely on the 64-bit DDW window, but
> > modify call to enable_ddw() to only map the io_tlb_start/end range in
> > the case of is_secure_guest(). This is a little cleaner implementation-wise
> > since we can rely on the existing dma_iommu_{alloc,map}_bypass() hooks.
>
> I have been experimenting with this. Trying to map only the memory
> range from io_tlb_start/io_tlb_end though the 64-bit ddw window. But
> due to some reason, it wants the io_tlb_start to be aligned to some
> boundary. It looks like a 2^28 boundary. Not sure what dictates that
> boundary.

Not sure, but that might be related to the 256MB LMB size. Could also be
the page size of the DDW window, but that seems large for that. In any
case I think it would be okay if we needed to truncate io_tlb_start to a
lower 256MB boundary and map the resulting range. We'd have a few more
mappings than strictly necessary, but it's still better than all guest
memory.

> > , but
> > devices that don't support 64-bit will fail back to not using dma_direct_* ops
> > and fail miserably. We'd probably want to handle that more gracefully.
>
> Yes i will put a warning message to indicate the failure.
>
> > Or we handle both cases gracefully. To me it makes more sense to enable
> > non-DDW case, then consider adding DDW case later if there's some reason
> > why 64-bit DMA is needed. But would be good to hear if there are other
> > opinions.
>
> educate me a bit here. What is a non-DDW case? is it possible for a
> device to acccess memory, in the presence of a IOMMU, without a window-mapping?

It's not, indeed, but we have the default 32-bit window for the non-DDW
case. See above.
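If the DDW route is taken, a minimal sketch of the truncation suggested above, assuming the 2^28 requirement Ram hit is indeed the 256MB LMB granularity (the helper and constant names here are illustrative, not taken from the series):

```c
#include <linux/kernel.h>	/* ALIGN, ALIGN_DOWN */
#include <linux/swiotlb.h>	/* io_tlb_start, io_tlb_end */

#define SECVM_MAP_ALIGN	(256UL << 20)	/* 2^28 bytes, one LMB (assumption) */

/* Widen the SWIOTLB bounce-buffer range to 256MB boundaries before mapping
 * it through the 64-bit window, so the observed alignment constraint is
 * satisfied while still mapping far less than all of guest memory. */
static void secvm_swiotlb_map_range(phys_addr_t *start, phys_addr_t *end)
{
	*start = ALIGN_DOWN(io_tlb_start, SECVM_MAP_ALIGN);
	*end   = ALIGN(io_tlb_end, SECVM_MAP_ALIGN);
}
```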
```diff
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 67b5009..4e27d66 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -36,7 +36,6 @@
 #include <asm/udbg.h>
 #include <asm/mmzone.h>
 #include <asm/plpar_wrappers.h>
-#include <asm/svm.h>
 #include <asm/ultravisor.h>
 
 #include "pseries.h"
@@ -1346,15 +1345,7 @@ void iommu_init_early_pSeries(void)
 	of_reconfig_notifier_register(&iommu_reconfig_nb);
 	register_memory_notifier(&iommu_mem_nb);
 
-	/*
-	 * Secure guest memory is inacessible to devices so regular DMA isn't
-	 * possible.
-	 *
-	 * In that case keep devices' dma_map_ops as NULL so that the generic
-	 * DMA code path will use SWIOTLB to bounce buffers for DMA.
-	 */
-	if (!is_secure_guest())
-		set_pci_dma_ops(&dma_iommu_ops);
+	set_pci_dma_ops(&dma_iommu_ops);
 }
 
 static int __init disable_multitce(char *str)
```
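For reference, the resulting tail of iommu_init_early_pSeries() once the hunk above is applied (reconstructed from the diff; the surrounding setup code is elided):

```c
void iommu_init_early_pSeries(void)
{
	/* ... earlier TCE/DDW setup elided ... */
	of_reconfig_notifier_register(&iommu_reconfig_nb);
	register_memory_notifier(&iommu_mem_nb);

	/* Secure VMs now take the dma_iommu_ops path as well; per the thread,
	 * the bypass hooks route them to dma-direct/SWIOTLB once a 64-bit
	 * DDW window has been negotiated. */
	set_pci_dma_ops(&dma_iommu_ops);
}
```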
Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
secure guests") disabled the dma_iommu_ops path for secure VMs. Disabling
the dma_iommu_ops path for secure VMs enabled the dma_direct path, which
added support for bounce-buffering through SWIOTLB. However, it fails to
operate when an IOMMU is enabled, since I/O pages are not TCE mapped.

Re-enable the dma_iommu_ops path for pseries secure VMs. It handles all
cases, including TCE mapping I/O pages in the presence of an IOMMU.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)