From patchwork Thu Jun 7 08:44:16 2018
From: Alexey Kardashevskiy
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc@vger.kernel.org,
    Alex Williamson, Benjamin Herrenschmidt, Ram Pai, kvm@vger.kernel.org,
    Alistair Popple
Subject: [RFC PATCH kernel 1/5] vfio/spapr_tce: Simplify page contained test
Date: Thu, 7 Jun 2018 18:44:16 +1000
Message-Id: <20180607084420.29513-2-aik@ozlabs.ru>
In-Reply-To: <20180607084420.29513-1-aik@ozlabs.ru>
References: <20180607084420.29513-1-aik@ozlabs.ru>

The test function takes a page struct pointer which neither of its two
callers uses in any other way, so simplify it and pass a physical address
instead. This causes no behavioural change now, but later we may start
supporting host addresses for memory devices which are not backed by page
structs.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: David Gibson
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 759a5bd..2c4a048 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -249,8 +249,9 @@ static void tce_iommu_userspace_view_free(struct iommu_table *tbl,
 	decrement_locked_vm(mm, cb >> PAGE_SHIFT);
 }
 
-static bool tce_page_is_contained(struct page *page, unsigned page_shift)
+static bool tce_page_is_contained(unsigned long hpa, unsigned page_shift)
 {
+	struct page *page = pfn_to_page(hpa >> PAGE_SHIFT);
 	/*
 	 * Check that the TCE table granularity is not bigger than the size of
 	 * a page we just found. Otherwise the hardware can get access to
@@ -549,7 +550,6 @@ static long tce_iommu_build(struct tce_container *container,
 		enum dma_data_direction direction)
 {
 	long i, ret = 0;
-	struct page *page;
 	unsigned long hpa;
 	enum dma_data_direction dirtmp;
 
@@ -560,8 +560,7 @@ static long tce_iommu_build(struct tce_container *container,
 		if (ret)
 			break;
 
-		page = pfn_to_page(hpa >> PAGE_SHIFT);
-		if (!tce_page_is_contained(page, tbl->it_page_shift)) {
+		if (!tce_page_is_contained(hpa, tbl->it_page_shift)) {
 			ret = -EPERM;
 			break;
 		}
@@ -595,7 +594,6 @@ static long tce_iommu_build_v2(struct tce_container *container,
 		enum dma_data_direction direction)
 {
 	long i, ret = 0;
-	struct page *page;
 	unsigned long hpa;
 	enum dma_data_direction dirtmp;
 
@@ -615,8 +613,7 @@ static long tce_iommu_build_v2(struct tce_container *container,
 		if (ret)
 			break;
 
-		page = pfn_to_page(hpa >> PAGE_SHIFT);
-		if (!tce_page_is_contained(page, tbl->it_page_shift)) {
+		if (!tce_page_is_contained(hpa, tbl->it_page_shift)) {
 			ret = -EPERM;
 			break;
 		}
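The first hunk above is cut off mid-comment by the archive. For reference,
a sketch of how the complete helper reads after this patch; the tail (the
compound-page comparison) is reconstructed from the truncated comment and
is an assumption, not quoted from the patch:

/*
 * Sketch of the full tce_page_is_contained() after this patch; only its
 * head appears in the hunk above. The backing page, possibly a compound
 * (huge) page, must cover at least one TCE granule.
 */
static bool tce_page_is_contained(unsigned long hpa, unsigned page_shift)
{
	struct page *page = pfn_to_page(hpa >> PAGE_SHIFT);

	/*
	 * Check that the TCE table granularity is not bigger than the size of
	 * a page we just found. Otherwise the hardware can get access to
	 * a bigger memory chunk than it should.
	 */
	return (PAGE_SHIFT + compound_order(compound_head(page))) >= page_shift;
}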
From patchwork Thu Jun 7 08:44:17 2018
From: Alexey Kardashevskiy
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc@vger.kernel.org,
    Alex Williamson, Benjamin Herrenschmidt, Ram Pai, kvm@vger.kernel.org,
    Alistair Popple
Subject: [RFC PATCH kernel 2/5] powerpc/iommu_context: Change referencing in API
Date: Thu, 7 Jun 2018 18:44:17 +1000
Message-Id: <20180607084420.29513-3-aik@ozlabs.ru>
In-Reply-To: <20180607084420.29513-1-aik@ozlabs.ru>
References: <20180607084420.29513-1-aik@ozlabs.ru>

At the moment a single function - mm_iommu_get() - either allocates a new
region or just references it if it is already registered with the current
MM context. We are going to allow this API to be used for memory devices,
and a different variant of mm_iommu_get() will be needed, so let's move
the referencing part to where it belongs - mm_iommu_find(). This turns
mm_iommu_get() into a wrapper (the actual function will be extended later)
and renames it to mm_iommu_new() to illustrate the change.
Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/include/asm/mmu_context.h |  2 +-
 arch/powerpc/mm/mmu_context_iommu.c    | 19 +++++++++++++++----
 drivers/vfio/vfio_iommu_spapr_tce.c    | 21 +++++++++++----------
 3 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 1835ca1..b598ec4 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -21,7 +21,7 @@ struct mm_iommu_table_group_mem_t;
 extern int isolate_lru_page(struct page *page);	/* from internal.h */
 extern bool mm_iommu_preregistered(struct mm_struct *mm);
-extern long mm_iommu_get(struct mm_struct *mm,
+extern long mm_iommu_new(struct mm_struct *mm,
 		unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
 extern long mm_iommu_put(struct mm_struct *mm,
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index 4c615fc..6b471d2 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -120,7 +120,8 @@ static int mm_iommu_move_page_from_cma(struct page *page)
 	return 0;
 }
 
-long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
+static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
+		unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
 	struct mm_iommu_table_group_mem_t *mem;
@@ -132,8 +133,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 	list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list, next) {
 		if ((mem->ua == ua) && (mem->entries == entries)) {
-			++mem->used;
-			*pmem = mem;
+			ret = -EBUSY;
 			goto unlock_exit;
 		}
@@ -218,7 +218,13 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 	return ret;
 }
-EXPORT_SYMBOL_GPL(mm_iommu_get);
+
+long mm_iommu_new(struct mm_struct *mm, unsigned long ua, unsigned long entries,
+		struct mm_iommu_table_group_mem_t **pmem)
+{
+	return mm_iommu_do_alloc(mm, ua, entries, pmem);
+}
+EXPORT_SYMBOL_GPL(mm_iommu_new);
 
 static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
 {
@@ -337,13 +343,18 @@ struct mm_iommu_table_group_mem_t *mm_iommu_find(struct mm_struct *mm,
 {
 	struct mm_iommu_table_group_mem_t *mem, *ret = NULL;
 
+	mutex_lock(&mem_list_mutex);
+
 	list_for_each_entry_rcu(mem, &mm->context.iommu_group_mem_list, next) {
 		if ((mem->ua == ua) && (mem->entries == entries)) {
 			ret = mem;
+			++mem->used;
 			break;
 		}
 	}
 
+	mutex_unlock(&mem_list_mutex);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(mm_iommu_find);
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 2c4a048..7f1effd 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -149,9 +149,9 @@ static long tce_iommu_prereg_free(struct tce_container *container,
 static long tce_iommu_unregister_pages(struct tce_container *container,
 		__u64 vaddr, __u64 size)
 {
+	long ret = -ENOENT;
 	struct mm_iommu_table_group_mem_t *mem;
 	struct tce_iommu_prereg *tcemem;
-	bool found = false;
 
 	if ((vaddr & ~PAGE_MASK) || (size & ~PAGE_MASK))
 		return -EINVAL;
@@ -162,15 +162,14 @@ static long tce_iommu_unregister_pages(struct tce_container *container,
 	list_for_each_entry(tcemem, &container->prereg_list, next) {
 		if (tcemem->mem == mem) {
-			found = true;
+			ret = tce_iommu_prereg_free(container, tcemem);
 			break;
 		}
 	}
 
-	if (!found)
-		return -ENOENT;
+	mm_iommu_put(container->mm, mem);
 
-	return tce_iommu_prereg_free(container, tcemem);
+	return ret;
 }
 
 static long tce_iommu_register_pages(struct tce_container *container,
@@ -188,15 +187,17 @@ static long tce_iommu_register_pages(struct tce_container *container,
 	mem = mm_iommu_find(container->mm, vaddr, entries);
 	if (mem) {
 		list_for_each_entry(tcemem, &container->prereg_list, next) {
-			if (tcemem->mem == mem)
+			if (tcemem->mem == mem) {
+				mm_iommu_put(container->mm, mem);
 				return -EBUSY;
+			}
 		}
+	} else {
+		ret = mm_iommu_new(container->mm, vaddr, entries, &mem);
+		if (ret)
+			return ret;
 	}
 
-	ret = mm_iommu_get(container->mm, vaddr, entries, &mem);
-	if (ret)
-		return ret;
-
 	tcemem = kzalloc(sizeof(*tcemem), GFP_KERNEL);
 	if (!tcemem) {
 		mm_iommu_put(container->mm, mem);
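The net effect on the calling convention: after this patch both the lookup
and the creation path leave the caller holding one reference on the region.
A minimal sketch of the expected usage, assuming the semantics shown in the
hunks above (example_register() itself is made up, not part of the patch):

/*
 * Hedged sketch of the post-patch contract: mm_iommu_find() now takes a
 * reference when it finds a matching region, and mm_iommu_new() creates
 * one with an initial reference, so either path is balanced by a single
 * mm_iommu_put().
 */
static long example_register(struct mm_struct *mm, unsigned long ua,
		unsigned long entries)
{
	struct mm_iommu_table_group_mem_t *mem;
	long ret;

	mem = mm_iommu_find(mm, ua, entries);	/* references if found */
	if (!mem) {
		ret = mm_iommu_new(mm, ua, entries, &mem); /* pins and references */
		if (ret)
			return ret;
	}

	/* ... use the region ... */

	mm_iommu_put(mm, mem);			/* drop this caller's reference */
	return 0;
}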
Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/include/asm/mmu_context.h |  3 ++
 arch/powerpc/kernel/iommu.c            |  8 +++--
 arch/powerpc/mm/mmu_context_iommu.c    | 55 +++++++++++++++++++++++++++-------
 drivers/vfio/vfio_iommu_spapr_tce.c    | 12 +++++++-
 4 files changed, 65 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index b598ec4..0c14495 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -24,6 +24,9 @@ extern bool mm_iommu_preregistered(struct mm_struct *mm);
 extern long mm_iommu_new(struct mm_struct *mm,
 		unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem);
+extern long mm_iommu_newdev(struct mm_struct *mm, unsigned long ua,
+		unsigned long entries, unsigned long dev_hpa,
+		struct mm_iommu_table_group_mem_t **pmem);
 extern long mm_iommu_put(struct mm_struct *mm,
 		struct mm_iommu_table_group_mem_t *mem);
 extern void mm_iommu_init(struct mm_struct *mm);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index af7a20d..fc985a5 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1001,8 +1001,12 @@ long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
 	ret = tbl->it_ops->exchange(tbl, entry, hpa, direction);
 
 	if (!ret && ((*direction == DMA_FROM_DEVICE) ||
-			(*direction == DMA_BIDIRECTIONAL)))
-		SetPageDirty(pfn_to_page(*hpa >> PAGE_SHIFT));
+			(*direction == DMA_BIDIRECTIONAL))) {
+		struct page *pg = __va(realmode_pfn_to_page(*hpa >> PAGE_SHIFT));
+
+		if (pg)
+			SetPageDirty(pg);
+	}
 
 	/* if (unlikely(ret))
 		pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n",
diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index 6b471d2..b132924 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -30,6 +30,8 @@ struct mm_iommu_table_group_mem_t {
 	u64 ua;			/* userspace address */
 	u64 entries;		/* number of entries in hpas[] */
 	u64 *hpas;		/* vmalloc'ed */
+#define MM_IOMMU_TABLE_INVALID_HPA	((uint64_t)-1)
+	u64 dev_hpa;		/* Device memory base address */
 };
 
 static long mm_iommu_adjust_locked_vm(struct mm_struct *mm,
@@ -121,7 +123,7 @@ static int mm_iommu_move_page_from_cma(struct page *page)
 }
 
 static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
-		unsigned long entries,
+		unsigned long entries, unsigned long dev_hpa,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
 	struct mm_iommu_table_group_mem_t *mem;
@@ -147,11 +149,13 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 	}
 
-	ret = mm_iommu_adjust_locked_vm(mm, entries, true);
-	if (ret)
-		goto unlock_exit;
+	if (dev_hpa == MM_IOMMU_TABLE_INVALID_HPA) {
+		ret = mm_iommu_adjust_locked_vm(mm, entries, true);
+		if (ret)
+			goto unlock_exit;
 
-	locked_entries = entries;
+		locked_entries = entries;
+	}
 
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem) {
@@ -159,6 +163,11 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 		goto unlock_exit;
 	}
 
+	if (dev_hpa != MM_IOMMU_TABLE_INVALID_HPA) {
+		mem->dev_hpa = dev_hpa;
+		goto good_exit;
+	}
+
 	mem->hpas = vzalloc(entries * sizeof(mem->hpas[0]));
 	if (!mem->hpas) {
 		kfree(mem);
@@ -202,6 +211,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
 	}
 
+good_exit:
 	atomic64_set(&mem->mapped, 1);
 	mem->used = 1;
 	mem->ua = ua;
@@ -222,15 +232,27 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
 long mm_iommu_new(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		struct mm_iommu_table_group_mem_t **pmem)
 {
-	return mm_iommu_do_alloc(mm, ua, entries, pmem);
+	return mm_iommu_do_alloc(mm, ua, entries, MM_IOMMU_TABLE_INVALID_HPA,
+			pmem);
 }
 EXPORT_SYMBOL_GPL(mm_iommu_new);
 
+long mm_iommu_newdev(struct mm_struct *mm, unsigned long ua,
+		unsigned long entries, unsigned long dev_hpa,
+		struct mm_iommu_table_group_mem_t **pmem)
+{
+	return mm_iommu_do_alloc(mm, ua, entries, dev_hpa, pmem);
+}
+EXPORT_SYMBOL_GPL(mm_iommu_newdev);
+
 static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
 {
 	long i;
 	struct page *page = NULL;
 
+	if (!mem->hpas)
+		return;
+
 	for (i = 0; i < mem->entries; ++i) {
 		if (!mem->hpas[i])
 			continue;
@@ -269,6 +291,7 @@ static void mm_iommu_release(struct mm_iommu_table_group_mem_t *mem)
 long mm_iommu_put(struct mm_struct *mm, struct mm_iommu_table_group_mem_t *mem)
 {
 	long ret = 0;
+	unsigned long entries;
 
 	mutex_lock(&mem_list_mutex);
 
@@ -290,9 +313,11 @@ long mm_iommu_put(struct mm_struct *mm, struct mm_iommu_table_group_mem_t *mem)
 	}
 
 	/* @mapped became 0 so now mappings are disabled, release the region */
+	entries = mem->entries;
 	mm_iommu_release(mem);
 
-	mm_iommu_adjust_locked_vm(mm, mem->entries, false);
+	if (mem->dev_hpa != MM_IOMMU_TABLE_INVALID_HPA)
+		mm_iommu_adjust_locked_vm(mm, entries, false);
 
 unlock_exit:
 	mutex_unlock(&mem_list_mutex);
@@ -363,11 +388,17 @@ long mm_iommu_ua_to_hpa(struct mm_iommu_table_group_mem_t *mem,
 		unsigned long ua, unsigned long *hpa)
 {
 	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
-	u64 *va = &mem->hpas[entry];
+	u64 *va;
 
 	if (entry >= mem->entries)
 		return -EFAULT;
 
+	if (!mem->hpas) {
+		*hpa = mem->dev_hpa + (ua - mem->ua);
+		return 0;
+	}
+
+	va = &mem->hpas[entry];
 	*hpa = *va | (ua & ~PAGE_MASK);
 
 	return 0;
@@ -378,13 +409,17 @@ long mm_iommu_ua_to_hpa_rm(struct mm_iommu_table_group_mem_t *mem,
 		unsigned long ua, unsigned long *hpa)
 {
 	const long entry = (ua - mem->ua) >> PAGE_SHIFT;
-	void *va = &mem->hpas[entry];
 	unsigned long *pa;
 
 	if (entry >= mem->entries)
 		return -EFAULT;
 
-	pa = (void *) vmalloc_to_phys(va);
+	if (!mem->hpas) {
+		*hpa = mem->dev_hpa + (ua - mem->ua);
+		return 0;
+	}
+
+	pa = (void *) vmalloc_to_phys(&mem->hpas[entry]);
 	if (!pa)
 		return -EFAULT;
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 7f1effd..47071f3 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -252,7 +252,17 @@ static void tce_iommu_userspace_view_free(struct iommu_table *tbl,
 
 static bool tce_page_is_contained(unsigned long hpa, unsigned page_shift)
 {
-	struct page *page = pfn_to_page(hpa >> PAGE_SHIFT);
+	struct page *page = __va(realmode_pfn_to_page(hpa >> PAGE_SHIFT));
+
+	/*
+	 * If there is no page, we assume it is device memory and therefore
+	 * it is contiguous and always pinned.
+	 *
+	 * TODO: test device boundaries?
+	 */
+	if (!page)
+		return true;
+
 	/*
 	 * Check that the TCE table granularity is not bigger than the size of
 	 * a page we just found. Otherwise the hardware can get access to
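For device memory the translation no longer goes through hpas[]: the
region is treated as physically contiguous, so mm_iommu_ua_to_hpa()
reduces to offset arithmetic. A worked example of that path, with
made-up addresses:

/* Worked example of the dev_hpa translation above; values are made up. */
static unsigned long example_ua_to_hpa(void)
{
	unsigned long mem_ua  = 0x7f0000000000UL;	/* mem->ua */
	unsigned long dev_hpa = 0x060000000000UL;	/* mem->dev_hpa */
	unsigned long ua      = 0x7f0000011000UL;	/* address to translate */

	return dev_hpa + (ua - mem_ua);			/* == 0x060000011000 */
}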
From patchwork Thu Jun 7 08:44:19 2018
From: Alexey Kardashevskiy
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc@vger.kernel.org,
    Alex Williamson, Benjamin Herrenschmidt, Ram Pai, kvm@vger.kernel.org,
    Alistair Popple
Subject: [RFC PATCH kernel 4/5] vfio_pci: Allow mapping extra regions
Date: Thu, 7 Jun 2018 18:44:19 +1000
Message-Id: <20180607084420.29513-5-aik@ozlabs.ru>
In-Reply-To: <20180607084420.29513-1-aik@ozlabs.ru>
References: <20180607084420.29513-1-aik@ozlabs.ru>

Signed-off-by: Alexey Kardashevskiy
---
 drivers/vfio/pci/vfio_pci_private.h |  3 +++
 drivers/vfio/pci/vfio_pci.c         | 10 ++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index cde3b5d..86aab05 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -59,6 +59,9 @@ struct vfio_pci_regops {
 			 size_t count, loff_t *ppos, bool iswrite);
 	void	(*release)(struct vfio_pci_device *vdev,
 			   struct vfio_pci_region *region);
+	int	(*mmap)(struct vfio_pci_device *vdev,
+			struct vfio_pci_region *region,
+			struct vm_area_struct *vma);
 };
 
 struct vfio_pci_region {
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 3729937..7bddf1e 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1123,10 +1123,16 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 		return -EINVAL;
 	if ((vma->vm_flags & VM_SHARED) == 0)
 		return -EINVAL;
+	if (index >= VFIO_PCI_NUM_REGIONS) {
+		int regnum = index - VFIO_PCI_NUM_REGIONS;
+		struct vfio_pci_region *region = vdev->region + regnum;
+
+		if (region && region->ops && region->ops->mmap)
+			return region->ops->mmap(vdev, region, vma);
+		return -EINVAL;
+	}
 	if (index >= VFIO_PCI_ROM_REGION_INDEX)
 		return -EINVAL;
-	if (!vdev->bar_mmap_supported[index])
-		return -EINVAL;
 
 	phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
 	req_len = vma->vm_end - vma->vm_start;
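With this change, a region above VFIO_PCI_NUM_REGIONS becomes mmappable
whenever its subdriver supplies an mmap op. Userspace needs nothing new:
the offset returned by the standard region-info ioctl already encodes the
region index. A hedged userspace sketch (map_extra_region() is made up;
error handling is trimmed, and whether the mmap succeeds depends on the
subdriver providing region->ops->mmap as in the hunk above):

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* Sketch: mmap a device-specific region by index on a VFIO device fd. */
static void *map_extra_region(int device_fd, __u32 index)
{
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = index,		/* VFIO_PCI_NUM_REGIONS + regnum */
	};

	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0)
		return MAP_FAILED;

	return mmap(NULL, info.size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    device_fd, info.offset);
}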
From patchwork Thu Jun 7 08:44:20 2018
From: Alexey Kardashevskiy
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy, David Gibson, kvm-ppc@vger.kernel.org,
    Alex Williamson, Benjamin Herrenschmidt, Ram Pai, kvm@vger.kernel.org,
    Alistair Popple
Subject: [RFC PATCH kernel 5/5] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver
Date: Thu, 7 Jun 2018 18:44:20 +1000
Message-Id: <20180607084420.29513-6-aik@ozlabs.ru>
In-Reply-To: <20180607084420.29513-1-aik@ozlabs.ru>
References: <20180607084420.29513-1-aik@ozlabs.ru>

Some POWER9 chips come with special NVLink2 links which provide cacheable
memory access to the RAM physically located on an NVIDIA GPU. This memory
is presented to the host via the device tree but remains offline until the
NVIDIA driver onlines it.

This exports that RAM to userspace as a new VFIO region so the NVIDIA
driver in the guest can train these links and online the GPU RAM.
Signed-off-by: Alexey Kardashevskiy
---
 drivers/vfio/pci/Makefile           |   1 +
 drivers/vfio/pci/vfio_pci_private.h |   8 ++
 include/uapi/linux/vfio.h           |   3 +
 drivers/vfio/pci/vfio_pci.c         |   9 ++
 drivers/vfio/pci/vfio_pci_nvlink2.c | 190 ++++++++++++++++++++++++++++++++++++
 drivers/vfio/pci/Kconfig            |   4 +
 6 files changed, 215 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_pci_nvlink2.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 76d8ec0..9662c06 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,5 +1,6 @@
 vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
+vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
 
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 86aab05..7115b9b 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -160,4 +160,12 @@ static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
 	return -ENODEV;
 }
 #endif
+#ifdef CONFIG_VFIO_PCI_NVLINK2
+extern int vfio_pci_nvlink2_init(struct vfio_pci_device *vdev);
+#else
+static inline int vfio_pci_nvlink2_init(struct vfio_pci_device *vdev)
+{
+	return -ENODEV;
+}
+#endif
 #endif /* VFIO_PCI_PRIVATE_H */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1aa7b82..2fe8227 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -301,6 +301,9 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG	(2)
 #define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG	(3)
 
+/* NVIDIA GPU NV2 */
+#define VFIO_REGION_SUBTYPE_NVIDIA_NVLINK2	(4)
+
 /*
  * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
  * which allows direct access to non-MSIX registers which happened to be within
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7bddf1e..38c9475 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -306,6 +306,15 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 		}
 	}
 
+	if (pdev->vendor == PCI_VENDOR_ID_NVIDIA &&
+	    pdev->device == 0x1db1 &&
+	    IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) {
+		ret = vfio_pci_nvlink2_init(vdev);
+		if (ret)
+			dev_warn(&vdev->pdev->dev,
+				 "Failed to setup NVIDIA NV2 RAM region\n");
+	}
+
 	vfio_pci_probe_mmaps(vdev);
 
 	return 0;
diff --git a/drivers/vfio/pci/vfio_pci_nvlink2.c b/drivers/vfio/pci/vfio_pci_nvlink2.c
new file mode 100644
index 0000000..451c5cb
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_nvlink2.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * VFIO PCI NVIDIA Witherspoon GPU support a.k.a. NVLink2.
+ *
+ * Copyright (C) 2018 IBM Corp. All rights reserved.
+ *     Author: Alexey Kardashevskiy
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Register an on-GPU RAM region for cacheable access.
+ *
+ * Derived from original vfio_pci_igd.c:
+ * Copyright (C) 2016 Red Hat, Inc. All rights reserved.
+ *	Author: Alex Williamson
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "vfio_pci_private.h"
+
+struct vfio_pci_nvlink2_data {
+	unsigned long gpu_hpa;
+	unsigned long useraddr;
+	unsigned long size;
+	struct mm_struct *mm;
+	struct mm_iommu_table_group_mem_t *mem;
+};
+
+static size_t vfio_pci_nvlink2_rw(struct vfio_pci_device *vdev,
+		char __user *buf, size_t count, loff_t *ppos, bool iswrite)
+{
+	unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+	void *base = vdev->region[i].data;
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+
+	if (pos >= vdev->region[i].size)
+		return -EINVAL;
+
+	count = min(count, (size_t)(vdev->region[i].size - pos));
+
+	if (iswrite) {
+		if (copy_from_user(base + pos, buf, count))
+			return -EFAULT;
+	} else {
+		if (copy_to_user(buf, base + pos, count))
+			return -EFAULT;
+	}
+	*ppos += count;
+
+	return count;
+}
+
+static void vfio_pci_nvlink2_release(struct vfio_pci_device *vdev,
+		struct vfio_pci_region *region)
+{
+	struct vfio_pci_nvlink2_data *data = region->data;
+	long ret;
+
+	ret = mm_iommu_put(data->mm, data->mem);
+	WARN_ON(ret);
+
+	mmdrop(data->mm);
+	kfree(data);
+}
+
+static int vfio_pci_nvlink2_mmap_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct vfio_pci_region *region = vma->vm_private_data;
+	struct vfio_pci_nvlink2_data *data = region->data;
+	int ret;
+	unsigned long vmf_off = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
+	unsigned long nv2pg = data->gpu_hpa >> PAGE_SHIFT;
+	unsigned long vm_pgoff = vma->vm_pgoff &
+		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+	unsigned long pfn = nv2pg + vm_pgoff + vmf_off;
+
+	ret = vm_insert_pfn(vma, vmf->address, pfn);
+	/* TODO: make it a tracepoint */
+	pr_debug("NVLink2: vmf=%lx hpa=%lx ret=%d\n",
+		 vmf->address, pfn << PAGE_SHIFT, ret);
+	if (ret)
+		return VM_FAULT_SIGSEGV;
+
+	return VM_FAULT_NOPAGE;
+}
+
+static const struct vm_operations_struct vfio_pci_nvlink2_mmap_vmops = {
+	.fault = vfio_pci_nvlink2_mmap_fault,
+};
+
+static int vfio_pci_nvlink2_mmap(struct vfio_pci_device *vdev,
+		struct vfio_pci_region *region, struct vm_area_struct *vma)
+{
+	long ret;
+	struct vfio_pci_nvlink2_data *data = region->data;
+
+	if (data->useraddr)
+		return -EPERM;
+
+	if (vma->vm_end - vma->vm_start > data->size)
+		return -EINVAL;
+
+	vma->vm_private_data = region;
+	vma->vm_flags |= VM_PFNMAP;
+	vma->vm_ops = &vfio_pci_nvlink2_mmap_vmops;
+
+	/*
+	 * Calling mm_iommu_newdev() here once as the region is not
+	 * registered yet and therefore the right initialization will
+	 * happen now. Other callers will use mm_iommu_find(), which
+	 * returns the registered @mem and does not call gup().
+	 */
+	data->useraddr = vma->vm_start;
+	data->mm = current->mm;
+	atomic_inc(&data->mm->mm_count);
+	ret = mm_iommu_newdev(data->mm, data->useraddr,
+			(vma->vm_end - vma->vm_start) >> PAGE_SHIFT,
+			data->gpu_hpa, &data->mem);
+
+	pr_debug("VFIO NVLINK2 mmap: useraddr=%lx hpa=%lx size=%lx ret=%ld\n",
+		 data->useraddr, data->gpu_hpa,
+		 vma->vm_end - vma->vm_start, ret);
+
+	return ret;
+}
+
+static const struct vfio_pci_regops vfio_pci_nvlink2_regops = {
+	.rw = vfio_pci_nvlink2_rw,
+	.release = vfio_pci_nvlink2_release,
+	.mmap = vfio_pci_nvlink2_mmap,
+};
+
+int vfio_pci_nvlink2_init(struct vfio_pci_device *vdev)
+{
+	int len = 0, ret;
+	struct device_node *npu_node, *mem_node;
+	struct pci_dev *npu_dev;
+	uint32_t *mem_phandle, *val;
+	struct vfio_pci_nvlink2_data *data;
+
+	npu_dev = pnv_pci_get_npu_dev(vdev->pdev, 0);
+	if (!npu_dev)
+		return -EINVAL;
+
+	npu_node = pci_device_to_OF_node(npu_dev);
+	if (!npu_node)
+		return -EINVAL;
+
+	mem_phandle = (void *) of_get_property(npu_node, "memory-region", NULL);
+	if (!mem_phandle)
+		return -EINVAL;
+
+	mem_node = of_find_node_by_phandle(be32_to_cpu(*mem_phandle));
+	if (!mem_node)
+		return -EINVAL;
+
+	val = (uint32_t *) of_get_property(mem_node, "reg", &len);
+	if (!val || len != 2 * sizeof(uint64_t))
+		return -EINVAL;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	data->gpu_hpa = ((uint64_t)be32_to_cpu(val[0]) << 32) |
+			be32_to_cpu(val[1]);
+	data->size = ((uint64_t)be32_to_cpu(val[2]) << 32) |
+			be32_to_cpu(val[3]);
+
+	dev_dbg(&vdev->pdev->dev, "%lx..%lx\n", data->gpu_hpa,
+			data->gpu_hpa + data->size - 1);
+
+	ret = vfio_pci_register_dev_region(vdev,
+			PCI_VENDOR_ID_NVIDIA | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
+			VFIO_REGION_SUBTYPE_NVIDIA_NVLINK2,
+			&vfio_pci_nvlink2_regops, data->size,
+			VFIO_REGION_INFO_FLAG_READ, data);
+	if (ret)
+		kfree(data);
+
+	return ret;
+}
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 24ee260..2725bc8 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -30,3 +30,7 @@ config VFIO_PCI_INTX
 config VFIO_PCI_IGD
 	depends on VFIO_PCI
 	def_bool y if X86
+
+config VFIO_PCI_NVLINK2
+	depends on VFIO_PCI
+	def_bool y if PPC_POWERNV
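vfio_pci_nvlink2_init() expects the GPU RAM node's "reg" property to carry
a 64-bit base and a 64-bit size as four big-endian 32-bit cells. A worked
example of that decoding, with made-up cell values (not taken from real
hardware):

/* Worked example of the "reg" decoding in vfio_pci_nvlink2_init(). */
static void example_decode_reg(void)
{
	__be32 reg[4] = {
		cpu_to_be32(0x00000600), cpu_to_be32(0x00000000), /* base cells */
		cpu_to_be32(0x00000002), cpu_to_be32(0x00000000), /* size cells */
	};
	uint32_t *val = (uint32_t *) reg;
	uint64_t gpu_hpa, size;

	gpu_hpa = ((uint64_t)be32_to_cpu(val[0]) << 32) | be32_to_cpu(val[1]);
	size = ((uint64_t)be32_to_cpu(val[2]) << 32) | be32_to_cpu(val[3]);

	/* gpu_hpa == 0x0000060000000000, size == 0x200000000, i.e. 8GB */
}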