From patchwork Tue Aug 31 22:39:05 2021
X-Patchwork-Submitter: dann frazier
X-Patchwork-Id: 1522766
From: dann frazier
To: kernel-team@lists.ubuntu.com
Subject: [RFC v2 1/2][Focal] IB: Allow calls to ib_umem_get from kernel ULPs
Date: Tue, 31 Aug 2021 16:39:05 -0600
Message-Id: <20210831223906.25905-2-dann.frazier@canonical.com>
In-Reply-To: <20210831223906.25905-1-dann.frazier@canonical.com>
References: <20210831223906.25905-1-dann.frazier@canonical.com>

From: Moni Shoua

BugLink: https://bugs.launchpad.net/bugs/1923104

So far the assumption was that ib_umem_get() and ib_umem_odp_get() are
called from flows that start in UVERBS and therefore have a user context.
This assumption restricts flows that are initiated by ULPs and need the
service that ib_umem_get() provides.

This patch changes ib_umem_get() and ib_umem_odp_get() to take the IB
device directly, relying on the fact that both UVERBS and ULPs set that
field correctly.
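To make the change concrete, here is a minimal sketch (not part of the
patch; the wrapper function and its error handling are illustrative,
only the ib_umem_get() signature comes from this series) of a call that
a kernel ULP, which has no ib_udata, can now make:

	/* Hypothetical ULP helper: pin a kernel-initiated buffer against
	 * the ib_device itself rather than a uverbs user context. */
	static int ulp_pin_buffer(struct ib_pd *pd, unsigned long addr,
				  size_t size, struct ib_umem **umem)
	{
		*umem = ib_umem_get(pd->device, addr, size,
				    IB_ACCESS_LOCAL_WRITE);
		if (IS_ERR(*umem))
			return PTR_ERR(*umem);
		return 0;
	}

Before this change, ib_umem_get() dug the uverbs context out of the
udata and returned -EIO when none was present, so such a call was
impossible from a pure-kernel flow.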
Reviewed-by: Guy Levi
Signed-off-by: Moni Shoua
Signed-off-by: Leon Romanovsky
(backported from commit c320e527e1548305f31d95ec405140b04aed25f5)
[ dannf: TL;DR: fairly straightforward updating of all in-tree
  ib_umem_get() and ib_umem_odp_get() calls to the new API. Details:

  drivers/infiniband/core/umem_odp.c:
  - ib_umem_odp_alloc_implicit(): Adjust WARN_ON_ONCE() statements to
    get ops.invalidate_range from the device directly instead of
    context->device, as the device is now passed in directly instead
    of the context. These statements were since removed upstream in
    commit f25a546e6529 ("RDMA/odp: Use mmu_interval_notifier_insert()").
  - Minor context adjustments.

  drivers/infiniband/hw/cxgb3/iwch_provider.c:
  - This driver was removed upstream in v5.5 with commit 30e0f6cf5acb
    ("RDMA/iw_cxgb3: Remove the iw_cxgb3 module from kernel"). Update
    it to use the new ib_umem_get() API.

  drivers/infiniband/hw/hns/hns_roce_qp.c:
  - The call to ib_umem_get() moved from hns_roce_create_qp_common()
    to alloc_qp_buf() in upstream commit 24c22112b9c2 ("RDMA/hns:
    Optimize qp buffer allocation flow"), which we backported to focal
    in commit 2aa3ae3060ffebe5. Update the API in the new site.

  drivers/infiniband/hw/mlx5/odp.c:
  - mlx5_ib_alloc_implicit_mr(): Pass pd->ibpd.device to
    ib_umem_odp_alloc_implicit() instead of &dev->ib_dev, following
    code reorg in upstream commit c2edcd69351f ("RDMA/mlx5: Lift
    implicit_mr_alloc() into the two routines that call it").

  drivers/infiniband/hw/mlx5/mr.c: minor context adjustments ]
Signed-off-by: dann frazier
---
 drivers/infiniband/core/umem.c                | 27 +++++----------
 drivers/infiniband/core/umem_odp.c            | 33 ++++++-------------
 drivers/infiniband/hw/bnxt_re/ib_verbs.c      | 12 ++++---
 drivers/infiniband/hw/cxgb3/iwch_provider.c   |  2 +-
 drivers/infiniband/hw/cxgb4/mem.c             |  2 +-
 drivers/infiniband/hw/efa/efa_verbs.c         |  2 +-
 drivers/infiniband/hw/hns/hns_roce_cq.c       |  2 +-
 drivers/infiniband/hw/hns/hns_roce_db.c       |  3 +-
 drivers/infiniband/hw/hns/hns_roce_mr.c       |  4 +--
 drivers/infiniband/hw/hns/hns_roce_qp.c       |  2 +-
 drivers/infiniband/hw/hns/hns_roce_srq.c      |  5 +--
 drivers/infiniband/hw/i40iw/i40iw_verbs.c     |  2 +-
 drivers/infiniband/hw/mlx4/cq.c               |  2 +-
 drivers/infiniband/hw/mlx4/doorbell.c         |  3 +-
 drivers/infiniband/hw/mlx4/mr.c               |  8 ++---
 drivers/infiniband/hw/mlx4/qp.c               |  5 +--
 drivers/infiniband/hw/mlx4/srq.c              |  3 +-
 drivers/infiniband/hw/mlx5/cq.c               |  6 ++--
 drivers/infiniband/hw/mlx5/devx.c             |  2 +-
 drivers/infiniband/hw/mlx5/doorbell.c         |  3 +-
 drivers/infiniband/hw/mlx5/mr.c               | 19 +++++------
 drivers/infiniband/hw/mlx5/odp.c              |  2 +-
 drivers/infiniband/hw/mlx5/qp.c               |  4 +--
 drivers/infiniband/hw/mlx5/srq.c              |  2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c  |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   |  2 +-
 drivers/infiniband/hw/qedr/verbs.c            |  9 +++--
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c  |  2 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c  |  7 ++--
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |  2 +-
 drivers/infiniband/sw/rdmavt/mr.c             |  2 +-
 drivers/infiniband/sw/rxe/rxe_mr.c            |  2 +-
 include/rdma/ib_umem.h                        |  4 +--
 include/rdma/ib_umem_odp.h                    |  9 ++---
 35 files changed, 92 insertions(+), 106 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 62a5a0b34fb8..698c5359f643 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -195,15 +195,14 @@ EXPORT_SYMBOL(ib_umem_find_best_pgsz);
 /**
  * ib_umem_get - Pin and DMA map userspace memory.
  *
- * @udata: userspace context to pin memory for
+ * @device: IB device to connect UMEM
  * @addr: userspace virtual address to start at
  * @size: length of region to pin
  * @access: IB_ACCESS_xxx flags for memory being pinned
  */
-struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
+struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 			    size_t size, int access)
 {
-	struct ib_ucontext *context;
 	struct ib_umem *umem;
 	struct page **page_list;
 	unsigned long lock_limit;
@@ -215,14 +214,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 	struct scatterlist *sg;
 	unsigned int gup_flags = FOLL_WRITE;
 
-	if (!udata)
-		return ERR_PTR(-EIO);
-
-	context = container_of(udata, struct uverbs_attr_bundle, driver_udata)
-			  ->context;
-	if (!context)
-		return ERR_PTR(-EIO);
-
 	/*
 	 * If the combination of the addr and size requested for this memory
 	 * region causes an integer overflow, return error.
@@ -240,7 +231,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, umem = kzalloc(sizeof(*umem), GFP_KERNEL); if (!umem) return ERR_PTR(-ENOMEM); - umem->ibdev = context->device; + umem->ibdev = device; umem->length = size; umem->address = addr; umem->writable = ib_access_writable(access); @@ -295,7 +286,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, npages -= ret; sg = ib_umem_add_sg_table(sg, page_list, ret, - dma_get_max_seg_size(context->device->dma_device), + dma_get_max_seg_size(device->dma_device), &umem->sg_nents); up_read(&mm->mmap_sem); @@ -303,10 +294,10 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, sg_mark_end(sg); - umem->nmap = ib_dma_map_sg(context->device, - umem->sg_head.sgl, - umem->sg_nents, - DMA_BIDIRECTIONAL); + umem->nmap = ib_dma_map_sg(device, + umem->sg_head.sgl, + umem->sg_nents, + DMA_BIDIRECTIONAL); if (!umem->nmap) { ret = -ENOMEM; @@ -317,7 +308,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr, goto out; umem_release: - __ib_umem_release(context->device, umem, 0); + __ib_umem_release(device, umem, 0); vma: atomic64_sub(ib_umem_num_pages(umem), &mm->pinned_vm); out: diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index fedf6829cdec..615e4ce0343b 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -287,15 +287,12 @@ static inline int ib_init_umem_odp(struct ib_umem_odp *umem_odp) * They exist only to hold the per_mm reference to help the driver create * children umems. * - * @udata: udata from the syscall being used to create the umem + * @device: IB device to create UMEM * @access: ib_reg_mr access flags */ -struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_udata *udata, +struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device, int access) { - struct ib_ucontext *context = - container_of(udata, struct uverbs_attr_bundle, driver_udata) - ->context; struct ib_umem *umem; struct ib_umem_odp *umem_odp; int ret; @@ -303,16 +300,14 @@ struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_udata *udata, if (access & IB_ACCESS_HUGETLB) return ERR_PTR(-EINVAL); - if (!context) - return ERR_PTR(-EIO); - if (WARN_ON_ONCE(!context->device->ops.invalidate_range)) + if (WARN_ON_ONCE(!device->ops.invalidate_range)) return ERR_PTR(-EINVAL); umem_odp = kzalloc(sizeof(*umem_odp), GFP_KERNEL); if (!umem_odp) return ERR_PTR(-ENOMEM); umem = &umem_odp->umem; - umem->ibdev = context->device; + umem->ibdev = device; umem->writable = ib_access_writable(access); umem->owning_mm = current->mm; umem_odp->is_implicit_odp = 1; @@ -373,7 +368,7 @@ EXPORT_SYMBOL(ib_umem_odp_alloc_child); /** * ib_umem_odp_get - Create a umem_odp for a userspace va * - * @udata: userspace context to pin memory for + * @device: IB device struct to get UMEM * @addr: userspace virtual address to start at * @size: length of region to pin * @access: IB_ACCESS_xxx flags for memory being pinned @@ -382,31 +377,23 @@ EXPORT_SYMBOL(ib_umem_odp_alloc_child); * pinning, instead, stores the mm for future page fault handling in * conjunction with MMU notifiers. 
*/ -struct ib_umem_odp *ib_umem_odp_get(struct ib_udata *udata, unsigned long addr, - size_t size, int access) +struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device, + unsigned long addr, size_t size, + int access) { struct ib_umem_odp *umem_odp; - struct ib_ucontext *context; struct mm_struct *mm; int ret; - if (!udata) - return ERR_PTR(-EIO); - - context = container_of(udata, struct uverbs_attr_bundle, driver_udata) - ->context; - if (!context) - return ERR_PTR(-EIO); - if (WARN_ON_ONCE(!(access & IB_ACCESS_ON_DEMAND)) || - WARN_ON_ONCE(!context->device->ops.invalidate_range)) + WARN_ON_ONCE(!device->ops.invalidate_range)) return ERR_PTR(-EINVAL); umem_odp = kzalloc(sizeof(struct ib_umem_odp), GFP_KERNEL); if (!umem_odp) return ERR_PTR(-ENOMEM); - umem_odp->umem.ibdev = context->device; + umem_odp->umem.ibdev = device; umem_odp->umem.length = size; umem_odp->umem.address = addr; umem_odp->umem.writable = ib_access_writable(access); diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c index 970c2593c93c..a8bc7cea4d9c 100644 --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c @@ -855,7 +855,8 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd, bytes += (qplib_qp->sq.max_wqe * psn_sz); } bytes = PAGE_ALIGN(bytes); - umem = ib_umem_get(udata, ureq.qpsva, bytes, IB_ACCESS_LOCAL_WRITE); + umem = ib_umem_get(&rdev->ibdev, ureq.qpsva, bytes, + IB_ACCESS_LOCAL_WRITE); if (IS_ERR(umem)) return PTR_ERR(umem); @@ -868,7 +869,7 @@ static int bnxt_re_init_user_qp(struct bnxt_re_dev *rdev, struct bnxt_re_pd *pd, if (!qp->qplib_qp.srq) { bytes = (qplib_qp->rq.max_wqe * BNXT_QPLIB_MAX_RQE_ENTRY_SIZE); bytes = PAGE_ALIGN(bytes); - umem = ib_umem_get(udata, ureq.qprva, bytes, + umem = ib_umem_get(&rdev->ibdev, ureq.qprva, bytes, IB_ACCESS_LOCAL_WRITE); if (IS_ERR(umem)) goto rqfail; @@ -1322,7 +1323,8 @@ static int bnxt_re_init_user_srq(struct bnxt_re_dev *rdev, bytes = (qplib_srq->max_wqe * BNXT_QPLIB_MAX_RQE_ENTRY_SIZE); bytes = PAGE_ALIGN(bytes); - umem = ib_umem_get(udata, ureq.srqva, bytes, IB_ACCESS_LOCAL_WRITE); + umem = ib_umem_get(&rdev->ibdev, ureq.srqva, bytes, + IB_ACCESS_LOCAL_WRITE); if (IS_ERR(umem)) return PTR_ERR(umem); @@ -2564,7 +2566,7 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, goto fail; } - cq->umem = ib_umem_get(udata, req.cq_va, + cq->umem = ib_umem_get(&rdev->ibdev, req.cq_va, entries * sizeof(struct cq_base), IB_ACCESS_LOCAL_WRITE); if (IS_ERR(cq->umem)) { @@ -3548,7 +3550,7 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length, /* The fixed portion of the rkey is the same as the lkey */ mr->ib_mr.rkey = mr->qplib_mr.rkey; - umem = ib_umem_get(udata, start, length, mr_access_flags); + umem = ib_umem_get(&rdev->ibdev, start, length, mr_access_flags); if (IS_ERR(umem)) { dev_err(rdev_to_dev(rdev), "Failed to get umem"); rc = -EFAULT; diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index a6da164c6fc0..cf6e5831f3e6 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -451,7 +451,7 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, mhp->rhp = rhp; - mhp->umem = ib_umem_get(udata, start, length, acc); + mhp->umem = ib_umem_get(pd->device, start, length, acc); if (IS_ERR(mhp->umem)) { err = PTR_ERR(mhp->umem); kfree(mhp); diff --git 
a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c index 5b63a1b133cb..1e4f4e525598 100644 --- a/drivers/infiniband/hw/cxgb4/mem.c +++ b/drivers/infiniband/hw/cxgb4/mem.c @@ -542,7 +542,7 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, mhp->rhp = rhp; - mhp->umem = ib_umem_get(udata, start, length, acc); + mhp->umem = ib_umem_get(pd->device, start, length, acc); if (IS_ERR(mhp->umem)) goto err_free_skb; diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c index bb830ff8596e..1cb98a3c2ad6 100644 --- a/drivers/infiniband/hw/efa/efa_verbs.c +++ b/drivers/infiniband/hw/efa/efa_verbs.c @@ -1423,7 +1423,7 @@ struct ib_mr *efa_reg_mr(struct ib_pd *ibpd, u64 start, u64 length, goto err_out; } - mr->umem = ib_umem_get(udata, start, length, access_flags); + mr->umem = ib_umem_get(ibpd->device, start, length, access_flags); if (IS_ERR(mr->umem)) { err = PTR_ERR(mr->umem); ibdev_dbg(&dev->ibdev, diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c index 61f53a85767b..5ffe4c996ed3 100644 --- a/drivers/infiniband/hw/hns/hns_roce_cq.c +++ b/drivers/infiniband/hw/hns/hns_roce_cq.c @@ -163,7 +163,7 @@ static int get_cq_umem(struct hns_roce_dev *hr_dev, struct hns_roce_cq *hr_cq, u32 npages; int ret; - *umem = ib_umem_get(udata, ucmd.buf_addr, buf->size, + *umem = ib_umem_get(&hr_dev->ib_dev, ucmd.buf_addr, buf->size, IB_ACCESS_LOCAL_WRITE); if (IS_ERR(*umem)) return PTR_ERR(*umem); diff --git a/drivers/infiniband/hw/hns/hns_roce_db.c b/drivers/infiniband/hw/hns/hns_roce_db.c index 10af6958ab69..bff6abdccfb0 100644 --- a/drivers/infiniband/hw/hns/hns_roce_db.c +++ b/drivers/infiniband/hw/hns/hns_roce_db.c @@ -31,7 +31,8 @@ int hns_roce_db_map_user(struct hns_roce_ucontext *context, refcount_set(&page->refcount, 1); page->user_virt = page_addr; - page->umem = ib_umem_get(udata, page_addr, PAGE_SIZE, 0); + page->umem = ib_umem_get(context->ibucontext.device, page_addr, + PAGE_SIZE, 0); if (IS_ERR(page->umem)) { ret = PTR_ERR(page->umem); kfree(page); diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c index 95765560c1cf..b9898e71655a 100644 --- a/drivers/infiniband/hw/hns/hns_roce_mr.c +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c @@ -1145,7 +1145,7 @@ struct ib_mr *hns_roce_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, if (!mr) return ERR_PTR(-ENOMEM); - mr->umem = ib_umem_get(udata, start, length, access_flags); + mr->umem = ib_umem_get(pd->device, start, length, access_flags); if (IS_ERR(mr->umem)) { ret = PTR_ERR(mr->umem); goto err_free; @@ -1230,7 +1230,7 @@ static int rereg_mr_trans(struct ib_mr *ibmr, int flags, } ib_umem_release(mr->umem); - mr->umem = ib_umem_get(udata, start, length, mr_access_flags); + mr->umem = ib_umem_get(ibmr->device, start, length, mr_access_flags); if (IS_ERR(mr->umem)) { ret = PTR_ERR(mr->umem); mr->umem = NULL; diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c index 464babd8ce76..28122c7071ad 100644 --- a/drivers/infiniband/hw/hns/hns_roce_qp.c +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c @@ -833,7 +833,7 @@ static int alloc_qp_buf(struct hns_roce_dev *hr_dev, struct hns_roce_qp *hr_qp, } if (udata) { - hr_qp->umem = ib_umem_get(udata, addr, hr_qp->buff_size, 0); + hr_qp->umem = ib_umem_get(ibdev, addr, hr_qp->buff_size, 0); if (IS_ERR(hr_qp->umem)) { ret = PTR_ERR(hr_qp->umem); goto err_inline; diff --git a/drivers/infiniband/hw/hns/hns_roce_srq.c 
b/drivers/infiniband/hw/hns/hns_roce_srq.c index b5d773057caf..5b3dd1a337d4 100644 --- a/drivers/infiniband/hw/hns/hns_roce_srq.c +++ b/drivers/infiniband/hw/hns/hns_roce_srq.c @@ -186,7 +186,8 @@ static int create_user_srq(struct hns_roce_srq *srq, struct ib_udata *udata, if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) return -EFAULT; - srq->umem = ib_umem_get(udata, ucmd.buf_addr, srq_buf_size, 0); + srq->umem = + ib_umem_get(srq->ibsrq.device, ucmd.buf_addr, srq_buf_size, 0); if (IS_ERR(srq->umem)) return PTR_ERR(srq->umem); @@ -205,7 +206,7 @@ static int create_user_srq(struct hns_roce_srq *srq, struct ib_udata *udata, goto err_user_srq_mtt; /* config index queue BA */ - srq->idx_que.umem = ib_umem_get(udata, ucmd.que_addr, + srq->idx_que.umem = ib_umem_get(srq->ibsrq.device, ucmd.que_addr, srq->idx_que.buf_size, 0); if (IS_ERR(srq->idx_que.umem)) { dev_err(hr_dev->dev, "ib_umem_get error for index queue\n"); diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c index 8f3e666ddae1..a27ac46eaf69 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c @@ -1754,7 +1754,7 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd, if (length > I40IW_MAX_MR_SIZE) return ERR_PTR(-EINVAL); - region = ib_umem_get(udata, start, length, acc); + region = ib_umem_get(pd->device, start, length, acc); if (IS_ERR(region)) return (struct ib_mr *)region; diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 306b21281fa2..a57033d4b0e5 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -144,7 +144,7 @@ static int mlx4_ib_get_cq_umem(struct mlx4_ib_dev *dev, struct ib_udata *udata, int shift; int n; - *umem = ib_umem_get(udata, buf_addr, cqe * cqe_size, + *umem = ib_umem_get(&dev->ib_dev, buf_addr, cqe * cqe_size, IB_ACCESS_LOCAL_WRITE); if (IS_ERR(*umem)) return PTR_ERR(*umem); diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c index 714f9df5bf39..d41f03ccb0e1 100644 --- a/drivers/infiniband/hw/mlx4/doorbell.c +++ b/drivers/infiniband/hw/mlx4/doorbell.c @@ -64,7 +64,8 @@ int mlx4_ib_db_map_user(struct ib_udata *udata, unsigned long virt, page->user_virt = (virt & PAGE_MASK); page->refcnt = 0; - page->umem = ib_umem_get(udata, virt & PAGE_MASK, PAGE_SIZE, 0); + page->umem = ib_umem_get(context->ibucontext.device, virt & PAGE_MASK, + PAGE_SIZE, 0); if (IS_ERR(page->umem)) { err = PTR_ERR(page->umem); kfree(page); diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index c25fe0894e51..184a281f89ec 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -367,7 +367,7 @@ int mlx4_ib_umem_calc_optimal_mtt_size(struct ib_umem *umem, u64 start_va, return block_shift; } -static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, +static struct ib_umem *mlx4_get_umem_mr(struct ib_device *device, u64 start, u64 length, int access_flags) { /* @@ -398,7 +398,7 @@ static struct ib_umem *mlx4_get_umem_mr(struct ib_udata *udata, u64 start, up_read(¤t->mm->mmap_sem); } - return ib_umem_get(udata, start, length, access_flags); + return ib_umem_get(device, start, length, access_flags); } struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, @@ -415,7 +415,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, if (!mr) return ERR_PTR(-ENOMEM); - mr->umem = mlx4_get_umem_mr(udata, start, length, access_flags); 
+ mr->umem = mlx4_get_umem_mr(pd->device, start, length, access_flags); if (IS_ERR(mr->umem)) { err = PTR_ERR(mr->umem); goto err_free; @@ -503,7 +503,7 @@ int mlx4_ib_rereg_user_mr(struct ib_mr *mr, int flags, mlx4_mr_rereg_mem_cleanup(dev->dev, &mmr->mmr); ib_umem_release(mmr->umem); - mmr->umem = mlx4_get_umem_mr(udata, start, length, + mmr->umem = mlx4_get_umem_mr(mr->device, start, length, mr_access_flags); if (IS_ERR(mmr->umem)) { err = PTR_ERR(mmr->umem); diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index f9b2e9337c3a..82a369adadf4 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -916,7 +916,7 @@ static int create_rq(struct ib_pd *pd, struct ib_qp_init_attr *init_attr, qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + (qp->sq.wqe_cnt << qp->sq.wqe_shift); - qp->umem = ib_umem_get(udata, wq.buf_addr, qp->buf_size, 0); + qp->umem = ib_umem_get(pd->device, wq.buf_addr, qp->buf_size, 0); if (IS_ERR(qp->umem)) { err = PTR_ERR(qp->umem); goto err; @@ -1110,7 +1110,8 @@ static int create_qp_common(struct ib_pd *pd, struct ib_qp_init_attr *init_attr, if (err) goto err; - qp->umem = ib_umem_get(udata, ucmd.buf_addr, qp->buf_size, 0); + qp->umem = + ib_umem_get(pd->device, ucmd.buf_addr, qp->buf_size, 0); if (IS_ERR(qp->umem)) { err = PTR_ERR(qp->umem); goto err; diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c index 8dcf6e3d9ae2..8f9d5035142d 100644 --- a/drivers/infiniband/hw/mlx4/srq.c +++ b/drivers/infiniband/hw/mlx4/srq.c @@ -110,7 +110,8 @@ int mlx4_ib_create_srq(struct ib_srq *ib_srq, if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) return -EFAULT; - srq->umem = ib_umem_get(udata, ucmd.buf_addr, buf_size, 0); + srq->umem = + ib_umem_get(ib_srq->device, ucmd.buf_addr, buf_size, 0); if (IS_ERR(srq->umem)) return PTR_ERR(srq->umem); diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c index 72b4eb6c5013..2f5ee37c252b 100644 --- a/drivers/infiniband/hw/mlx5/cq.c +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -733,8 +733,8 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata, *cqe_size = ucmd.cqe_size; cq->buf.umem = - ib_umem_get(udata, ucmd.buf_addr, entries * ucmd.cqe_size, - IB_ACCESS_LOCAL_WRITE); + ib_umem_get(&dev->ib_dev, ucmd.buf_addr, + entries * ucmd.cqe_size, IB_ACCESS_LOCAL_WRITE); if (IS_ERR(cq->buf.umem)) { err = PTR_ERR(cq->buf.umem); return err; @@ -1132,7 +1132,7 @@ static int resize_user(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq, if (ucmd.cqe_size && SIZE_MAX / ucmd.cqe_size <= entries - 1) return -EINVAL; - umem = ib_umem_get(udata, ucmd.buf_addr, + umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr, (size_t)ucmd.cqe_size * entries, IB_ACCESS_LOCAL_WRITE); if (IS_ERR(umem)) { diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c index ae2e761b0e47..c3b4b6586d17 100644 --- a/drivers/infiniband/hw/mlx5/devx.c +++ b/drivers/infiniband/hw/mlx5/devx.c @@ -2143,7 +2143,7 @@ static int devx_umem_get(struct mlx5_ib_dev *dev, struct ib_ucontext *ucontext, if (err) return err; - obj->umem = ib_umem_get(&attrs->driver_udata, addr, size, access); + obj->umem = ib_umem_get(&dev->ib_dev, addr, size, access); if (IS_ERR(obj->umem)) return PTR_ERR(obj->umem); diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c index 12737c509aa2..61475b571531 100644 --- a/drivers/infiniband/hw/mlx5/doorbell.c +++ b/drivers/infiniband/hw/mlx5/doorbell.c @@ -64,7 +64,8 @@ int 
mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context, page->user_virt = (virt & PAGE_MASK); page->refcnt = 0; - page->umem = ib_umem_get(udata, virt & PAGE_MASK, PAGE_SIZE, 0); + page->umem = ib_umem_get(context->ibucontext.device, virt & PAGE_MASK, + PAGE_SIZE, 0); if (IS_ERR(page->umem)) { err = PTR_ERR(page->umem); kfree(page); diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 583b3cad68f6..24daf420317e 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -752,10 +752,9 @@ static int mr_cache_max_order(struct mlx5_ib_dev *dev) return MLX5_MAX_UMR_SHIFT; } -static int mr_umem_get(struct mlx5_ib_dev *dev, struct ib_udata *udata, - u64 start, u64 length, int access_flags, - struct ib_umem **umem, int *npages, int *page_shift, - int *ncont, int *order) +static int mr_umem_get(struct mlx5_ib_dev *dev, u64 start, u64 length, + int access_flags, struct ib_umem **umem, int *npages, + int *page_shift, int *ncont, int *order) { struct ib_umem *u; @@ -764,7 +763,8 @@ static int mr_umem_get(struct mlx5_ib_dev *dev, struct ib_udata *udata, if (access_flags & IB_ACCESS_ON_DEMAND) { struct ib_umem_odp *odp; - odp = ib_umem_odp_get(udata, start, length, access_flags); + odp = ib_umem_odp_get(&dev->ib_dev, start, length, + access_flags); if (IS_ERR(odp)) { mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(odp)); @@ -779,7 +779,7 @@ static int mr_umem_get(struct mlx5_ib_dev *dev, struct ib_udata *udata, if (order) *order = ilog2(roundup_pow_of_two(*ncont)); } else { - u = ib_umem_get(udata, start, length, access_flags); + u = ib_umem_get(&dev->ib_dev, start, length, access_flags); if (IS_ERR(u)) { mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(u)); return PTR_ERR(u); @@ -1279,7 +1279,7 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, return &mr->ibmr; } - err = mr_umem_get(dev, udata, start, length, access_flags, &umem, + err = mr_umem_get(dev, start, length, access_flags, &umem, &npages, &page_shift, &ncont, &order); if (err < 0) @@ -1434,9 +1434,8 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start, flags |= IB_MR_REREG_TRANS; ib_umem_release(mr->umem); mr->umem = NULL; - err = mr_umem_get(dev, udata, addr, len, access_flags, - &mr->umem, &npages, &page_shift, &ncont, - &order); + err = mr_umem_get(dev, addr, len, access_flags, &mr->umem, + &npages, &page_shift, &ncont, &order); if (err) goto err; } diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c index 3f9478d19376..a38c9a82edf8 100644 --- a/drivers/infiniband/hw/mlx5/odp.c +++ b/drivers/infiniband/hw/mlx5/odp.c @@ -553,7 +553,7 @@ struct mlx5_ib_mr *mlx5_ib_alloc_implicit_mr(struct mlx5_ib_pd *pd, struct mlx5_ib_mr *imr; struct ib_umem_odp *umem_odp; - umem_odp = ib_umem_odp_alloc_implicit(udata, access_flags); + umem_odp = ib_umem_odp_alloc_implicit(pd->ibpd.device, access_flags); if (IS_ERR(umem_odp)) return ERR_CAST(umem_odp); diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index fa6c5696ad6b..45faab9e1313 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -749,7 +749,7 @@ static int mlx5_ib_umem_get(struct mlx5_ib_dev *dev, struct ib_udata *udata, { int err; - *umem = ib_umem_get(udata, addr, size, 0); + *umem = ib_umem_get(&dev->ib_dev, addr, size, 0); if (IS_ERR(*umem)) { mlx5_ib_dbg(dev, "umem_get failed\n"); return PTR_ERR(*umem); @@ -806,7 +806,7 @@ static int create_user_rq(struct mlx5_ib_dev *dev, struct ib_pd *pd, if 
(!ucmd->buf_addr) return -EINVAL; - rwq->umem = ib_umem_get(udata, ucmd->buf_addr, rwq->buf_size, 0); + rwq->umem = ib_umem_get(&dev->ib_dev, ucmd->buf_addr, rwq->buf_size, 0); if (IS_ERR(rwq->umem)) { mlx5_ib_dbg(dev, "umem_get failed\n"); err = PTR_ERR(rwq->umem); diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c index cec67d774e1e..6d1ff13d2283 100644 --- a/drivers/infiniband/hw/mlx5/srq.c +++ b/drivers/infiniband/hw/mlx5/srq.c @@ -80,7 +80,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq, srq->wq_sig = !!(ucmd.flags & MLX5_SRQ_FLAG_SIGNATURE); - srq->umem = ib_umem_get(udata, ucmd.buf_addr, buf_size, 0); + srq->umem = ib_umem_get(pd->device, ucmd.buf_addr, buf_size, 0); if (IS_ERR(srq->umem)) { mlx5_ib_dbg(dev, "failed umem get, size %d\n", buf_size); err = PTR_ERR(srq->umem); diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index 33002530fee7..ac19d57803b5 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -880,7 +880,7 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, if (!mr) return ERR_PTR(-ENOMEM); - mr->umem = ib_umem_get(udata, start, length, acc); + mr->umem = ib_umem_get(pd->device, start, length, acc); if (IS_ERR(mr->umem)) { err = PTR_ERR(mr->umem); goto err; diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c index d35da18cad41..9117d60eee9b 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c @@ -875,7 +875,7 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len, mr = kzalloc(sizeof(*mr), GFP_KERNEL); if (!mr) return ERR_PTR(status); - mr->umem = ib_umem_get(udata, start, len, acc); + mr->umem = ib_umem_get(ibpd->device, start, len, acc); if (IS_ERR(mr->umem)) { status = -EFAULT; goto umem_err; diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c index cda67142a6d5..09a0ff58c719 100644 --- a/drivers/infiniband/hw/qedr/verbs.c +++ b/drivers/infiniband/hw/qedr/verbs.c @@ -706,7 +706,7 @@ static inline int qedr_init_user_queue(struct ib_udata *udata, q->buf_addr = buf_addr; q->buf_len = buf_len; - q->umem = ib_umem_get(udata, q->buf_addr, q->buf_len, access); + q->umem = ib_umem_get(&dev->ibdev, q->buf_addr, q->buf_len, access); if (IS_ERR(q->umem)) { DP_ERR(dev, "create user queue: failed ib_umem_get, got %ld\n", PTR_ERR(q->umem)); @@ -1292,9 +1292,8 @@ static int qedr_init_srq_user_params(struct ib_udata *udata, if (rc) return rc; - srq->prod_umem = - ib_umem_get(udata, ureq->prod_pair_addr, - sizeof(struct rdma_srq_producers), access); + srq->prod_umem = ib_umem_get(srq->ibsrq.device, ureq->prod_pair_addr, + sizeof(struct rdma_srq_producers), access); if (IS_ERR(srq->prod_umem)) { qedr_free_pbl(srq->dev, &srq->usrq.pbl_info, srq->usrq.pbl_tbl); ib_umem_release(srq->usrq.umem); @@ -2622,7 +2621,7 @@ struct ib_mr *qedr_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 len, mr->type = QEDR_MR_USER; - mr->umem = ib_umem_get(udata, start, len, acc); + mr->umem = ib_umem_get(ibpd->device, start, len, acc); if (IS_ERR(mr->umem)) { rc = -EFAULT; goto err0; diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c index a26a4fd86bf4..4f6cc0de7ef9 100644 --- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c +++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c @@ -135,7 +135,7 @@ int 
pvrdma_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr, goto err_cq; } - cq->umem = ib_umem_get(udata, ucmd.buf_addr, ucmd.buf_size, + cq->umem = ib_umem_get(ibdev, ucmd.buf_addr, ucmd.buf_size, IB_ACCESS_LOCAL_WRITE); if (IS_ERR(cq->umem)) { ret = PTR_ERR(cq->umem); diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c index c61e665ff261..b039f1f00e05 100644 --- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c +++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_mr.c @@ -126,7 +126,7 @@ struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, return ERR_PTR(-EINVAL); } - umem = ib_umem_get(udata, start, length, access_flags); + umem = ib_umem_get(pd->device, start, length, access_flags); if (IS_ERR(umem)) { dev_warn(&dev->pdev->dev, "could not get umem for mem region\n"); diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c index 9e5c4031d765..9c3724a2fda6 100644 --- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c +++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_qp.c @@ -262,8 +262,9 @@ struct ib_qp *pvrdma_create_qp(struct ib_pd *pd, if (!is_srq) { /* set qp->sq.wqe_cnt, shift, buf_size.. */ - qp->rumem = ib_umem_get(udata, ucmd.rbuf_addr, - ucmd.rbuf_size, 0); + qp->rumem = + ib_umem_get(pd->device, ucmd.rbuf_addr, + ucmd.rbuf_size, 0); if (IS_ERR(qp->rumem)) { ret = PTR_ERR(qp->rumem); goto err_qp; @@ -274,7 +275,7 @@ struct ib_qp *pvrdma_create_qp(struct ib_pd *pd, qp->srq = to_vsrq(init_attr->srq); } - qp->sumem = ib_umem_get(udata, ucmd.sbuf_addr, + qp->sumem = ib_umem_get(pd->device, ucmd.sbuf_addr, ucmd.sbuf_size, 0); if (IS_ERR(qp->sumem)) { if (!is_srq) diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c index 98c8be71d91d..d330decfb80a 100644 --- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c +++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c @@ -146,7 +146,7 @@ int pvrdma_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init_attr, goto err_srq; } - srq->umem = ib_umem_get(udata, ucmd.buf_addr, ucmd.buf_size, 0); + srq->umem = ib_umem_get(ibsrq->device, ucmd.buf_addr, ucmd.buf_size, 0); if (IS_ERR(srq->umem)) { ret = PTR_ERR(srq->umem); goto err_srq; diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c index b9a76bf74857..72f6534fbb52 100644 --- a/drivers/infiniband/sw/rdmavt/mr.c +++ b/drivers/infiniband/sw/rdmavt/mr.c @@ -390,7 +390,7 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, if (length == 0) return ERR_PTR(-EINVAL); - umem = ib_umem_get(udata, start, length, mr_access_flags); + umem = ib_umem_get(pd->device, start, length, mr_access_flags); if (IS_ERR(umem)) return (void *)umem; diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index a1326a608d66..1799dc4b5e8c 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -169,7 +169,7 @@ int rxe_mem_init_user(struct rxe_pd *pd, u64 start, void *vaddr; int err; - umem = ib_umem_get(udata, start, length, access); + umem = ib_umem_get(pd->ibpd.device, start, length, access); if (IS_ERR(umem)) { pr_warn("err %d from rxe_umem_get\n", (int)PTR_ERR(umem)); diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h index 5ad7b4fff088..9353910915d4 100644 --- a/include/rdma/ib_umem.h +++ b/include/rdma/ib_umem.h @@ -69,7 +69,7 @@ static inline size_t ib_umem_num_pages(struct ib_umem *umem) #ifdef 
CONFIG_INFINIBAND_USER_MEM
-struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
+struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 			    size_t size, int access);
 void ib_umem_release(struct ib_umem *umem);
 int ib_umem_page_count(struct ib_umem *umem);
@@ -83,7 +83,7 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 #include
 
-static inline struct ib_umem *ib_umem_get(struct ib_udata *udata,
+static inline struct ib_umem *ib_umem_get(struct ib_device *device,
 					  unsigned long addr, size_t size,
 					  int access)
 {
diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index 253df1a1fa54..88467e9ff649 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -130,9 +130,10 @@ struct ib_ucontext_per_mm {
 	struct rw_semaphore umem_rwsem;
 };
 
-struct ib_umem_odp *ib_umem_odp_get(struct ib_udata *udata, unsigned long addr,
-				    size_t size, int access);
-struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_udata *udata,
+struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
+				    unsigned long addr, size_t size,
+				    int access);
+struct ib_umem_odp *ib_umem_odp_alloc_implicit(struct ib_device *device,
 					       int access);
 struct ib_umem_odp *ib_umem_odp_alloc_child(struct ib_umem_odp *root_umem,
 					    unsigned long addr, size_t size);
@@ -191,7 +192,7 @@ static inline int ib_umem_mmu_notifier_retry(struct ib_umem_odp *umem_odp,
 
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
-static inline struct ib_umem_odp *ib_umem_odp_get(struct ib_udata *udata,
+static inline struct ib_umem_odp *ib_umem_odp_get(struct ib_device *device,
 						  unsigned long addr,
 						  size_t size, int access)
 {
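For kernel callers, the ODP variant follows the same pattern as
ib_umem_get(). A brief illustrative sketch (only the ib_umem_odp_get()
signature and the IB_ACCESS_ON_DEMAND requirement come from this patch;
the surrounding variables are hypothetical):

	struct ib_umem_odp *odp;

	/* ODP umems must request IB_ACCESS_ON_DEMAND, and the device
	 * must implement ops.invalidate_range, or -EINVAL is returned. */
	odp = ib_umem_odp_get(&dev->ib_dev, addr, size,
			      IB_ACCESS_LOCAL_WRITE | IB_ACCESS_ON_DEMAND);
	if (IS_ERR(odp))
		return PTR_ERR(odp);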
From patchwork Tue Aug 31 22:39:06 2021
X-Patchwork-Submitter: dann frazier
X-Patchwork-Id: 1522767

From: dann frazier
To: kernel-team@lists.ubuntu.com
Subject: [RFC v2 2/2][Focal] UBUNTU: SAUCE: RDMA/core: Introduce peer memory interface
Date: Tue, 31 Aug 2021 16:39:06 -0600
Message-Id: <20210831223906.25905-3-dann.frazier@canonical.com>
In-Reply-To: <20210831223906.25905-1-dann.frazier@canonical.com>
References: <20210831223906.25905-1-dann.frazier@canonical.com>

From: Feras Daoud

BugLink: https://bugs.launchpad.net/bugs/1923104

The peer_memory_client scheme allows a driver to register with the
ib_umem system that it has the ability to understand user virtual
address ranges that are not compatible with get_user_pages(), for
instance VMAs created with io_remap_pfn_range() or other driver-special
VMAs. For ranges the interface understands, it can provide a DMA-mapped
sg_table for use by the ib_umem, allowing user virtual ranges that
cannot be supported by get_user_pages() to be used as umems for RDMA.
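As an illustrative sketch only (not part of the patch): a peer client
implements the ops and registers itself. The example_* callbacks and
module boilerplate below are hypothetical; the registration API, the
mandatory-op list (acquire, get_pages, put_pages, dma_map, dma_unmap)
and the invalidate_peer_memory callback type are taken from the code
added below.

	static const struct peer_memory_client example_client = {
		.name		= "example_peer", /* /sys/kernel/mm/memory_peers/<name> */
		.version	= "1.0",
		.acquire	= example_acquire,   /* return >0 to claim a VA range */
		.get_pages	= example_get_pages, /* pin/look up the peer pages */
		.put_pages	= example_put_pages,
		.get_page_size	= example_get_page_size, /* sets the umem page shift */
		.dma_map	= example_dma_map,   /* produce the DMA-mapped sg_table */
		.dma_unmap	= example_dma_unmap,
	};

	static invalidate_peer_memory example_invalidate_cb;
	static void *example_reg_handle;

	static int __init example_peer_init(void)
	{
		/* A non-NULL callback pointer declares that this client
		 * supports invalidation; the core fills it in with
		 * ib_invalidate_peer_memory(). */
		example_reg_handle = ib_register_peer_memory_client(
			&example_client, &example_invalidate_cb);
		return example_reg_handle ? 0 : -EINVAL;
	}
	module_init(example_peer_init);

	static void __exit example_peer_exit(void)
	{
		ib_unregister_peer_memory_client(example_reg_handle);
	}
	module_exit(example_peer_exit);

On the consumer side, a HW driver that obtained a peer umem arms the
invalidation callback once its HW object is ready; the callback
signature here is assumed from the call site in
ib_invalidate_peer_memory() below:

	static void example_mr_invalidate(struct ib_umem *umem, void *priv)
	{
		/* Driver-specific: fence HW access to the MR before the
		 * peer's pages are unmapped and released. */
	}
	...
	ib_umem_activate_invalidation_notifier(umem, example_mr_invalidate, mr);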
This is designed to preserve the kABI: no functions or structures are
changed, only new symbols are added:

 ib_register_peer_memory_client
 ib_unregister_peer_memory_client
 ib_umem_activate_invalidation_notifier
 ib_umem_get_peer

and a bitfield in struct ib_umem uses more bits.

This interface is compatible with the two out-of-tree GPU drivers:

 https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/drivers/gpu/drm/amd/amdkfd/kfd_peerdirect.c
 https://github.com/Mellanox/nv_peer_memory/blob/master/nv_peer_mem.c

NOTES (remove before sending):
- The exact locking semantics from the GPU side during invalidation are
  confusing. I've made it sane, but perhaps this will hit locking
  problems. Test with lockdep and test invalidation. The main difference
  here is that get_pages and dma_map are called from a context that will
  block progress of invalidation. The old design blocked progress of
  invalidation using a completion for unmap and unpin, so those should
  be proven safe now. Since the old design used a completion it didn't
  work with lockdep, even though it has basically the same blocking
  semantics.
- The API exported to the GPU side is crufty and makes very little
  sense. Functionally it should be the same, but many useless things
  were dropped.
- I rewrote all the comments; please check spelling/grammar.
- Compile tested only.

Signed-off-by: Yishai Hadas
Signed-off-by: Feras Daoud
Signed-off-by: Jason Gunthorpe
(backported from commit a42989294cf39d6e829424734ab0e7ec48bebcef
 git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git)
[ dannf: Backport provided to Ubuntu's 5.4 by Feras; confirmed to pass
  Nvidia's internal testing. ]
Signed-off-by: dann frazier
---
 drivers/infiniband/core/Makefile      |   2 +-
 drivers/infiniband/core/ib_peer_mem.h |  52 +++
 drivers/infiniband/core/peer_mem.c    | 484 ++++++++++++++++++++++++++
 drivers/infiniband/core/umem.c        |  44 ++-
 drivers/infiniband/hw/mlx5/cq.c       |  11 +-
 drivers/infiniband/hw/mlx5/devx.c     |   2 +-
 drivers/infiniband/hw/mlx5/doorbell.c |   4 +-
 drivers/infiniband/hw/mlx5/mem.c      |  11 +-
 drivers/infiniband/hw/mlx5/mr.c       |  47 ++-
 drivers/infiniband/hw/mlx5/qp.c       |   2 +-
 drivers/infiniband/hw/mlx5/srq.c      |   2 +-
 include/rdma/ib_umem.h                |  29 ++
 include/rdma/peer_mem.h               | 165 +++++++++
 13 files changed, 828 insertions(+), 27 deletions(-)
 create mode 100644 drivers/infiniband/core/ib_peer_mem.h
 create mode 100644 drivers/infiniband/core/peer_mem.c
 create mode 100644 include/rdma/peer_mem.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 9a8871e21545..4b7838ff6e90 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -34,5 +34,5 @@ ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
 				uverbs_std_types_flow_action.o uverbs_std_types_dm.o \
 				uverbs_std_types_mr.o uverbs_std_types_counters.o \
 				uverbs_uapi.o uverbs_std_types_device.o
-ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
+ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o peer_mem.o
 ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o
diff --git a/drivers/infiniband/core/ib_peer_mem.h b/drivers/infiniband/core/ib_peer_mem.h
new file mode 100644
index 000000000000..bb38ffee724a
--- /dev/null
+++ b/drivers/infiniband/core/ib_peer_mem.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/*
+ * Copyright (c) 2014-2020, Mellanox Technologies. All rights reserved.
+ */ +#ifndef RDMA_IB_PEER_MEM_H +#define RDMA_IB_PEER_MEM_H + +#include +#include +#include +#include + +struct ib_peer_memory_statistics { + atomic64_t num_alloc_mrs; + atomic64_t num_dealloc_mrs; + atomic64_t num_reg_pages; + atomic64_t num_dereg_pages; + atomic64_t num_reg_bytes; + atomic64_t num_dereg_bytes; + unsigned long num_free_callbacks; +}; + +struct ib_peer_memory_client { + struct kobject kobj; + refcount_t usecnt; + struct completion usecnt_zero; + const struct peer_memory_client *peer_mem; + struct list_head core_peer_list; + struct ib_peer_memory_statistics stats; + struct xarray umem_xa; + u32 xa_cyclic_next; + bool invalidation_required; +}; + +struct ib_umem_peer { + struct ib_umem umem; + struct kref kref; + /* peer memory that manages this umem */ + struct ib_peer_memory_client *ib_peer_client; + void *peer_client_context; + umem_invalidate_func_t invalidation_func; + void *invalidation_private; + struct mutex mapping_lock; + bool mapped; + u32 xa_id; +}; + +struct ib_umem *ib_peer_umem_get(struct ib_umem *old_umem, int old_ret, + unsigned long peer_mem_flags); +void ib_peer_umem_release(struct ib_umem *umem); + +#endif diff --git a/drivers/infiniband/core/peer_mem.c b/drivers/infiniband/core/peer_mem.c new file mode 100644 index 000000000000..833865578cb0 --- /dev/null +++ b/drivers/infiniband/core/peer_mem.c @@ -0,0 +1,484 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* + * Copyright (c) 2014-2020, Mellanox Technologies. All rights reserved. + */ +#include +#include +#include +#include "ib_peer_mem.h" +static DEFINE_MUTEX(peer_memory_mutex); +static LIST_HEAD(peer_memory_list); +static struct kobject *peers_kobj; +#define PEER_NO_INVALIDATION_ID U32_MAX +static int ib_invalidate_peer_memory(void *reg_handle, u64 core_context); +struct peer_mem_attribute { + struct attribute attr; + ssize_t (*show)(struct ib_peer_memory_client *ib_peer_client, + struct peer_mem_attribute *attr, char *buf); + ssize_t (*store)(struct ib_peer_memory_client *ib_peer_client, + struct peer_mem_attribute *attr, const char *buf, + size_t count); +}; + +#define PEER_ATTR_RO(_name) \ + struct peer_mem_attribute peer_attr_ ## _name = __ATTR_RO(_name) + +static ssize_t version_show(struct ib_peer_memory_client *ib_peer_client, + struct peer_mem_attribute *attr, char *buf) +{ + return scnprintf(buf, PAGE_SIZE, "%s\n", + ib_peer_client->peer_mem->version); +} + +static PEER_ATTR_RO(version); +static ssize_t num_alloc_mrs_show(struct ib_peer_memory_client *ib_peer_client, + struct peer_mem_attribute *attr, char *buf) +{ + return scnprintf( + buf, PAGE_SIZE, "%llu\n", + (u64)atomic64_read(&ib_peer_client->stats.num_alloc_mrs)); +} + +static PEER_ATTR_RO(num_alloc_mrs); +static ssize_t +num_dealloc_mrs_show(struct ib_peer_memory_client *ib_peer_client, + struct peer_mem_attribute *attr, char *buf) +{ + return scnprintf( + buf, PAGE_SIZE, "%llu\n", + (u64)atomic64_read(&ib_peer_client->stats.num_dealloc_mrs)); +} + +static PEER_ATTR_RO(num_dealloc_mrs); +static ssize_t num_reg_pages_show(struct ib_peer_memory_client *ib_peer_client, + struct peer_mem_attribute *attr, char *buf) +{ + return scnprintf( + buf, PAGE_SIZE, "%llu\n", + (u64)atomic64_read(&ib_peer_client->stats.num_reg_pages)); +} + +static PEER_ATTR_RO(num_reg_pages); +static ssize_t +num_dereg_pages_show(struct ib_peer_memory_client *ib_peer_client, + struct peer_mem_attribute *attr, char *buf) +{ + return scnprintf( + buf, PAGE_SIZE, "%llu\n", + (u64)atomic64_read(&ib_peer_client->stats.num_dereg_pages)); +} + +static 
PEER_ATTR_RO(num_dereg_pages); +static ssize_t num_reg_bytes_show(struct ib_peer_memory_client *ib_peer_client, + struct peer_mem_attribute *attr, char *buf) +{ + return scnprintf( + buf, PAGE_SIZE, "%llu\n", + (u64)atomic64_read(&ib_peer_client->stats.num_reg_bytes)); +} + +static PEER_ATTR_RO(num_reg_bytes); +static ssize_t +num_dereg_bytes_show(struct ib_peer_memory_client *ib_peer_client, + struct peer_mem_attribute *attr, char *buf) +{ + return scnprintf( + buf, PAGE_SIZE, "%llu\n", + (u64)atomic64_read(&ib_peer_client->stats.num_dereg_bytes)); +} + +static PEER_ATTR_RO(num_dereg_bytes); +static ssize_t +num_free_callbacks_show(struct ib_peer_memory_client *ib_peer_client, + struct peer_mem_attribute *attr, char *buf) +{ + return scnprintf(buf, PAGE_SIZE, "%lu\n", + ib_peer_client->stats.num_free_callbacks); +} + +static PEER_ATTR_RO(num_free_callbacks); +static struct attribute *peer_mem_attrs[] = { + &peer_attr_version.attr, + &peer_attr_num_alloc_mrs.attr, + &peer_attr_num_dealloc_mrs.attr, + &peer_attr_num_reg_pages.attr, + &peer_attr_num_dereg_pages.attr, + &peer_attr_num_reg_bytes.attr, + &peer_attr_num_dereg_bytes.attr, + &peer_attr_num_free_callbacks.attr, + NULL, +}; + +static const struct attribute_group peer_mem_attr_group = { + .attrs = peer_mem_attrs, +}; + +static ssize_t peer_attr_show(struct kobject *kobj, struct attribute *attr, + char *buf) +{ + struct peer_mem_attribute *peer_attr = + container_of(attr, struct peer_mem_attribute, attr); + if (!peer_attr->show) + return -EIO; + return peer_attr->show(container_of(kobj, struct ib_peer_memory_client, + kobj), + peer_attr, buf); +} + +static const struct sysfs_ops peer_mem_sysfs_ops = { + .show = peer_attr_show, +}; + +static void ib_peer_memory_client_release(struct kobject *kobj) +{ + struct ib_peer_memory_client *ib_peer_client = + container_of(kobj, struct ib_peer_memory_client, kobj); + kfree(ib_peer_client); +} + +static struct kobj_type peer_mem_type = { + .sysfs_ops = &peer_mem_sysfs_ops, + .release = ib_peer_memory_client_release, +}; + +static int ib_memory_peer_check_mandatory(const struct peer_memory_client + *peer_client) +{ +#define PEER_MEM_MANDATORY_FUNC(x) {offsetof(struct peer_memory_client, x), #x} + int i; + static const struct { + size_t offset; + char *name; + } mandatory_table[] = { + PEER_MEM_MANDATORY_FUNC(acquire), + PEER_MEM_MANDATORY_FUNC(get_pages), + PEER_MEM_MANDATORY_FUNC(put_pages), + PEER_MEM_MANDATORY_FUNC(dma_map), + PEER_MEM_MANDATORY_FUNC(dma_unmap), + }; + for (i = 0; i < ARRAY_SIZE(mandatory_table); ++i) { + if (!*(void **)((void *)peer_client + + mandatory_table[i].offset)) { + pr_err("Peer memory %s is missing mandatory function %s\n", + peer_client->name, mandatory_table[i].name); + return -EINVAL; + } + } + return 0; +} + +void * +ib_register_peer_memory_client(const struct peer_memory_client *peer_client, + invalidate_peer_memory *invalidate_callback) +{ + struct ib_peer_memory_client *ib_peer_client; + int ret; + if (ib_memory_peer_check_mandatory(peer_client)) + return NULL; + ib_peer_client = kzalloc(sizeof(*ib_peer_client), GFP_KERNEL); + if (!ib_peer_client) + return NULL; + kobject_init(&ib_peer_client->kobj, &peer_mem_type); + refcount_set(&ib_peer_client->usecnt, 1); + init_completion(&ib_peer_client->usecnt_zero); + ib_peer_client->peer_mem = peer_client; + xa_init_flags(&ib_peer_client->umem_xa, XA_FLAGS_ALLOC); + /* + * If the peer wants the invalidation_callback then all memory users + * linked to that peer must support invalidation. 
+ */ + if (invalidate_callback) { + *invalidate_callback = ib_invalidate_peer_memory; + ib_peer_client->invalidation_required = true; + } + mutex_lock(&peer_memory_mutex); + if (!peers_kobj) { + /* Created under /sys/kernel/mm */ + peers_kobj = kobject_create_and_add("memory_peers", mm_kobj); + if (!peers_kobj) + goto err_unlock; + } + ret = kobject_add(&ib_peer_client->kobj, peers_kobj, peer_client->name); + if (ret) + goto err_parent; + ret = sysfs_create_group(&ib_peer_client->kobj, + &peer_mem_attr_group); + if (ret) + goto err_parent; + list_add_tail(&ib_peer_client->core_peer_list, &peer_memory_list); + mutex_unlock(&peer_memory_mutex); + return ib_peer_client; +err_parent: + if (list_empty(&peer_memory_list)) { + kobject_put(peers_kobj); + peers_kobj = NULL; + } +err_unlock: + mutex_unlock(&peer_memory_mutex); + kobject_put(&ib_peer_client->kobj); + return NULL; +} +EXPORT_SYMBOL(ib_register_peer_memory_client); + +void ib_unregister_peer_memory_client(void *reg_handle) +{ + struct ib_peer_memory_client *ib_peer_client = reg_handle; + mutex_lock(&peer_memory_mutex); + list_del(&ib_peer_client->core_peer_list); + if (list_empty(&peer_memory_list)) { + kobject_put(peers_kobj); + peers_kobj = NULL; + } + mutex_unlock(&peer_memory_mutex); + /* + * Wait for all umems to be destroyed before returning. Once + * ib_unregister_peer_memory_client() returns no umems will call any + * peer_mem ops. + */ + if (refcount_dec_and_test(&ib_peer_client->usecnt)) + complete(&ib_peer_client->usecnt_zero); + wait_for_completion(&ib_peer_client->usecnt_zero); + kobject_put(&ib_peer_client->kobj); +} +EXPORT_SYMBOL(ib_unregister_peer_memory_client); + +static struct ib_peer_memory_client * +ib_get_peer_client(unsigned long addr, size_t size, + unsigned long peer_mem_flags, void **peer_client_context) +{ + struct ib_peer_memory_client *ib_peer_client; + int ret = 0; + mutex_lock(&peer_memory_mutex); + list_for_each_entry(ib_peer_client, &peer_memory_list, + core_peer_list) { + if (ib_peer_client->invalidation_required && + (!(peer_mem_flags & IB_PEER_MEM_INVAL_SUPP))) + continue; + ret = ib_peer_client->peer_mem->acquire(addr, size, NULL, NULL, + peer_client_context); + if (ret > 0) { + refcount_inc(&ib_peer_client->usecnt); + mutex_unlock(&peer_memory_mutex); + return ib_peer_client; + } + } + mutex_unlock(&peer_memory_mutex); + return NULL; +} + +static void ib_put_peer_client(struct ib_peer_memory_client *ib_peer_client, + void *peer_client_context) +{ + if (ib_peer_client->peer_mem->release) + ib_peer_client->peer_mem->release(peer_client_context); + if (refcount_dec_and_test(&ib_peer_client->usecnt)) + complete(&ib_peer_client->usecnt_zero); +} + +static void ib_peer_umem_kref_release(struct kref *kref) +{ + kfree(container_of(kref, struct ib_umem_peer, kref)); +} + +static void ib_unmap_peer_client(struct ib_umem_peer *umem_p) +{ + struct ib_peer_memory_client *ib_peer_client = umem_p->ib_peer_client; + const struct peer_memory_client *peer_mem = ib_peer_client->peer_mem; + struct ib_umem *umem = &umem_p->umem; + + lockdep_assert_held(&umem_p->mapping_lock); + + peer_mem->dma_unmap(&umem_p->umem.sg_head, umem_p->peer_client_context, + umem_p->umem.ibdev->dma_device); + peer_mem->put_pages(&umem_p->umem.sg_head, umem_p->peer_client_context); + memset(&umem->sg_head, 0, sizeof(umem->sg_head)); + + atomic64_add(umem->nmap, &ib_peer_client->stats.num_dereg_pages); + atomic64_add(umem->length, &ib_peer_client->stats.num_dereg_bytes); + atomic64_inc(&ib_peer_client->stats.num_dealloc_mrs); + + if 
+	if (umem_p->xa_id != PEER_NO_INVALIDATION_ID)
+		xa_store(&ib_peer_client->umem_xa, umem_p->xa_id, NULL,
+			 GFP_KERNEL);
+	umem_p->mapped = false;
+}
+
+static int ib_invalidate_peer_memory(void *reg_handle, u64 core_context)
+{
+	struct ib_peer_memory_client *ib_peer_client = reg_handle;
+	struct ib_umem_peer *umem_p;
+
+	/*
+	 * The client is not required to fence against invalidation during
+	 * put_pages() as that would deadlock when we call put_pages() here.
+	 * Thus the core_context cannot be a umem pointer as we have no control
+	 * over the lifetime. Since we won't change the kABI for this to add a
+	 * proper kref, an xarray is used.
+	 */
+	xa_lock(&ib_peer_client->umem_xa);
+	ib_peer_client->stats.num_free_callbacks += 1;
+	umem_p = xa_load(&ib_peer_client->umem_xa, core_context);
+	if (!umem_p)
+		goto out_unlock;
+	kref_get(&umem_p->kref);
+	xa_unlock(&ib_peer_client->umem_xa);
+	mutex_lock(&umem_p->mapping_lock);
+	if (umem_p->mapped) {
+		/*
+		 * At this point the invalidation_func must be !NULL as the get
+		 * flow does not unlock mapping_lock until it is set, and umems
+		 * that do not require invalidation are not in the xarray.
+		 */
+		umem_p->invalidation_func(&umem_p->umem,
+					  umem_p->invalidation_private);
+		ib_unmap_peer_client(umem_p);
+	}
+	mutex_unlock(&umem_p->mapping_lock);
+	kref_put(&umem_p->kref, ib_peer_umem_kref_release);
+	return 0;
+out_unlock:
+	xa_unlock(&ib_peer_client->umem_xa);
+	return 0;
+}
+
+void ib_umem_activate_invalidation_notifier(struct ib_umem *umem,
+					    umem_invalidate_func_t func,
+					    void *priv)
+{
+	struct ib_umem_peer *umem_p =
+		container_of(umem, struct ib_umem_peer, umem);
+
+	if (WARN_ON(!umem->is_peer))
+		return;
+	if (umem_p->xa_id == PEER_NO_INVALIDATION_ID)
+		return;
+
+	umem_p->invalidation_func = func;
+	umem_p->invalidation_private = priv;
+	/* Pairs with the lock in ib_peer_umem_get() */
+	mutex_unlock(&umem_p->mapping_lock);
+
+	/* At this point func can be called asynchronously */
+}
+EXPORT_SYMBOL(ib_umem_activate_invalidation_notifier);
+
+struct ib_umem *ib_peer_umem_get(struct ib_umem *old_umem, int old_ret,
+				 unsigned long peer_mem_flags)
+{
+	struct ib_peer_memory_client *ib_peer_client;
+	void *peer_client_context;
+	struct ib_umem_peer *umem_p;
+	int ret;
+	ib_peer_client =
+		ib_get_peer_client(old_umem->address, old_umem->length,
+				   peer_mem_flags, &peer_client_context);
+	if (!ib_peer_client)
+		return ERR_PTR(old_ret);
+	umem_p = kzalloc(sizeof(*umem_p), GFP_KERNEL);
+	if (!umem_p) {
+		ret = -ENOMEM;
+		goto err_client;
+	}
+
+	kref_init(&umem_p->kref);
+	umem_p->umem = *old_umem;
+	memset(&umem_p->umem.sg_head, 0, sizeof(umem_p->umem.sg_head));
+	umem_p->umem.is_peer = 1;
+	umem_p->ib_peer_client = ib_peer_client;
+	umem_p->peer_client_context = peer_client_context;
+	mutex_init(&umem_p->mapping_lock);
+	umem_p->xa_id = PEER_NO_INVALIDATION_ID;
+
+	mutex_lock(&umem_p->mapping_lock);
+	if (ib_peer_client->invalidation_required) {
+		ret = xa_alloc_cyclic(&ib_peer_client->umem_xa, &umem_p->xa_id,
+				      umem_p,
+				      XA_LIMIT(0, PEER_NO_INVALIDATION_ID - 1),
+				      &ib_peer_client->xa_cyclic_next,
+				      GFP_KERNEL);
+		if (ret < 0)
+			goto err_umem;
+	}
+
+	/*
+	 * We always request write permissions to the pages, to force breaking
+	 * of any CoW during the registration of the MR. For read-only MRs we
+	 * use the "force" flag to indicate that CoW breaking is required but
+	 * the registration should not fail if referencing read-only areas.
+	 */
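+	/*
+	 * The xa_id doubles as the core_context cookie passed to
+	 * get_pages(); the peer hands it back to invalidate_peer_memory()
+	 * so this umem can be looked up again.
+	 */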
+	ret = ib_peer_client->peer_mem->get_pages(umem_p->umem.address,
+						  umem_p->umem.length, 1,
+						  !umem_p->umem.writable, NULL,
+						  peer_client_context,
+						  umem_p->xa_id);
+	if (ret)
+		goto err_xa;
+
+	umem_p->umem.page_shift =
+		ilog2(ib_peer_client->peer_mem->get_page_size(peer_client_context));
+
+	ret = ib_peer_client->peer_mem->dma_map(&umem_p->umem.sg_head,
+						peer_client_context,
+						umem_p->umem.ibdev->dma_device,
+						0, &umem_p->umem.nmap);
+	if (ret)
+		goto err_pages;
+
+	umem_p->mapped = true;
+	atomic64_add(umem_p->umem.nmap, &ib_peer_client->stats.num_reg_pages);
+	atomic64_add(umem_p->umem.length, &ib_peer_client->stats.num_reg_bytes);
+	atomic64_inc(&ib_peer_client->stats.num_alloc_mrs);
+
+	/*
+	 * If invalidation is allowed then the caller must call
+	 * ib_umem_activate_invalidation_notifier() or ib_peer_umem_release()
+	 * to unlock this mutex. This call should be done after the last read
+	 * of sg_head, once the caller is ready for the invalidation function
+	 * to be called.
+	 */
+	if (umem_p->xa_id == PEER_NO_INVALIDATION_ID)
+		mutex_unlock(&umem_p->mapping_lock);
+	/*
+	 * On success the old umem is replaced with the new, larger allocation
+	 */
+	kfree(old_umem);
+	return &umem_p->umem;
+err_pages:
+	ib_peer_client->peer_mem->put_pages(&umem_p->umem.sg_head,
+					    umem_p->peer_client_context);
+err_xa:
+	if (umem_p->xa_id != PEER_NO_INVALIDATION_ID)
+		xa_erase(&umem_p->ib_peer_client->umem_xa, umem_p->xa_id);
+err_umem:
+	mutex_unlock(&umem_p->mapping_lock);
+	kref_put(&umem_p->kref, ib_peer_umem_kref_release);
+err_client:
+	ib_put_peer_client(ib_peer_client, peer_client_context);
+	return ERR_PTR(ret);
+}
+
+void ib_peer_umem_release(struct ib_umem *umem)
+{
+	struct ib_umem_peer *umem_p =
+		container_of(umem, struct ib_umem_peer, umem);
+
+	/* invalidation_func being set indicates activate was called */
+	if (umem_p->xa_id == PEER_NO_INVALIDATION_ID ||
+	    umem_p->invalidation_func)
+		mutex_lock(&umem_p->mapping_lock);
+
+	if (umem_p->mapped)
+		ib_unmap_peer_client(umem_p);
+	mutex_unlock(&umem_p->mapping_lock);
+
+	if (umem_p->xa_id != PEER_NO_INVALIDATION_ID)
+		xa_erase(&umem_p->ib_peer_client->umem_xa, umem_p->xa_id);
+	ib_put_peer_client(umem_p->ib_peer_client, umem_p->peer_client_context);
+	umem_p->ib_peer_client = NULL;
+
+	/* Must match ib_umem_release() */
+	atomic64_sub(ib_umem_num_pages(umem), &umem->owning_mm->pinned_vm);
+	mmdrop(umem->owning_mm);
+
+	kref_put(&umem_p->kref, ib_peer_umem_kref_release);
+}
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 698c5359f643..e7473285e470 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -42,6 +42,7 @@
 #include
 
 #include "uverbs.h"
+#include "ib_peer_mem.h"
 
 static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
 {
@@ -193,15 +194,17 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 EXPORT_SYMBOL(ib_umem_find_best_pgsz);
 
 /**
- * ib_umem_get - Pin and DMA map userspace memory.
+ * __ib_umem_get - Pin and DMA map userspace memory.
 *
 * @device: IB device to connect UMEM
 * @addr: userspace virtual address to start at
 * @size: length of region to pin
 * @access: IB_ACCESS_xxx flags for memory being pinned
+ * @peer_mem_flags: IB_PEER_MEM_xxx flags for memory being used
 */
-struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
-			    size_t size, int access)
+struct ib_umem *__ib_umem_get(struct ib_device *device,
+			      unsigned long addr, size_t size, int access,
+			      unsigned long peer_mem_flags)
 {
 	struct ib_umem *umem;
 	struct page **page_list;
@@ -309,6 +312,24 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 
 umem_release:
 	__ib_umem_release(device, umem, 0);
+	/*
+	 * If the address belongs to peer memory client, then the first
+	 * call to get_user_pages will fail. In this case, try to get
+	 * these pages from the peers.
+	 */
+	/* FIXME: this placement is horrible */
+	if (ret < 0 && peer_mem_flags & IB_PEER_MEM_ALLOW) {
+		struct ib_umem *new_umem;
+
+		new_umem = ib_peer_umem_get(umem, ret, peer_mem_flags);
+		if (IS_ERR(new_umem)) {
+			ret = PTR_ERR(new_umem);
+			goto vma;
+		}
+		umem = new_umem;
+		ret = 0;
+		goto out;
+	}
 vma:
 	atomic64_sub(ib_umem_num_pages(umem), &mm->pinned_vm);
 out:
@@ -320,8 +341,23 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 	}
 	return ret ? ERR_PTR(ret) : umem;
 }
+
+struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
+			    size_t size, int access)
+{
+	return __ib_umem_get(device, addr, size, access, 0);
+}
 EXPORT_SYMBOL(ib_umem_get);
 
+struct ib_umem *ib_umem_get_peer(struct ib_device *device, unsigned long addr,
+				 size_t size, int access,
+				 unsigned long peer_mem_flags)
+{
+	return __ib_umem_get(device, addr, size, access,
+			     IB_PEER_MEM_ALLOW | peer_mem_flags);
+}
+EXPORT_SYMBOL(ib_umem_get_peer);
+
 /**
  * ib_umem_release - release memory pinned with ib_umem_get
  * @umem: umem struct to release
@@ -333,6 +369,8 @@ void ib_umem_release(struct ib_umem *umem)
 	if (umem->is_odp)
 		return ib_umem_odp_release(to_ib_umem_odp(umem));
+	if (umem->is_peer)
+		return ib_peer_umem_release(umem);
 
 	__ib_umem_release(umem->ibdev, umem, 1);
 
 	atomic64_sub(ib_umem_num_pages(umem), &umem->owning_mm->pinned_vm);
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 2f5ee37c252b..cd2241bb865a 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -733,8 +733,9 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 	*cqe_size = ucmd.cqe_size;
 
 	cq->buf.umem =
-		ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
-			    entries * ucmd.cqe_size, IB_ACCESS_LOCAL_WRITE);
+		ib_umem_get_peer(&dev->ib_dev, ucmd.buf_addr,
+				 entries * ucmd.cqe_size,
+				 IB_ACCESS_LOCAL_WRITE, 0);
 	if (IS_ERR(cq->buf.umem)) {
 		err = PTR_ERR(cq->buf.umem);
 		return err;
@@ -1132,9 +1133,9 @@ static int resize_user(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 	if (ucmd.cqe_size && SIZE_MAX / ucmd.cqe_size <= entries - 1)
 		return -EINVAL;
 
-	umem = ib_umem_get(&dev->ib_dev, ucmd.buf_addr,
-			   (size_t)ucmd.cqe_size * entries,
-			   IB_ACCESS_LOCAL_WRITE);
+	umem = ib_umem_get_peer(&dev->ib_dev, ucmd.buf_addr,
+				(size_t)ucmd.cqe_size * entries,
+				IB_ACCESS_LOCAL_WRITE, 0);
 	if (IS_ERR(umem)) {
 		err = PTR_ERR(umem);
 		return err;
diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index c3b4b6586d17..f8f8507c7938 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -2143,7 +2143,7 @@ static int devx_umem_get(struct mlx5_ib_dev *dev, struct ib_ucontext *ucontext,
 	if (err)
 		return err;
 
-	obj->umem = ib_umem_get(&dev->ib_dev, addr, size, access);
+	obj->umem = ib_umem_get_peer(&dev->ib_dev, addr, size, access, 0);
 	if (IS_ERR(obj->umem))
 		return PTR_ERR(obj->umem);
 
diff --git a/drivers/infiniband/hw/mlx5/doorbell.c b/drivers/infiniband/hw/mlx5/doorbell.c
index 61475b571531..a2a7e121ee5f 100644
--- a/drivers/infiniband/hw/mlx5/doorbell.c
+++ b/drivers/infiniband/hw/mlx5/doorbell.c
@@ -64,8 +64,8 @@ int mlx5_ib_db_map_user(struct mlx5_ib_ucontext *context,
 	page->user_virt = (virt & PAGE_MASK);
 	page->refcnt    = 0;
-	page->umem = ib_umem_get(context->ibucontext.device, virt & PAGE_MASK,
-				 PAGE_SIZE, 0);
+	page->umem = ib_umem_get_peer(context->ibucontext.device, virt & PAGE_MASK,
+				      PAGE_SIZE, 0, 0);
 	if (IS_ERR(page->umem)) {
 		err = PTR_ERR(page->umem);
 		kfree(page);
diff --git a/drivers/infiniband/hw/mlx5/mem.c b/drivers/infiniband/hw/mlx5/mem.c
index b5aece786b36..174567af5ddd 100644
--- a/drivers/infiniband/hw/mlx5/mem.c
+++ b/drivers/infiniband/hw/mlx5/mem.c
@@ -55,16 +55,17 @@ void mlx5_ib_cont_pages(struct ib_umem *umem, u64 addr,
 	int i = 0;
 	struct scatterlist *sg;
 	int entry;
+	int page_shift = umem->is_peer ? umem->page_shift : PAGE_SHIFT;
 
-	addr = addr >> PAGE_SHIFT;
+	addr = addr >> page_shift;
 	tmp = (unsigned long)addr;
 	m = find_first_bit(&tmp, BITS_PER_LONG);
 	if (max_page_shift)
-		m = min_t(unsigned long, max_page_shift - PAGE_SHIFT, m);
+		m = min_t(unsigned long, max_page_shift - page_shift, m);
 
 	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
-		len = sg_dma_len(sg) >> PAGE_SHIFT;
-		pfn = sg_dma_address(sg) >> PAGE_SHIFT;
+		len = sg_dma_len(sg) >> page_shift;
+		pfn = sg_dma_address(sg) >> page_shift;
 		if (base + p != pfn) {
 			/* If either the offset or the new
 			 * base are unaligned update m
@@ -96,7 +97,7 @@ void mlx5_ib_cont_pages(struct ib_umem *umem, u64 addr,
 		*ncont = 0;
 	}
-	*shift = PAGE_SHIFT + m;
+	*shift = page_shift + m;
 	*count = i;
 }
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 24daf420317e..2d075ca40bfc 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -41,6 +41,8 @@
 #include
 #include "mlx5_ib.h"
 
+static void mlx5_invalidate_umem(struct ib_umem *umem, void *priv);
+
 enum {
 	MAX_PENDING_REG_MR = 8,
 };
@@ -754,7 +756,7 @@ static int mr_cache_max_order(struct mlx5_ib_dev *dev)
 
 static int mr_umem_get(struct mlx5_ib_dev *dev, u64 start, u64 length,
 		       int access_flags, struct ib_umem **umem, int *npages,
-		       int *page_shift, int *ncont, int *order)
+		       int *page_shift, int *ncont, int *order, bool allow_peer)
 {
 	struct ib_umem *u;
 
@@ -779,7 +781,13 @@ static int mr_umem_get(struct mlx5_ib_dev *dev, u64 start, u64 length,
 		if (order)
 			*order = ilog2(roundup_pow_of_two(*ncont));
 	} else {
-		u = ib_umem_get(&dev->ib_dev, start, length, access_flags);
+		if (allow_peer)
+			u = ib_umem_get_peer(&dev->ib_dev, start, length,
+					     access_flags,
+					     IB_PEER_MEM_INVAL_SUPP);
+		else
+			u = ib_umem_get(&dev->ib_dev, start, length,
+					access_flags);
 		if (IS_ERR(u)) {
 			mlx5_ib_dbg(dev, "umem get failed (%ld)\n", PTR_ERR(u));
 			return PTR_ERR(u);
@@ -1280,7 +1288,7 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 	}
 
 	err = mr_umem_get(dev, start, length, access_flags, &umem,
-			  &npages, &page_shift, &ncont, &order);
+			  &npages, &page_shift, &ncont, &order, true);
 	if (err < 0)
 		return ERR_PTR(err);
 
@@ -1335,6 +1343,12 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 		}
 	}
 
+	if (umem->is_peer) {
+		ib_umem_activate_invalidation_notifier(
+			umem, mlx5_invalidate_umem, mr);
+		/* After this point the MR can be invalidated */
+	}
+
 	if (is_odp_mr(mr)) {
 		to_ib_umem_odp(mr->umem)->private = mr;
 		atomic_set(&mr->num_pending_prefetch, 0);
@@ -1412,6 +1426,10 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 
 	atomic_sub(mr->npages, &dev->mdev->priv.reg_pages);
 
+	/* Peer memory isn't supported */
+	if (mr->umem->is_peer)
+		return -EOPNOTSUPP;
+
 	if (!mr->umem)
 		return -EINVAL;
 
@@ -1435,7 +1453,7 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start,
 		ib_umem_release(mr->umem);
 		mr->umem = NULL;
 
 		err = mr_umem_get(dev, addr, len, access_flags, &mr->umem,
-				  &npages, &page_shift, &ncont, &order);
+				  &npages, &page_shift, &ncont, &order, false);
 		if (err)
 			goto err;
 	}
@@ -1615,13 +1633,14 @@ static void dereg_mr(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
 	 * We should unregister the DMA address from the HCA before
 	 * remove the DMA mapping.
 	 */
-	mlx5_mr_cache_free(dev, mr);
+	if (mr->allocated_from_cache)
+		mlx5_mr_cache_free(dev, mr);
+	else
+		kfree(mr);
+
 	ib_umem_release(umem);
 	if (umem)
 		atomic_sub(npages, &dev->mdev->priv.reg_pages);
-
-	if (!mr->allocated_from_cache)
-		kfree(mr);
 }
 
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
@@ -2331,3 +2350,15 @@ int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
 
 	return n;
 }
+
+static void mlx5_invalidate_umem(struct ib_umem *umem, void *priv)
+{
+	struct mlx5_ib_mr *mr = priv;
+
+	/*
+	 * DMA is turned off for the mkey, but the mkey remains otherwise
+	 * untouched until the normal flow of dereg_mr happens. Any access to
+	 * this mkey will generate CQEs.
+	 */
+	unreg_umr(mr->dev, mr);
+}
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 45faab9e1313..be59c6d5ba1c 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -749,7 +749,7 @@ static int mlx5_ib_umem_get(struct mlx5_ib_dev *dev, struct ib_udata *udata,
 {
 	int err;
 
-	*umem = ib_umem_get(&dev->ib_dev, addr, size, 0);
+	*umem = ib_umem_get_peer(&dev->ib_dev, addr, size, 0, 0);
 	if (IS_ERR(*umem)) {
 		mlx5_ib_dbg(dev, "umem_get failed\n");
 		return PTR_ERR(*umem);
diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index 6d1ff13d2283..2f55f7e1923d 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -80,7 +80,7 @@ static int create_srq_user(struct ib_pd *pd, struct mlx5_ib_srq *srq,
 
 	srq->wq_sig = !!(ucmd.flags & MLX5_SRQ_FLAG_SIGNATURE);
 
-	srq->umem = ib_umem_get(pd->device, ucmd.buf_addr, buf_size, 0);
+	srq->umem = ib_umem_get_peer(pd->device, ucmd.buf_addr, buf_size, 0, 0);
 	if (IS_ERR(srq->umem)) {
 		mlx5_ib_dbg(dev, "failed umem get, size %d\n", buf_size);
 		err = PTR_ERR(srq->umem);
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 9353910915d4..ec9824cbf49d 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -48,10 +48,19 @@ struct ib_umem {
 	unsigned long		address;
 	u32 writable : 1;
 	u32 is_odp : 1;
+	/* Placing at the end of the bitfield list is ABI preserving on LE */
+	u32 is_peer : 1;
 	struct work_struct	work;
 	struct sg_table sg_head;
 	int             nmap;
 	unsigned int    sg_nents;
+	unsigned int	page_shift;
 };
 
+typedef void (*umem_invalidate_func_t)(struct ib_umem *umem, void *priv);
+enum ib_peer_mem_flags {
+	IB_PEER_MEM_ALLOW = 1 << 0,
+	IB_PEER_MEM_INVAL_SUPP = 1 << 1,
+};
+
 /* Returns the offset of the umem start relative to the first page. */
@@ -79,6 +88,13 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 				     unsigned long pgsz_bitmap,
 				     unsigned long virt);
 
+struct ib_umem *ib_umem_get_peer(struct ib_device *device, unsigned long addr,
+				 size_t size, int access,
+				 unsigned long peer_mem_flags);
+void ib_umem_activate_invalidation_notifier(struct ib_umem *umem,
+					    umem_invalidate_func_t func,
+					    void *cookie);
+
 #else /* CONFIG_INFINIBAND_USER_MEM */
 
 #include
 
@@ -102,6 +118,19 @@ static inline unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 	return 0;
 }
 
+static inline struct ib_umem *ib_umem_get_peer(struct ib_device *device,
+					       unsigned long addr, size_t size,
+					       int access,
+					       unsigned long peer_mem_flags)
+{
+	return ERR_PTR(-EINVAL);
+}
+
+static inline void ib_umem_activate_invalidation_notifier(
+	struct ib_umem *umem, umem_invalidate_func_t func, void *cookie)
+{
+}
+
 #endif /* CONFIG_INFINIBAND_USER_MEM */
 
 #endif /* IB_UMEM_H */
diff --git a/include/rdma/peer_mem.h b/include/rdma/peer_mem.h
new file mode 100644
index 000000000000..563a820dbc32
--- /dev/null
+++ b/include/rdma/peer_mem.h
@@ -0,0 +1,165 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/*
+ * Copyright (c) 2014-2020, Mellanox Technologies. All rights reserved.
+ */
+#ifndef RDMA_PEER_MEM_H
+#define RDMA_PEER_MEM_H
+
+#include
+
+#define IB_PEER_MEMORY_NAME_MAX 64
+#define IB_PEER_MEMORY_VER_MAX 16
+
+/*
+ * Prior versions used a void * for core_context, at some point this was
+ * switched to use u64. Be careful if compiling this as 32 bit. To help, the
+ * value of core_context is limited to u32 so it should work OK despite the
+ * type change.
+ */
+#define PEER_MEM_U64_CORE_CONTEXT
+
+struct device;
+
+/**
+ * struct peer_memory_client - registration information for user virtual
+ * memory handlers
+ *
+ * The peer_memory_client scheme allows a driver to register with the ib_umem
+ * system that it has the ability to understand user virtual address ranges
+ * that are not compatible with get_user_pages(), for instance VMAs created
+ * with io_remap_pfn_range() or other driver-special VMAs.
+ *
+ * For ranges the interface understands, it can provide a DMA mapped sg_table
+ * for use by the ib_umem, allowing user virtual ranges that cannot be
+ * supported by get_user_pages() to be used as umems.
+ */
+struct peer_memory_client {
+	char name[IB_PEER_MEMORY_NAME_MAX];
+	char version[IB_PEER_MEMORY_VER_MAX];
+
+	/**
+	 * acquire - Begin working with a user space virtual address range
+	 *
+	 * @addr - Virtual address to be checked whether it belongs to the peer.
+	 * @size - Length of the virtual memory area starting at addr.
+	 * @peer_mem_private_data - Obsolete, always NULL
+	 * @peer_mem_name - Obsolete, always NULL
+	 * @client_context - Returns an opaque value for this acquire, for use
+	 *                   in other APIs
+	 *
+	 * Returns 1 if the peer_memory_client supports the entire virtual
+	 * address range, 0 or -ERRNO otherwise. If 1 is returned then
+	 * release() will be called to release the acquire().
+	 */
+	int (*acquire)(unsigned long addr, size_t size,
+		       void *peer_mem_private_data, char *peer_mem_name,
+		       void **client_context);
+	/**
+	 * get_pages - Fill in the first part of a sg_table for a virtual
+	 * address range
+	 *
+	 * @addr - Virtual address to be checked whether it belongs to the peer.
+	 * @size - Length of the virtual memory area starting at addr.
+	 * @write - Always 1
+	 * @force - 1 if write is required
+	 * @sg_head - Obsolete, always NULL
+	 * @client_context - Value returned by acquire()
+	 * @core_context - Value to be passed to invalidate_peer_memory for
+	 *                 this get
+	 *
+	 * addr/size are passed as the raw virtual address range requested by
+	 * the user; it is not aligned to any page size. get_pages() is always
+	 * followed by dma_map().
+	 *
+	 * Upon return the invalidate_callback() may be invoked.
+	 *
+	 * Returns 0 on success, -ERRNO on failure. After success put_pages()
+	 * will be called to return the pages.
+	 */
+	int (*get_pages)(unsigned long addr, size_t size, int write, int force,
+			 struct sg_table *sg_head, void *client_context,
+			 u64 core_context);
+	/**
+	 * dma_map - Create a DMA mapped sg_table
+	 *
+	 * @sg_head - The sg_table to allocate
+	 * @client_context - Value returned by acquire()
+	 * @dma_device - The device that will be doing DMA from these addresses
+	 * @dmasync - Obsolete, always 0
+	 * @nmap - Returns the number of dma mapped entries in the sg_head
+	 *
+	 * Must be called after get_pages(). This must fill in the sg_head with
+	 * DMA mapped SGLs for dma_device. Each SGL start and end must meet a
+	 * minimum alignment of at least PAGE_SIZE, though individual SGLs can
+	 * be multiples of PAGE_SIZE, in any mixture. Since the user virtual
+	 * address/size are not page aligned, the implementation must increase
+	 * the range to the logical alignment when building the SGLs.
+	 *
+	 * Returns 0 on success, -ERRNO on failure. After success dma_unmap()
+	 * will be called to unmap the pages. On failure sg_head must be left
+	 * untouched or point to a valid sg_table.
+	 */
+	int (*dma_map)(struct sg_table *sg_head, void *client_context,
+		       struct device *dma_device, int dmasync, int *nmap);
+	/**
+	 * dma_unmap - Unmap a DMA mapped sg_table
+	 *
+	 * @sg_head - The sg_table to unmap
+	 * @client_context - Value returned by acquire()
+	 * @dma_device - The device that will be doing DMA from these addresses
+	 *
+	 * sg_head will not be touched after this function returns.
+	 *
+	 * Must return 0.
+	 */
+	int (*dma_unmap)(struct sg_table *sg_head, void *client_context,
+			 struct device *dma_device);
+	/**
+	 * put_pages - Unpin a SGL
+	 *
+	 * @sg_head - The sg_table to unpin
+	 * @client_context - Value returned by acquire()
+	 *
+	 * sg_head must be freed on return.
+	 */
+	void (*put_pages)(struct sg_table *sg_head, void *client_context);
+	/* Used to compute umem->page_shift; see ib_peer_umem_get() */
+	unsigned long (*get_page_size)(void *client_context);
+	/**
+	 * release - Undo acquire
+	 *
+	 * @client_context - Value returned by acquire()
+	 *
+	 * If acquire() returns 1 then release() must be called. All
+	 * get_pages() and dma_map()'s must be undone before calling this
+	 * function.
+	 */
+	void (*release)(void *client_context);
+};
+
+/*
+ * If invalidate_callback() is non-NULL then the client will only support
+ * umems which can be invalidated. The caller may call the
+ * invalidate_callback() after acquire(); on return the range will no longer
+ * have DMA active, and release() will have been called.
+ *
+ * Note: The implementation's locking must ensure that get_pages() and
+ * dma_map() do not have locking dependencies with invalidate_callback(). The
+ * ib_core will wait until any concurrent get_pages() or dma_map() completes
+ * before returning.
+ *
+ * Similarly, the core may call dma_unmap(), put_pages() and release() from
+ * within the callback, or will wait for another thread doing those operations
+ * to complete.
+ *
+ * For these reasons the user of invalidate_callback() must be careful with
+ * locking.
+ */
+typedef int (*invalidate_peer_memory)(void *reg_handle, u64 core_context);
+
+void *
+ib_register_peer_memory_client(const struct peer_memory_client *peer_client,
+			       invalidate_peer_memory *invalidate_callback);
+void ib_unregister_peer_memory_client(void *reg_handle);
+
+#endif
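
For reference, a minimal peer-memory client built against this interface would be shaped roughly as below. This is an illustrative sketch only, not part of the patch: the example_* names are hypothetical and the callback bodies are stubs. Note that acquire(), get_pages(), put_pages(), dma_map() and dma_unmap() must all be non-NULL or ib_register_peer_memory_client() refuses the registration, and get_page_size() is called unconditionally by ib_peer_umem_get(), so it must be valid as well.

	/* Illustrative sketch only -- not part of this patch. */
	#include <linux/module.h>
	#include <linux/mm.h>
	#include <linux/scatterlist.h>
	#include <rdma/peer_mem.h>

	static invalidate_peer_memory example_invalidate; /* filled in by ib_core */
	static void *example_reg_handle;

	static int example_acquire(unsigned long addr, size_t size,
				   void *peer_mem_private_data,
				   char *peer_mem_name, void **client_context)
	{
		return 0;	/* return 1 only for ranges this driver owns */
	}

	static int example_get_pages(unsigned long addr, size_t size, int write,
				     int force, struct sg_table *sg_head,
				     void *client_context, u64 core_context)
	{
		/* pin the peer pages; remember core_context for invalidation */
		return -EOPNOTSUPP;
	}

	static void example_put_pages(struct sg_table *sg_head,
				      void *client_context)
	{
	}

	static int example_dma_map(struct sg_table *sg_head, void *client_context,
				   struct device *dma_device, int dmasync,
				   int *nmap)
	{
		return -EOPNOTSUPP;
	}

	static int example_dma_unmap(struct sg_table *sg_head,
				     void *client_context,
				     struct device *dma_device)
	{
		return 0;
	}

	static unsigned long example_get_page_size(void *client_context)
	{
		return PAGE_SIZE;
	}

	static struct peer_memory_client example_client = {
		.name          = "example_peer",
		.version       = "1.0",
		.acquire       = example_acquire,
		.get_pages     = example_get_pages,
		.put_pages     = example_put_pages,
		.dma_map       = example_dma_map,
		.dma_unmap     = example_dma_unmap,
		.get_page_size = example_get_page_size,
	};

	static int __init example_init(void)
	{
		example_reg_handle =
			ib_register_peer_memory_client(&example_client,
						       &example_invalidate);
		return example_reg_handle ? 0 : -EINVAL;
	}

	static void __exit example_exit(void)
	{
		ib_unregister_peer_memory_client(example_reg_handle);
	}

	module_init(example_init);
	module_exit(example_exit);
	MODULE_LICENSE("GPL");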