Message ID | 1497446944-15759-1-git-send-email-clombard@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Hi Christophe, A few comments below, nothing major... On 14/06/2017 at 15:29, Christophe Lombard wrote: > This patch exports a in-kernel 'library' API which can be called by > other drivers to help interacting with an IBM XSL on a POWER9 system. > > The XSL (Translation Service Layer) is a stripped down version of the > PSL (Power Service Layer) used in some cards such as the Mellanox CX5. > Like the PSL, it implements the CAIA architecture, but has a number > of differences, mostly in it's implementation dependent registers. > > The XSL also uses a special DMA cxl mode, which uses a slightly > different init sequence for the CAPP and PHB. > > Changelog[v2] > - Rebase to latest upstream. > - Return -EFAULT in case of NULL pointer in cxllib_handle_fault(). > - Reverse parameters when copro_handle_mm_fault() is called. > The change log shouldn't be part of the commit message, but below the next "---" > +++ b/drivers/misc/cxl/cxllib.c > @@ -0,0 +1,241 @@ > +/* > + * Copyright 2016 IBM Corp. Year needs to be updated > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version.
> + */ > + > +#include <linux/hugetlb.h> > +#include <linux/sched/mm.h> > +#include "cxl.h" > +#include <misc/cxllib.h> > +#include <asm/pnv-pci.h> Ordering of the #include is messy: #include <linux/hugetlb.h> #include <linux/sched/mm.h> #include <asm/pnv-pci.h> #include <misc/cxllib.h> #include "cxl.h" > +int cxllib_set_device_dma(struct pci_dev *dev, unsigned long flags) > +{ > + int rc; > + > + if (flags) > + return -EINVAL; > + > + rc = dma_set_mask(&dev->dev, DMA_BIT_MASK(64)); > + return rc; > +} > +EXPORT_SYMBOL_GPL(cxllib_set_device_dma); A comment in cxllib_set_device_dma() would help: /* * When switching the PHB to capi mode, the TVT#1 entry for * the Partitionable Endpoint is set in bypass mode, like * in PCI mode. * Configure the device dma to use TVT#1, which is done * by calling dma_set_mask() with a mask large enough. */ > + > +int cxllib_get_PE_attributes(struct task_struct *task, > + unsigned long translation_mode, struct cxllib_pe_attributes *attr) > +{ > + struct mm_struct *mm = NULL; > + > + if (translation_mode != CXL_TRANSLATED_MODE && > + translation_mode != CXL_REAL_MODE) > + return -EINVAL; > + > + attr->sr = cxl_calculate_sr(false /* master */, > + task == NULL /* kernel ctx */, > + translation_mode == CXL_REAL_MODE, > + true /* p9 */); > + attr->lpid = mfspr(SPRN_LPID); > + if (task) { > + mm = get_task_mm(task); > + if (mm == NULL) > + return -EINVAL; > + /* > + * Caller is keeping a reference on mm_users for as long > + * as XSL uses the memory context > + */ > + attr->pid = mm->context.id; > + mmput(mm); > + } else { > + attr->pid = 0; > + } > + attr->tid = 0; We'll need to remember to patch that function as well (attr->tid) when we add support for as_notify (even though Mellanox is not expected to use as_notify in capi mode, just pci I believe) > +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags) > +{ > + int rc; > + u64 dar; > + struct vm_area_struct *vma = NULL; > + unsigned long page_size; > + > + if (mm 
== NULL) > + return -EFAULT; > + > + down_read(&mm->mmap_sem); > + > + for (dar = addr; dar < addr + size; dar += page_size) { > + if (!vma || dar < vma->vm_start || dar > vma->vm_end) { > + vma = find_vma(mm, addr); > + if (!vma) { > + pr_err("Can't find vma for addr %016llx\n", addr); > + rc = -EFAULT; > + goto out; > + } > + /* get the size of the pages allocated */ > + page_size = vma_kernel_pagesize(vma); > + } > + > + rc = cxl_handle_page_fault(true, mm, flags, dar); Why do we pass "true" for kernel parameter? Actually do we even need a kernel input parameter for cxl_handle_page_fault() ? It seems that we can infer it based on mm. If NULL, then we are in kernel space. > diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c > index c79e39b..9db63f3 100644 > --- a/drivers/misc/cxl/fault.c > +++ b/drivers/misc/cxl/fault.c > @@ -132,18 +132,16 @@ static int cxl_handle_segment_miss(struct cxl_context *ctx, > return IRQ_HANDLED; > } > > -static void cxl_handle_page_fault(struct cxl_context *ctx, > - struct mm_struct *mm, u64 dsisr, u64 dar) > +int cxl_handle_page_fault(bool kernel_context, > + struct mm_struct *mm, u64 dsisr, u64 dar) > { > unsigned flt = 0; > int result; > unsigned long access, flags, inv_flags = 0; > > - trace_cxl_pte_miss(ctx, dsisr, dar); > - > if ((result = copro_handle_mm_fault(mm, dar, dsisr, &flt))) { > pr_devel("copro_handle_mm_fault failed: %#x\n", result); > - return cxl_ack_ae(ctx); > + return result; > } > > if (!radix_enabled()) { > @@ -156,7 +154,7 @@ static void cxl_handle_page_fault(struct cxl_context *ctx, > access |= _PAGE_WRITE; > > access |= _PAGE_PRIVILEGED; > - if ((!ctx->kernel) || (REGION_ID(dar) == USER_REGION_ID)) > + if (!kernel_context || (REGION_ID(dar) == USER_REGION_ID)) > access &= ~_PAGE_PRIVILEGED; > > if (dsisr & DSISR_NOHPTE) > @@ -166,8 +164,7 @@ static void cxl_handle_page_fault(struct cxl_context *ctx, > hash_page_mm(mm, dar, access, 0x300, inv_flags); > local_irq_restore(flags); > } > - 
pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe); > - cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0); > + return 0; > } > > /* > @@ -261,9 +258,15 @@ void cxl_handle_fault(struct work_struct *fault_work) > > if (cxl_is_segment_miss(ctx, dsisr)) > cxl_handle_segment_miss(ctx, mm, dar); > - else if (cxl_is_page_fault(ctx, dsisr)) > - cxl_handle_page_fault(ctx, mm, dsisr, dar); > - else > + else if (cxl_is_page_fault(ctx, dsisr)) { > + trace_cxl_pte_miss(ctx, dsisr, dar); > + if (cxl_handle_page_fault(ctx->kernel, mm, dsisr, dar)) { > + cxl_ack_ae(ctx); > + } else { > + pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe); > + cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0); > + } Could we have that code in a wrapper before calling cxl_handle_page_fault()? It would keep the code cleaner and in line with what we do for cxl_handle_segment_miss(). > +++ b/include/misc/cxllib.h > @@ -0,0 +1,132 @@ > +/* > + * Copyright 2016 IBM Corp. Year update. > +/* > + * Get the Process Element structure for the given thread > + * > + * Input: > + * pid: points the struct pid for the given thread (i.e. linux pid) > + * translation_mode: whether addresses should be translated > + */ > +struct cxllib_pe_attributes { > + u64 sr; > + u32 lpid; > + u32 tid; > + u32 pid; > +}; > +#define CXL_TRANSLATED_MODE 0 > +#define CXL_REAL_MODE 1 > + > +int cxllib_get_PE_attributes(struct task_struct *task, > + unsigned long translation_mode, struct cxllib_pe_attributes *attr); Description in comment no longer matches reality. /* * Get the Process Element structure for the given thread * * Input: * task: task_struct for the context of the translation * translation_mode: whether addresses should be translated * Output: * attr: attributes to fill up the Process Element structure from CAIA */ Thanks, Fred
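Editorial sketch of the wrapper idea raised above for cxl_handle_fault(): a context-free core that only reports success or failure, plus a thin wrapper that owns the acknowledgement. This is a userspace illustration under stated assumptions, not the driver code. Every sketch_* name is hypothetical, the counters merely stand in for cxl_ack_ae() and the ack_irq() call, and the failure condition is invented so the example can run.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cxl_context; only counters are kept. */
struct sketch_ctx {
	bool kernel;
	int acked_ok;    /* stands in for ack_irq(ctx, CXL_PSL_TFC_An_R, 0) */
	int acked_error; /* stands in for cxl_ack_ae(ctx) */
};

/*
 * Context-free core: returns 0 on success, non-zero on failure.
 * The failure condition (dar == 0) is invented for this sketch.
 */
int sketch_handle_page_fault(bool kernel_context, uint64_t dsisr, uint64_t dar)
{
	(void)kernel_context;
	(void)dsisr;
	return dar == 0 ? -1 : 0;
}

/* Wrapper: everything that needs the context stays here. */
void sketch_handle_page_fault_ctx(struct sketch_ctx *ctx, uint64_t dsisr,
				  uint64_t dar)
{
	if (sketch_handle_page_fault(ctx->kernel, dsisr, dar))
		ctx->acked_error++;
	else
		ctx->acked_ok++;
}
```

With this split, the caller in the fault work function would shrink back to a single call per branch, which is the symmetry with cxl_handle_segment_miss() that the comment asks for.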
On 14/06/17 23:29, Christophe Lombard wrote: > This patch exports a in-kernel 'library' API which can be called by > other drivers to help interacting with an IBM XSL on a POWER9 system. > > The XSL (Translation Service Layer) is a stripped down version of the > PSL (Power Service Layer) used in some cards such as the Mellanox CX5. > Like the PSL, it implements the CAIA architecture, but has a number > of differences, mostly in it's implementation dependent registers. > > The XSL also uses a special DMA cxl mode, which uses a slightly > different init sequence for the CAPP and PHB. > > Changelog[v2] > - Rebase to latest upstream. > - Return -EFAULT in case of NULL pointer in cxllib_handle_fault(). > - Reverse parameters when copro_handle_mm_fault() is called. > > Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> For the parts of this I wrote: Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> I'll skip the comments that Fred's already made, some minor comments below. 
> --- > > This applies on top of this patch: > http://patchwork.ozlabs.org/patch/775322/ > --- > arch/powerpc/include/asm/opal-api.h | 1 + > drivers/misc/cxl/Kconfig | 5 + > drivers/misc/cxl/Makefile | 2 +- > drivers/misc/cxl/cxl.h | 7 ++ > drivers/misc/cxl/cxllib.c | 241 ++++++++++++++++++++++++++++++++++++ > drivers/misc/cxl/fault.c | 25 ++-- > drivers/misc/cxl/native.c | 16 ++- > drivers/misc/cxl/pci.c | 41 +++--- > include/misc/cxllib.h | 132 ++++++++++++++++++++ > 9 files changed, 439 insertions(+), 31 deletions(-) > create mode 100644 drivers/misc/cxl/cxllib.c > create mode 100644 include/misc/cxllib.h > > diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h > index cb3e624..3e0be78 100644 > --- a/arch/powerpc/include/asm/opal-api.h > +++ b/arch/powerpc/include/asm/opal-api.h > @@ -877,6 +877,7 @@ enum { > OPAL_PHB_CAPI_MODE_SNOOP_OFF = 2, > OPAL_PHB_CAPI_MODE_SNOOP_ON = 3, > OPAL_PHB_CAPI_MODE_DMA = 4, > + OPAL_PHB_CAPI_MODE_DMA_TVT1 = 5, > }; > > /* OPAL I2C request */ > diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig > index b75cf83..93397cb 100644 > --- a/drivers/misc/cxl/Kconfig > +++ b/drivers/misc/cxl/Kconfig > @@ -11,11 +11,16 @@ config CXL_AFU_DRIVER_OPS > bool > default n > > +config CXL_LIB > + bool > + default n > + How necessary is this? Are there any drivers using cxllib that we're trying to get in during this cycle? 
> config CXL > tristate "Support for IBM Coherent Accelerators (CXL)" > depends on PPC_POWERNV && PCI_MSI && EEH > select CXL_BASE > select CXL_AFU_DRIVER_OPS > + select CXL_LIB > default m > help > Select this option to enable driver support for IBM Coherent > diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile > index c14fd6b..0b5fd74 100644 > --- a/drivers/misc/cxl/Makefile > +++ b/drivers/misc/cxl/Makefile > @@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC_WERROR) += -Werror > > cxl-y += main.o file.o irq.o fault.o native.o > cxl-y += context.o sysfs.o pci.o trace.o > -cxl-y += vphb.o phb.o api.o > +cxl-y += vphb.o phb.o api.o cxllib.o > cxl-$(CONFIG_PPC_PSERIES) += flash.o guest.o of.o hcalls.o > cxl-$(CONFIG_DEBUG_FS) += debugfs.o > obj-$(CONFIG_CXL) += cxl.o > diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h > index a03f8e7..81e01f0 100644 > --- a/drivers/misc/cxl/cxl.h > +++ b/drivers/misc/cxl/cxl.h > @@ -1010,6 +1010,8 @@ static inline void cxl_debugfs_add_afu_regs_psl8(struct cxl_afu *afu, struct den > > void cxl_handle_fault(struct work_struct *work); > void cxl_prefault(struct cxl_context *ctx, u64 wed); > +int cxl_handle_page_fault(bool kernel_context, struct mm_struct *mm, > + u64 dsisr, u64 dar); > > struct cxl *get_cxl_adapter(int num); > int cxl_alloc_sst(struct cxl_context *ctx); > @@ -1061,6 +1063,11 @@ int cxl_afu_slbia(struct cxl_afu *afu); > int cxl_data_cache_flush(struct cxl *adapter); > int cxl_afu_disable(struct cxl_afu *afu); > int cxl_psl_purge(struct cxl_afu *afu); > +int cxl_calc_capp_routing(struct pci_dev *dev, u64 *chipid, > + u32 *phb_index, u64 *capp_unit_id); > +int cxl_slot_is_switched(struct pci_dev *dev); > +int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg); > +u64 cxl_calculate_sr(bool master, bool kernel, bool real_mode, bool p9); > > void cxl_native_irq_dump_regs_psl9(struct cxl_context *ctx); > void cxl_native_irq_dump_regs_psl8(struct cxl_context *ctx); > diff --git a/drivers/misc/cxl/cxllib.c 
b/drivers/misc/cxl/cxllib.c > new file mode 100644 > index 0000000..63b6280 > --- /dev/null > +++ b/drivers/misc/cxl/cxllib.c > @@ -0,0 +1,241 @@ > +/* > + * Copyright 2016 IBM Corp. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. > + */ > + > +#include <linux/hugetlb.h> > +#include <linux/sched/mm.h> > +#include "cxl.h" > +#include <misc/cxllib.h> > +#include <asm/pnv-pci.h> > + > +#define CXL_INVALID_DRA ~0ull > +#define CXL_DUMMY_READ_SIZE 128 > +#define CXL_DUMMY_READ_ALIGN 8 > +#define CXL_CAPI_WINDOW_START 0x2000000000000ull > +#define CXL_CAPI_WINDOW_LOG_SIZE 48 > +#define CXL_XSL_CONFIG_CURRENT_VERSION CXL_XSL_CONFIG_VERSION1 > + > + > +bool cxllib_slot_is_supported(struct pci_dev *dev, unsigned long flags) > +{ > + int rc; > + u32 phb_index; > + u64 chip_id, capp_unit_id; > + > + /* Currently none supported */ "No flags currently supported" would be clearer > + if (flags) > + return false; > + > + if (!cpu_has_feature(CPU_FTR_HVMODE)) > + return false; > + > + if (!cxl_is_power9()) > + return false; > + > + if (cxl_slot_is_switched(dev)) > + return false; > + > + /* on p9, some pci slots are not connected to a CAPP unit */ > + rc = cxl_calc_capp_routing(dev, &chip_id, &phb_index, &capp_unit_id); > + if (rc) > + return false; > + > + return true; > +} > +EXPORT_SYMBOL_GPL(cxllib_slot_is_supported); > + > +static DEFINE_MUTEX(dra_mutex); > +static u64 dummy_read_addr = CXL_INVALID_DRA; > + > +int allocate_dummy_read_buf(void) static? > +{ > + u64 buf, vaddr; > + size_t buf_size; > + > + /* > + * Dummy read buffer is 128-byte long, aligned on a > + * 256-byte boundary and we need the physical address. 
> + */ > + buf_size = CXL_DUMMY_READ_SIZE + (1ull << CXL_DUMMY_READ_ALIGN); > + buf = (u64) kzalloc(buf_size, GFP_KERNEL); > + if (!buf) > + return -ENOMEM; > + > + vaddr = (buf + (1ull << CXL_DUMMY_READ_ALIGN) - 1) & > + (~0ull << CXL_DUMMY_READ_ALIGN); > + > + WARN((vaddr + CXL_DUMMY_READ_SIZE) > (buf + buf_size), > + "Dummy read buffer alignment issue"); > + dummy_read_addr = virt_to_phys((void *) vaddr); > + return 0; > +} > + > +int cxllib_get_xsl_config(struct pci_dev *dev, struct cxllib_xsl_config *cfg) > +{ > + int rc; > + u32 phb_index; > + u64 chip_id, capp_unit_id; > + > + if (!cpu_has_feature(CPU_FTR_HVMODE)) > + return -EINVAL; > + > + mutex_lock(&dra_mutex); > + if (dummy_read_addr == CXL_INVALID_DRA) { > + rc = allocate_dummy_read_buf(); > + if (rc) { > + mutex_unlock(&dra_mutex); > + return rc; > + } > + } > + mutex_unlock(&dra_mutex); > + > + rc = cxl_calc_capp_routing(dev, &chip_id, &phb_index, &capp_unit_id); > + if (rc) > + return rc; > + > + rc = cxl_get_xsl9_dsnctl(capp_unit_id, &cfg->dsnctl); > + if (rc) > + return rc; > + if (cpu_has_feature(CPU_FTR_POWER9_DD1)) { > + /* workaround for DD1 - nbwind = capiind */ > + cfg->dsnctl |= ((u64)0x02 << (63-47)); > + } > + > + cfg->version = CXL_XSL_CONFIG_CURRENT_VERSION; > + cfg->log_bar_size = CXL_CAPI_WINDOW_LOG_SIZE; > + cfg->bar_addr = CXL_CAPI_WINDOW_START; > + cfg->dra = dummy_read_addr; > + return 0; > +} > +EXPORT_SYMBOL_GPL(cxllib_get_xsl_config); > + > + > +int cxllib_switch_phb_mode(struct pci_dev *dev, enum cxllib_mode mode, > + unsigned long flags) > +{ > + int rc = 0; > + > + if (!cpu_has_feature(CPU_FTR_HVMODE)) > + return -EINVAL; > + > + switch (mode) { > + case CXL_MODE_PCI: > + /* > + * We currently don't support going back to PCI mode > + * However, we'll turn the invalidations off, so that > + * the CX5 firmware doesn't have to ack them and can do > + * things like reset, etc.. with no worries. 
> + * So always return EPERM (can't go back to PCI) or > + * EBUSY if we couldn't even turn off snooping > + */ This comment shouldn't be CX5 specific > + rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_OFF); > + if (rc) > + rc = -EBUSY; > + else > + rc = -EPERM; > + break; > + case CXL_MODE_CXL: > + /* DMA only supported on TVT1 for the time being */ > + if (flags != CXL_MODE_DMA_TVT1) > + return -EINVAL; > + rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_DMA_TVT1); > + if (rc) > + return rc; > + rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_ON); > + break; > + default: > + rc = -EINVAL; > + } > + return rc; > +} > +EXPORT_SYMBOL_GPL(cxllib_switch_phb_mode); > + > + > +int cxllib_set_device_dma(struct pci_dev *dev, unsigned long flags) > +{ > + int rc; > + > + if (flags) > + return -EINVAL; > + > + rc = dma_set_mask(&dev->dev, DMA_BIT_MASK(64)); > + return rc; > +} > +EXPORT_SYMBOL_GPL(cxllib_set_device_dma); > + > + > +int cxllib_get_PE_attributes(struct task_struct *task, > + unsigned long translation_mode, struct cxllib_pe_attributes *attr) > +{ > + struct mm_struct *mm = NULL; > + > + if (translation_mode != CXL_TRANSLATED_MODE && > + translation_mode != CXL_REAL_MODE) > + return -EINVAL; > + > + attr->sr = cxl_calculate_sr(false /* master */, > + task == NULL /* kernel ctx */, > + translation_mode == CXL_REAL_MODE, > + true /* p9 */); nitpicking: comments in the middle of a line feels slightly ugly to me :) > + attr->lpid = mfspr(SPRN_LPID); > + if (task) { > + mm = get_task_mm(task); > + if (mm == NULL) > + return -EINVAL; > + /* > + * Caller is keeping a reference on mm_users for as long > + * as XSL uses the memory context > + */ > + attr->pid = mm->context.id; > + mmput(mm); > + } else { > + attr->pid = 0; > + } > + attr->tid = 0; > + return 0; > +} > +EXPORT_SYMBOL_GPL(cxllib_get_PE_attributes); > + > + > +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags) > +{ > + int rc; > + u64 dar; > + struct vm_area_struct 
*vma = NULL; > + unsigned long page_size; > + > + if (mm == NULL) > + return -EFAULT; > + > + down_read(&mm->mmap_sem); > + > + for (dar = addr; dar < addr + size; dar += page_size) { > + if (!vma || dar < vma->vm_start || dar > vma->vm_end) { > + vma = find_vma(mm, addr); > + if (!vma) { > + pr_err("Can't find vma for addr %016llx\n", addr); > + rc = -EFAULT; > + goto out; > + } > + /* get the size of the pages allocated */ > + page_size = vma_kernel_pagesize(vma); > + } > + > + rc = cxl_handle_page_fault(true, mm, flags, dar); > + if (rc) { > + pr_err("_cxl_handle_page_fault failed %d", rc); Get rid of the initial underscore, that's just left over from an early version of this code > + rc = -EFAULT; > + goto out; > + } > + } > + rc = 0; > +out: > + up_read(&mm->mmap_sem); > + return rc; > +} > +EXPORT_SYMBOL_GPL(cxllib_handle_fault); > diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c > index c79e39b..9db63f3 100644 > --- a/drivers/misc/cxl/fault.c > +++ b/drivers/misc/cxl/fault.c > @@ -132,18 +132,16 @@ static int cxl_handle_segment_miss(struct cxl_context *ctx, > return IRQ_HANDLED; > } > > -static void cxl_handle_page_fault(struct cxl_context *ctx, > - struct mm_struct *mm, u64 dsisr, u64 dar) > +int cxl_handle_page_fault(bool kernel_context, > + struct mm_struct *mm, u64 dsisr, u64 dar) If we don't get rid of kernel_context per Fred, it would look cleaner to move it to the end of the argument list. 
> { > unsigned flt = 0; > int result; > unsigned long access, flags, inv_flags = 0; > > - trace_cxl_pte_miss(ctx, dsisr, dar); > - > if ((result = copro_handle_mm_fault(mm, dar, dsisr, &flt))) { > pr_devel("copro_handle_mm_fault failed: %#x\n", result); > - return cxl_ack_ae(ctx); > + return result; > } > > if (!radix_enabled()) { > @@ -156,7 +154,7 @@ static void cxl_handle_page_fault(struct cxl_context *ctx, > access |= _PAGE_WRITE; > > access |= _PAGE_PRIVILEGED; > - if ((!ctx->kernel) || (REGION_ID(dar) == USER_REGION_ID)) > + if (!kernel_context || (REGION_ID(dar) == USER_REGION_ID)) > access &= ~_PAGE_PRIVILEGED; While we're here... if (kernel_context && (REGION_ID(dar) != USER_REGION_ID)) access |= _PAGE_PRIVILEGED; > > if (dsisr & DSISR_NOHPTE) > @@ -166,8 +164,7 @@ static void cxl_handle_page_fault(struct cxl_context *ctx, > hash_page_mm(mm, dar, access, 0x300, inv_flags); > local_irq_restore(flags); > } > - pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe); > - cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0); > + return 0; > } > > /* > @@ -261,9 +258,15 @@ void cxl_handle_fault(struct work_struct *fault_work) > > if (cxl_is_segment_miss(ctx, dsisr)) > cxl_handle_segment_miss(ctx, mm, dar); > - else if (cxl_is_page_fault(ctx, dsisr)) > - cxl_handle_page_fault(ctx, mm, dsisr, dar); > - else > + else if (cxl_is_page_fault(ctx, dsisr)) { > + trace_cxl_pte_miss(ctx, dsisr, dar); > + if (cxl_handle_page_fault(ctx->kernel, mm, dsisr, dar)) { > + cxl_ack_ae(ctx); > + } else { > + pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe); > + cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0); > + } > + } else > WARN(1, "cxl_handle_fault has nothing to handle\n"); > > if (mm) > diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c > index 2b2f889..4a82c31 100644 > --- a/drivers/misc/cxl/native.c > +++ b/drivers/misc/cxl/native.c > @@ -586,17 +586,17 @@ static int activate_afu_directed(struct cxl_afu *afu) > #define set_endian(sr) ((sr) &= 
~(CXL_PSL_SR_An_LE)) > #endif > > -static u64 calculate_sr(struct cxl_context *ctx) > +u64 cxl_calculate_sr(bool master, bool kernel, bool real_mode, bool p9) > { > u64 sr = 0; > > set_endian(sr); > - if (ctx->master) > + if (master) > sr |= CXL_PSL_SR_An_MP; > if (mfspr(SPRN_LPCR) & LPCR_TC) > sr |= CXL_PSL_SR_An_TC; > - if (ctx->kernel) { > - if (!ctx->real_mode) > + if (kernel) { > + if (!real_mode) > sr |= CXL_PSL_SR_An_R; > sr |= (mfmsr() & MSR_SF) | CXL_PSL_SR_An_HV; > } else { > @@ -608,7 +608,7 @@ static u64 calculate_sr(struct cxl_context *ctx) > if (!test_tsk_thread_flag(current, TIF_32BIT)) > sr |= CXL_PSL_SR_An_SF; > } > - if (cxl_is_power9()) { > + if (p9) { > if (radix_enabled()) > sr |= CXL_PSL_SR_An_XLAT_ror; > else > @@ -617,6 +617,12 @@ static u64 calculate_sr(struct cxl_context *ctx) > return sr; > } > > +static u64 calculate_sr(struct cxl_context *ctx) > +{ > + return cxl_calculate_sr(ctx->master, ctx->kernel, ctx->real_mode, > + cxl_is_power9()); > +} > + > static void update_ivtes_directed(struct cxl_context *ctx) > { > bool need_update = (ctx->status == STARTED); > diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c > index 1eb9859..d18b3d9 100644 > --- a/drivers/misc/cxl/pci.c > +++ b/drivers/misc/cxl/pci.c > @@ -375,7 +375,7 @@ static u64 get_capp_unit_id(struct device_node *np, u32 phb_index) > return 0; > } > > -static int calc_capp_routing(struct pci_dev *dev, u64 *chipid, > +int cxl_calc_capp_routing(struct pci_dev *dev, u64 *chipid, > u32 *phb_index, u64 *capp_unit_id) > { > int rc; > @@ -408,17 +408,9 @@ static int calc_capp_routing(struct pci_dev *dev, u64 *chipid, > return 0; > } > > -static int init_implementation_adapter_regs_psl9(struct cxl *adapter, struct pci_dev *dev) > +int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg) > { > - u64 xsl_dsnctl, psl_fircntl; > - u64 chipid; > - u32 phb_index; > - u64 capp_unit_id; > - int rc; > - > - rc = calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); > - if (rc) > - 
return rc; > + u64 xsl_dsnctl; > > /* > * CAPI Identifier bits [0:7] > @@ -454,6 +446,27 @@ static int init_implementation_adapter_regs_psl9(struct cxl *adapter, struct pci > xsl_dsnctl |= ((u64)0x04 << (63-55)); > } > > + *reg = xsl_dsnctl; > + return 0; > +} > + > +static int init_implementation_adapter_regs_psl9(struct cxl *adapter, > + struct pci_dev *dev) > +{ > + u64 xsl_dsnctl, psl_fircntl; > + u64 chipid; > + u32 phb_index; > + u64 capp_unit_id; > + int rc; > + > + rc = cxl_calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); > + if (rc) > + return rc; > + > + rc = cxl_get_xsl9_dsnctl(capp_unit_id, &xsl_dsnctl); > + if (rc) > + return rc; > + > cxl_p1_write(adapter, CXL_XSL9_DSNCTL, xsl_dsnctl); > > /* Set fir_cntl to recommended value for production env */ > @@ -505,7 +518,7 @@ static int init_implementation_adapter_regs_psl8(struct cxl *adapter, struct pci > u64 capp_unit_id; > int rc; > > - rc = calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); > + rc = cxl_calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); > if (rc) > return rc; > > @@ -538,7 +551,7 @@ static int init_implementation_adapter_regs_xsl(struct cxl *adapter, struct pci_ > u64 capp_unit_id; > int rc; > > - rc = calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); > + rc = cxl_calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); > if (rc) > return rc; > > @@ -1897,7 +1910,7 @@ static void cxl_pci_remove_adapter(struct cxl *adapter) > > #define CXL_MAX_PCIEX_PARENT 2 > > -static int cxl_slot_is_switched(struct pci_dev *dev) > +int cxl_slot_is_switched(struct pci_dev *dev) > { > struct device_node *np; > int depth = 0; > diff --git a/include/misc/cxllib.h b/include/misc/cxllib.h > new file mode 100644 > index 0000000..d2f3358 > --- /dev/null > +++ b/include/misc/cxllib.h > @@ -0,0 +1,132 @@ > +/* > + * Copyright 2016 IBM Corp. 
> + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. > + */ > + > +#ifndef _MISC_CXLLIB_H > +#define _MISC_CXLLIB_H > + > +#include <linux/pci.h> > +#include <asm/reg.h> > + > +/* > + * cxl driver exports a in-kernel 'library' API which can be called by > + * other drivers to help interacting with an IBM XSL. > + */ > + > +/* > + * tells whether capi is supported on the PCIe slot where the > + * device is seated > + * > + * Input: > + * dev: device whose slot needs to be checked > + * flags: 0 for the time being > + */ > +bool cxllib_slot_is_supported(struct pci_dev *dev, unsigned long flags); > + > + > +/* > + * Returns the configuration parameters to be used by the XSL or device > + * > + * Input: > + * dev: device, used to find PHB > + * Output: > + * struct cxllib_xsl_config: > + * version > + * capi BAR address, i.e. 0x2000000000000-0x2FFFFFFFFFFFF > + * capi BAR size > + * data send control (XSL_DSNCTL) > + * dummy read address (XSL_DRA) > + */ > +#define CXL_XSL_CONFIG_VERSION1 1 > +struct cxllib_xsl_config { > + u32 version; /* format version for register encoding */ > + u32 log_bar_size;/* log size of the capi_window */ > + u64 bar_addr; /* address of the start of capi window */ > + u64 dsnctl; /* matches definition of XSL_DSNCTL */ > + u64 dra; /* real address that can be used for dummy read */ > +}; > + > +int cxllib_get_xsl_config(struct pci_dev *dev, struct cxllib_xsl_config *cfg); > + > + > +/* > + * Activate capi for the pci host bridge associated with the device. > + * Can be extended to deactivate once we know how to do it. > + * Device must be ready to accept messages from the CAPP unit and > + * respond accordingly (TLB invalidates, ...) > + * > + * PHB is switched to capi mode through calls to skiboot. 
> + * CAPP snooping is activated > + * > + * Input: > + * dev: device whose PHB should switch mode > + * mode: mode to switch to i.e. CAPI or PCI > + * flags: options related to the mode > + */ > +enum cxllib_mode { > + CXL_MODE_CXL, > + CXL_MODE_PCI, > +}; > + > +#define CXL_MODE_NO_DMA 0 > +#define CXL_MODE_DMA_TVT0 1 > +#define CXL_MODE_DMA_TVT1 2 > + > +int cxllib_switch_phb_mode(struct pci_dev *dev, enum cxllib_mode mode, > + unsigned long flags); > + > + > +/* > + * Set the device for capi DMA. > + * Define its dma_ops and dma offset so that allocations will be using TVT#1 > + * > + * Input: > + * dev: device to set > + * flags: options. CXL_MODE_DMA_TVT1 should be used > + */ > +int cxllib_set_device_dma(struct pci_dev *dev, unsigned long flags); > + > + > + > +/* > + * Get the Process Element structure for the given thread > + * > + * Input: > + * pid: points the struct pid for the given thread (i.e. linux pid) > + * translation_mode: whether addresses should be translated > + */ > +struct cxllib_pe_attributes { > + u64 sr; > + u32 lpid; > + u32 tid; > + u32 pid; > +}; > +#define CXL_TRANSLATED_MODE 0 > +#define CXL_REAL_MODE 1 > + > +int cxllib_get_PE_attributes(struct task_struct *task, > + unsigned long translation_mode, struct cxllib_pe_attributes *attr); > + > + > +/* > + * Handle memory fault. > + * Fault in all the pages of the specified buffer for the permissions > + * provided in ‘flags’ > + * > + * Shouldn't be called from interrupt context > + * > + * Input: > + * mm: struct mm for the thread faulting the pages > + * addr: base address of the buffer to page in > + * size: size of the buffer to page in > + * flags: permission requested (DSISR_ISSTORE...) > + */ > +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags); > + > + > +#endif /* _MISC_CXLLIB_H */ >
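Editorial sketch for the "While we're here..." suggestion on _PAGE_PRIVILEGED above: the set-then-clear form in the patch and the single conditional form in the review give the same result whenever the bit is clear on entry. The bit value and all names below are hypothetical and chosen only so the two forms can be compared in userspace; the real _PAGE_PRIVILEGED comes from the powerpc page-table headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value, for illustration only. */
#define SKETCH_PAGE_PRIVILEGED 0x8u

/* Form used in the patch: set the bit, then clear it again if needed. */
uint32_t access_set_then_clear(uint32_t access, bool kernel_context,
			       bool user_region)
{
	access |= SKETCH_PAGE_PRIVILEGED;
	if (!kernel_context || user_region)
		access &= ~SKETCH_PAGE_PRIVILEGED;
	return access;
}

/* Form suggested in the review: set the bit only when it is wanted. */
uint32_t access_set_if_needed(uint32_t access, bool kernel_context,
			      bool user_region)
{
	if (kernel_context && !user_region)
		access |= SKETCH_PAGE_PRIVILEGED;
	return access;
}
```

The condition in the second form is the De Morgan negation of the condition in the first. The two would differ only if the privileged bit were already set in access before this code runs, since the first form would then clear it and the second would leave it alone.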
On 16/06/2017 at 09:13, Andrew Donnellan wrote: >> >> +config CXL_LIB >> + bool >> + default n >> + > > How necessary is this? Are there any drivers using cxllib that we're > trying to get in during this cycle? That was a Mellanox request, so that they can enable code in their driver. Like we've done previously, I expect we should be able to drop it once their code is upstream. Fred
On 15/06/2017 at 14:36, Frederic Barrat wrote: > Hi Christophe, > > A few comments below, nothing major... > > On 14/06/2017 at 15:29, Christophe Lombard wrote: >> This patch exports a in-kernel 'library' API which can be called by >> other drivers to help interacting with an IBM XSL on a POWER9 system. >> >> The XSL (Translation Service Layer) is a stripped down version of the >> PSL (Power Service Layer) used in some cards such as the Mellanox CX5. >> Like the PSL, it implements the CAIA architecture, but has a number >> of differences, mostly in it's implementation dependent registers. >> >> The XSL also uses a special DMA cxl mode, which uses a slightly >> different init sequence for the CAPP and PHB. >> >> Changelog[v2] >> - Rebase to latest upstream. >> - Return -EFAULT in case of NULL pointer in cxllib_handle_fault(). >> - Reverse parameters when copro_handle_mm_fault() is called. >> > > > The change log shouldn't be part of the commit message, but below the > next "---" > > sure. >> +++ b/drivers/misc/cxl/cxllib.c >> @@ -0,0 +1,241 @@ >> +/* >> + * Copyright 2016 IBM Corp. > > > Year needs to be updated > > >> + * >> + * This program is free software; you can redistribute it and/or >> + * modify it under the terms of the GNU General Public License >> + * as published by the Free Software Foundation; either version >> + * 2 of the License, or (at your option) any later version.
>> + */ >> + >> +#include <linux/hugetlb.h> >> +#include <linux/sched/mm.h> >> +#include "cxl.h" >> +#include <misc/cxllib.h> >> +#include <asm/pnv-pci.h> > > Ordering of the #include is messy: > #include <linux/hugetlb.h> > #include <linux/sched/mm.h> > #include <asm/pnv-pci.h> > #include <misc/cxllib.h> > #include "cxl.h" > > > >> +int cxllib_set_device_dma(struct pci_dev *dev, unsigned long flags) >> +{ >> + int rc; >> + >> + if (flags) >> + return -EINVAL; >> + >> + rc = dma_set_mask(&dev->dev, DMA_BIT_MASK(64)); >> + return rc; >> +} >> +EXPORT_SYMBOL_GPL(cxllib_set_device_dma); > > > A comment in cxllib_set_device_dma() would help: > /* > * When switching the PHB to capi mode, the TVT#1 entry for > * the Partitionable Endpoint is set in bypass mode, like > * in PCI mode. > * Configure the device dma to use TVT#1, which is done > * by calling dma_set_mask() with a mask large enough. > */ > > > >> + >> +int cxllib_get_PE_attributes(struct task_struct *task, >> + unsigned long translation_mode, struct cxllib_pe_attributes >> *attr) >> +{ >> + struct mm_struct *mm = NULL; >> + >> + if (translation_mode != CXL_TRANSLATED_MODE && >> + translation_mode != CXL_REAL_MODE) >> + return -EINVAL; >> + >> + attr->sr = cxl_calculate_sr(false /* master */, >> + task == NULL /* kernel ctx */, >> + translation_mode == CXL_REAL_MODE, >> + true /* p9 */); >> + attr->lpid = mfspr(SPRN_LPID); >> + if (task) { >> + mm = get_task_mm(task); >> + if (mm == NULL) >> + return -EINVAL; >> + /* >> + * Caller is keeping a reference on mm_users for as long >> + * as XSL uses the memory context >> + */ >> + attr->pid = mm->context.id; >> + mmput(mm); >> + } else { >> + attr->pid = 0; >> + } >> + attr->tid = 0; > > > We'll need to remember to patch that function as well (attr->tid) > when we add support for as_notify (even though Mellanox is not > expected to use as_notify in capi mode, just pci I believe) > > yep. 
> >> +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, >> u64 flags) >> +{ >> + int rc; >> + u64 dar; >> + struct vm_area_struct *vma = NULL; >> + unsigned long page_size; >> + >> + if (mm == NULL) >> + return -EFAULT; >> + >> + down_read(&mm->mmap_sem); >> + >> + for (dar = addr; dar < addr + size; dar += page_size) { >> + if (!vma || dar < vma->vm_start || dar > vma->vm_end) { >> + vma = find_vma(mm, addr); >> + if (!vma) { >> + pr_err("Can't find vma for addr %016llx\n", addr); >> + rc = -EFAULT; >> + goto out; >> + } >> + /* get the size of the pages allocated */ >> + page_size = vma_kernel_pagesize(vma); >> + } >> + >> + rc = cxl_handle_page_fault(true, mm, flags, dar); > > > Why do we pass "true" for kernel parameter? > Actually do we even need a kernel input parameter for > cxl_handle_page_fault() ? It seems that we can infer it based on mm. > If NULL, then we are in kernel space. > > you are right. Previously, the test was based on the field ctx->kernel. This explains the kernel parameter. 
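To make the point above concrete, here is a minimal user-space sketch of inferring the kernel flag from mm rather than passing it as a parameter. It is only an illustration under stated assumptions: struct mm_struct is a stand-in, the sketch_ names and the bit values are hypothetical, and the rule "mm == NULL means kernel context" is taken from the review comment, not verified against the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct mm_struct; only its address matters here. */
struct mm_struct { int unused; };

/* Hypothetical bit values, chosen for this sketch only. */
#define SKETCH_PAGE_PRESENT    0x1u
#define SKETCH_PAGE_WRITE      0x2u
#define SKETCH_PAGE_PRIVILEGED 0x8u

/* Per the review comment: no mm means we are handling a kernel context. */
static bool sketch_is_kernel_context(const struct mm_struct *mm)
{
	return mm == NULL;
}

/*
 * Mirrors the shape of the access computation in the quoted fault.c hunk:
 * start privileged, then drop the privileged bit for user contexts or
 * for addresses in the user region.
 */
static unsigned int sketch_access_bits(const struct mm_struct *mm,
				       bool is_store, bool user_region)
{
	unsigned int access = SKETCH_PAGE_PRESENT;

	if (is_store)
		access |= SKETCH_PAGE_WRITE;

	access |= SKETCH_PAGE_PRIVILEGED;
	if (!sketch_is_kernel_context(mm) || user_region)
		access &= ~SKETCH_PAGE_PRIVILEGED;

	return access;
}
```

With this shape, a caller such as cxllib_handle_fault(), which already rejects a NULL mm, would always end up on the non-privileged path without having to pass a separate flag.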
>> diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c >> index c79e39b..9db63f3 100644 >> --- a/drivers/misc/cxl/fault.c >> +++ b/drivers/misc/cxl/fault.c >> @@ -132,18 +132,16 @@ static int cxl_handle_segment_miss(struct >> cxl_context *ctx, >> return IRQ_HANDLED; >> } >> >> -static void cxl_handle_page_fault(struct cxl_context *ctx, >> - struct mm_struct *mm, u64 dsisr, u64 dar) >> +int cxl_handle_page_fault(bool kernel_context, >> + struct mm_struct *mm, u64 dsisr, u64 dar) >> { >> unsigned flt = 0; >> int result; >> unsigned long access, flags, inv_flags = 0; >> >> - trace_cxl_pte_miss(ctx, dsisr, dar); >> - >> if ((result = copro_handle_mm_fault(mm, dar, dsisr, &flt))) { >> pr_devel("copro_handle_mm_fault failed: %#x\n", result); >> - return cxl_ack_ae(ctx); >> + return result; >> } >> >> if (!radix_enabled()) { >> @@ -156,7 +154,7 @@ static void cxl_handle_page_fault(struct >> cxl_context *ctx, >> access |= _PAGE_WRITE; >> >> access |= _PAGE_PRIVILEGED; >> - if ((!ctx->kernel) || (REGION_ID(dar) == USER_REGION_ID)) >> + if (!kernel_context || (REGION_ID(dar) == USER_REGION_ID)) >> access &= ~_PAGE_PRIVILEGED; >> >> if (dsisr & DSISR_NOHPTE) >> @@ -166,8 +164,7 @@ static void cxl_handle_page_fault(struct >> cxl_context *ctx, >> hash_page_mm(mm, dar, access, 0x300, inv_flags); >> local_irq_restore(flags); >> } >> - pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe); >> - cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0); >> + return 0; >> } >> >> /* >> @@ -261,9 +258,15 @@ void cxl_handle_fault(struct work_struct >> *fault_work) >> >> if (cxl_is_segment_miss(ctx, dsisr)) >> cxl_handle_segment_miss(ctx, mm, dar); >> - else if (cxl_is_page_fault(ctx, dsisr)) >> - cxl_handle_page_fault(ctx, mm, dsisr, dar); >> - else >> + else if (cxl_is_page_fault(ctx, dsisr)) { >> + trace_cxl_pte_miss(ctx, dsisr, dar); >> + if (cxl_handle_page_fault(ctx->kernel, mm, dsisr, dar)) { >> + cxl_ack_ae(ctx); >> + } else { >> + pr_devel("Page fault successfully 
handled for pe: >> %i!\n", ctx->pe); >> + cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0); >> + } > > > Could we have that code in a wrapper before calling > cxl_handle_page_fault()? It would keep the code cleaner and in line > with what we do for cxl_handle_segment_miss(). > We can try to do that. > > >> +++ b/include/misc/cxllib.h >> @@ -0,0 +1,132 @@ >> +/* >> + * Copyright 2016 IBM Corp. > > > Year update. > > >> +/* >> + * Get the Process Element structure for the given thread >> + * >> + * Input: >> + * pid: points the struct pid for the given thread (i.e. linux pid) >> + * translation_mode: whether addresses should be translated >> + */ >> +struct cxllib_pe_attributes { >> + u64 sr; >> + u32 lpid; >> + u32 tid; >> + u32 pid; >> +}; >> +#define CXL_TRANSLATED_MODE 0 >> +#define CXL_REAL_MODE 1 >> + >> +int cxllib_get_PE_attributes(struct task_struct *task, >> + unsigned long translation_mode, struct cxllib_pe_attributes >> *attr); > > > Description in comment no longer matches reality. > /* > * Get the Process Element structure for the given thread > * > * Input: > * task: task_struct for the context of the translation > * translation_mode: whether addresses should be translated > * Output: > * attr: attributes to fill up the Process Element structure from CAIA > */ > > > Thanks, > > Fred
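The wrapper suggested in the review could look roughly like the user-space sketch below. It is an assumption-laden illustration, not the driver code: the sketch_ functions and struct cxl_context_sketch are hypothetical, the two counters stand in for cxl_ack_ae() and cxl_ops->ack_irq(), and the failure condition in the stub is made up so the two paths can be exercised.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel types; hypothetical, for illustration only. */
struct mm_struct { int unused; };
struct cxl_context_sketch {
	int pe;
	int acked_ok;   /* counts calls standing in for cxl_ops->ack_irq() */
	int acked_err;  /* counts calls standing in for cxl_ack_ae() */
};

/*
 * Context-free part: resolves the fault for an mm and returns 0 or a
 * negative errno. The stub pretends address 0 cannot be resolved.
 */
static int sketch_handle_mm_fault(struct mm_struct *mm, uint64_t dsisr,
				  uint64_t dar)
{
	(void)mm;
	(void)dsisr;
	return dar == 0 ? -EFAULT : 0;
}

/*
 * Wrapper kept static in fault.c in this sketch: it owns the tracing and
 * the acknowledgement, so the fault work handler stays a one-line call,
 * as it is for the segment miss case.
 */
static void sketch_handle_page_fault(struct cxl_context_sketch *ctx,
				     struct mm_struct *mm, uint64_t dsisr,
				     uint64_t dar)
{
	/* trace_cxl_pte_miss(ctx, dsisr, dar) would go here */
	if (sketch_handle_mm_fault(mm, dsisr, dar))
		ctx->acked_err++;
	else
		ctx->acked_ok++;
}
```

The split keeps everything that needs a cxl_context in the wrapper, while the context-free helper is the only part a library caller would need.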
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index cb3e624..3e0be78 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -877,6 +877,7 @@ enum { OPAL_PHB_CAPI_MODE_SNOOP_OFF = 2, OPAL_PHB_CAPI_MODE_SNOOP_ON = 3, OPAL_PHB_CAPI_MODE_DMA = 4, + OPAL_PHB_CAPI_MODE_DMA_TVT1 = 5, }; /* OPAL I2C request */ diff --git a/drivers/misc/cxl/Kconfig b/drivers/misc/cxl/Kconfig index b75cf83..93397cb 100644 --- a/drivers/misc/cxl/Kconfig +++ b/drivers/misc/cxl/Kconfig @@ -11,11 +11,16 @@ config CXL_AFU_DRIVER_OPS bool default n +config CXL_LIB + bool + default n + config CXL tristate "Support for IBM Coherent Accelerators (CXL)" depends on PPC_POWERNV && PCI_MSI && EEH select CXL_BASE select CXL_AFU_DRIVER_OPS + select CXL_LIB default m help Select this option to enable driver support for IBM Coherent diff --git a/drivers/misc/cxl/Makefile b/drivers/misc/cxl/Makefile index c14fd6b..0b5fd74 100644 --- a/drivers/misc/cxl/Makefile +++ b/drivers/misc/cxl/Makefile @@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC_WERROR) += -Werror cxl-y += main.o file.o irq.o fault.o native.o cxl-y += context.o sysfs.o pci.o trace.o -cxl-y += vphb.o phb.o api.o +cxl-y += vphb.o phb.o api.o cxllib.o cxl-$(CONFIG_PPC_PSERIES) += flash.o guest.o of.o hcalls.o cxl-$(CONFIG_DEBUG_FS) += debugfs.o obj-$(CONFIG_CXL) += cxl.o diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index a03f8e7..81e01f0 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -1010,6 +1010,8 @@ static inline void cxl_debugfs_add_afu_regs_psl8(struct cxl_afu *afu, struct den void cxl_handle_fault(struct work_struct *work); void cxl_prefault(struct cxl_context *ctx, u64 wed); +int cxl_handle_page_fault(bool kernel_context, struct mm_struct *mm, + u64 dsisr, u64 dar); struct cxl *get_cxl_adapter(int num); int cxl_alloc_sst(struct cxl_context *ctx); @@ -1061,6 +1063,11 @@ int cxl_afu_slbia(struct cxl_afu *afu); int cxl_data_cache_flush(struct 
cxl *adapter); int cxl_afu_disable(struct cxl_afu *afu); int cxl_psl_purge(struct cxl_afu *afu); +int cxl_calc_capp_routing(struct pci_dev *dev, u64 *chipid, + u32 *phb_index, u64 *capp_unit_id); +int cxl_slot_is_switched(struct pci_dev *dev); +int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg); +u64 cxl_calculate_sr(bool master, bool kernel, bool real_mode, bool p9); void cxl_native_irq_dump_regs_psl9(struct cxl_context *ctx); void cxl_native_irq_dump_regs_psl8(struct cxl_context *ctx); diff --git a/drivers/misc/cxl/cxllib.c b/drivers/misc/cxl/cxllib.c new file mode 100644 index 0000000..63b6280 --- /dev/null +++ b/drivers/misc/cxl/cxllib.c @@ -0,0 +1,241 @@ +/* + * Copyright 2016 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include <linux/hugetlb.h> +#include <linux/sched/mm.h> +#include "cxl.h" +#include <misc/cxllib.h> +#include <asm/pnv-pci.h> + +#define CXL_INVALID_DRA ~0ull +#define CXL_DUMMY_READ_SIZE 128 +#define CXL_DUMMY_READ_ALIGN 8 +#define CXL_CAPI_WINDOW_START 0x2000000000000ull +#define CXL_CAPI_WINDOW_LOG_SIZE 48 +#define CXL_XSL_CONFIG_CURRENT_VERSION CXL_XSL_CONFIG_VERSION1 + + +bool cxllib_slot_is_supported(struct pci_dev *dev, unsigned long flags) +{ + int rc; + u32 phb_index; + u64 chip_id, capp_unit_id; + + /* Currently none supported */ + if (flags) + return false; + + if (!cpu_has_feature(CPU_FTR_HVMODE)) + return false; + + if (!cxl_is_power9()) + return false; + + if (cxl_slot_is_switched(dev)) + return false; + + /* on p9, some pci slots are not connected to a CAPP unit */ + rc = cxl_calc_capp_routing(dev, &chip_id, &phb_index, &capp_unit_id); + if (rc) + return false; + + return true; +} +EXPORT_SYMBOL_GPL(cxllib_slot_is_supported); + +static DEFINE_MUTEX(dra_mutex); +static u64 dummy_read_addr = 
CXL_INVALID_DRA; + +int allocate_dummy_read_buf(void) +{ + u64 buf, vaddr; + size_t buf_size; + + /* + * Dummy read buffer is 128-byte long, aligned on a + * 256-byte boundary and we need the physical address. + */ + buf_size = CXL_DUMMY_READ_SIZE + (1ull << CXL_DUMMY_READ_ALIGN); + buf = (u64) kzalloc(buf_size, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + vaddr = (buf + (1ull << CXL_DUMMY_READ_ALIGN) - 1) & + (~0ull << CXL_DUMMY_READ_ALIGN); + + WARN((vaddr + CXL_DUMMY_READ_SIZE) > (buf + buf_size), + "Dummy read buffer alignment issue"); + dummy_read_addr = virt_to_phys((void *) vaddr); + return 0; +} + +int cxllib_get_xsl_config(struct pci_dev *dev, struct cxllib_xsl_config *cfg) +{ + int rc; + u32 phb_index; + u64 chip_id, capp_unit_id; + + if (!cpu_has_feature(CPU_FTR_HVMODE)) + return -EINVAL; + + mutex_lock(&dra_mutex); + if (dummy_read_addr == CXL_INVALID_DRA) { + rc = allocate_dummy_read_buf(); + if (rc) { + mutex_unlock(&dra_mutex); + return rc; + } + } + mutex_unlock(&dra_mutex); + + rc = cxl_calc_capp_routing(dev, &chip_id, &phb_index, &capp_unit_id); + if (rc) + return rc; + + rc = cxl_get_xsl9_dsnctl(capp_unit_id, &cfg->dsnctl); + if (rc) + return rc; + if (cpu_has_feature(CPU_FTR_POWER9_DD1)) { + /* workaround for DD1 - nbwind = capiind */ + cfg->dsnctl |= ((u64)0x02 << (63-47)); + } + + cfg->version = CXL_XSL_CONFIG_CURRENT_VERSION; + cfg->log_bar_size = CXL_CAPI_WINDOW_LOG_SIZE; + cfg->bar_addr = CXL_CAPI_WINDOW_START; + cfg->dra = dummy_read_addr; + return 0; +} +EXPORT_SYMBOL_GPL(cxllib_get_xsl_config); + + +int cxllib_switch_phb_mode(struct pci_dev *dev, enum cxllib_mode mode, + unsigned long flags) +{ + int rc = 0; + + if (!cpu_has_feature(CPU_FTR_HVMODE)) + return -EINVAL; + + switch (mode) { + case CXL_MODE_PCI: + /* + * We currently don't support going back to PCI mode + * However, we'll turn the invalidations off, so that + * the CX5 firmware doesn't have to ack them and can do + * things like reset, etc.. with no worries. 
+ * So always return EPERM (can't go back to PCI) or + * EBUSY if we couldn't even turn off snooping + */ + rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_OFF); + if (rc) + rc = -EBUSY; + else + rc = -EPERM; + break; + case CXL_MODE_CXL: + /* DMA only supported on TVT1 for the time being */ + if (flags != CXL_MODE_DMA_TVT1) + return -EINVAL; + rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_DMA_TVT1); + if (rc) + return rc; + rc = pnv_phb_to_cxl_mode(dev, OPAL_PHB_CAPI_MODE_SNOOP_ON); + break; + default: + rc = -EINVAL; + } + return rc; +} +EXPORT_SYMBOL_GPL(cxllib_switch_phb_mode); + + +int cxllib_set_device_dma(struct pci_dev *dev, unsigned long flags) +{ + int rc; + + if (flags) + return -EINVAL; + + rc = dma_set_mask(&dev->dev, DMA_BIT_MASK(64)); + return rc; +} +EXPORT_SYMBOL_GPL(cxllib_set_device_dma); + + +int cxllib_get_PE_attributes(struct task_struct *task, + unsigned long translation_mode, struct cxllib_pe_attributes *attr) +{ + struct mm_struct *mm = NULL; + + if (translation_mode != CXL_TRANSLATED_MODE && + translation_mode != CXL_REAL_MODE) + return -EINVAL; + + attr->sr = cxl_calculate_sr(false /* master */, + task == NULL /* kernel ctx */, + translation_mode == CXL_REAL_MODE, + true /* p9 */); + attr->lpid = mfspr(SPRN_LPID); + if (task) { + mm = get_task_mm(task); + if (mm == NULL) + return -EINVAL; + /* + * Caller is keeping a reference on mm_users for as long + * as XSL uses the memory context + */ + attr->pid = mm->context.id; + mmput(mm); + } else { + attr->pid = 0; + } + attr->tid = 0; + return 0; +} +EXPORT_SYMBOL_GPL(cxllib_get_PE_attributes); + + +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags) +{ + int rc; + u64 dar; + struct vm_area_struct *vma = NULL; + unsigned long page_size; + + if (mm == NULL) + return -EFAULT; + + down_read(&mm->mmap_sem); + + for (dar = addr; dar < addr + size; dar += page_size) { + if (!vma || dar < vma->vm_start || dar > vma->vm_end) { + vma = find_vma(mm, addr); + if (!vma) 
{ + pr_err("Can't find vma for addr %016llx\n", addr); + rc = -EFAULT; + goto out; + } + /* get the size of the pages allocated */ + page_size = vma_kernel_pagesize(vma); + } + + rc = cxl_handle_page_fault(true, mm, flags, dar); + if (rc) { + pr_err("_cxl_handle_page_fault failed %d", rc); + rc = -EFAULT; + goto out; + } + } + rc = 0; +out: + up_read(&mm->mmap_sem); + return rc; +} +EXPORT_SYMBOL_GPL(cxllib_handle_fault); diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c index c79e39b..9db63f3 100644 --- a/drivers/misc/cxl/fault.c +++ b/drivers/misc/cxl/fault.c @@ -132,18 +132,16 @@ static int cxl_handle_segment_miss(struct cxl_context *ctx, return IRQ_HANDLED; } -static void cxl_handle_page_fault(struct cxl_context *ctx, - struct mm_struct *mm, u64 dsisr, u64 dar) +int cxl_handle_page_fault(bool kernel_context, + struct mm_struct *mm, u64 dsisr, u64 dar) { unsigned flt = 0; int result; unsigned long access, flags, inv_flags = 0; - trace_cxl_pte_miss(ctx, dsisr, dar); - if ((result = copro_handle_mm_fault(mm, dar, dsisr, &flt))) { pr_devel("copro_handle_mm_fault failed: %#x\n", result); - return cxl_ack_ae(ctx); + return result; } if (!radix_enabled()) { @@ -156,7 +154,7 @@ static void cxl_handle_page_fault(struct cxl_context *ctx, access |= _PAGE_WRITE; access |= _PAGE_PRIVILEGED; - if ((!ctx->kernel) || (REGION_ID(dar) == USER_REGION_ID)) + if (!kernel_context || (REGION_ID(dar) == USER_REGION_ID)) access &= ~_PAGE_PRIVILEGED; if (dsisr & DSISR_NOHPTE) @@ -166,8 +164,7 @@ static void cxl_handle_page_fault(struct cxl_context *ctx, hash_page_mm(mm, dar, access, 0x300, inv_flags); local_irq_restore(flags); } - pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe); - cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0); + return 0; } /* @@ -261,9 +258,15 @@ void cxl_handle_fault(struct work_struct *fault_work) if (cxl_is_segment_miss(ctx, dsisr)) cxl_handle_segment_miss(ctx, mm, dar); - else if (cxl_is_page_fault(ctx, dsisr)) - 
cxl_handle_page_fault(ctx, mm, dsisr, dar); - else + else if (cxl_is_page_fault(ctx, dsisr)) { + trace_cxl_pte_miss(ctx, dsisr, dar); + if (cxl_handle_page_fault(ctx->kernel, mm, dsisr, dar)) { + cxl_ack_ae(ctx); + } else { + pr_devel("Page fault successfully handled for pe: %i!\n", ctx->pe); + cxl_ops->ack_irq(ctx, CXL_PSL_TFC_An_R, 0); + } + } else WARN(1, "cxl_handle_fault has nothing to handle\n"); if (mm) diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c index 2b2f889..4a82c31 100644 --- a/drivers/misc/cxl/native.c +++ b/drivers/misc/cxl/native.c @@ -586,17 +586,17 @@ static int activate_afu_directed(struct cxl_afu *afu) #define set_endian(sr) ((sr) &= ~(CXL_PSL_SR_An_LE)) #endif -static u64 calculate_sr(struct cxl_context *ctx) +u64 cxl_calculate_sr(bool master, bool kernel, bool real_mode, bool p9) { u64 sr = 0; set_endian(sr); - if (ctx->master) + if (master) sr |= CXL_PSL_SR_An_MP; if (mfspr(SPRN_LPCR) & LPCR_TC) sr |= CXL_PSL_SR_An_TC; - if (ctx->kernel) { - if (!ctx->real_mode) + if (kernel) { + if (!real_mode) sr |= CXL_PSL_SR_An_R; sr |= (mfmsr() & MSR_SF) | CXL_PSL_SR_An_HV; } else { @@ -608,7 +608,7 @@ static u64 calculate_sr(struct cxl_context *ctx) if (!test_tsk_thread_flag(current, TIF_32BIT)) sr |= CXL_PSL_SR_An_SF; } - if (cxl_is_power9()) { + if (p9) { if (radix_enabled()) sr |= CXL_PSL_SR_An_XLAT_ror; else @@ -617,6 +617,12 @@ static u64 calculate_sr(struct cxl_context *ctx) return sr; } +static u64 calculate_sr(struct cxl_context *ctx) +{ + return cxl_calculate_sr(ctx->master, ctx->kernel, ctx->real_mode, + cxl_is_power9()); +} + static void update_ivtes_directed(struct cxl_context *ctx) { bool need_update = (ctx->status == STARTED); diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 1eb9859..d18b3d9 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -375,7 +375,7 @@ static u64 get_capp_unit_id(struct device_node *np, u32 phb_index) return 0; } -static int calc_capp_routing(struct pci_dev 
*dev, u64 *chipid, +int cxl_calc_capp_routing(struct pci_dev *dev, u64 *chipid, u32 *phb_index, u64 *capp_unit_id) { int rc; @@ -408,17 +408,9 @@ static int calc_capp_routing(struct pci_dev *dev, u64 *chipid, return 0; } -static int init_implementation_adapter_regs_psl9(struct cxl *adapter, struct pci_dev *dev) +int cxl_get_xsl9_dsnctl(u64 capp_unit_id, u64 *reg) { - u64 xsl_dsnctl, psl_fircntl; - u64 chipid; - u32 phb_index; - u64 capp_unit_id; - int rc; - - rc = calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); - if (rc) - return rc; + u64 xsl_dsnctl; /* * CAPI Identifier bits [0:7] @@ -454,6 +446,27 @@ static int init_implementation_adapter_regs_psl9(struct cxl *adapter, struct pci xsl_dsnctl |= ((u64)0x04 << (63-55)); } + *reg = xsl_dsnctl; + return 0; +} + +static int init_implementation_adapter_regs_psl9(struct cxl *adapter, + struct pci_dev *dev) +{ + u64 xsl_dsnctl, psl_fircntl; + u64 chipid; + u32 phb_index; + u64 capp_unit_id; + int rc; + + rc = cxl_calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); + if (rc) + return rc; + + rc = cxl_get_xsl9_dsnctl(capp_unit_id, &xsl_dsnctl); + if (rc) + return rc; + cxl_p1_write(adapter, CXL_XSL9_DSNCTL, xsl_dsnctl); /* Set fir_cntl to recommended value for production env */ @@ -505,7 +518,7 @@ static int init_implementation_adapter_regs_psl8(struct cxl *adapter, struct pci u64 capp_unit_id; int rc; - rc = calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); + rc = cxl_calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); if (rc) return rc; @@ -538,7 +551,7 @@ static int init_implementation_adapter_regs_xsl(struct cxl *adapter, struct pci_ u64 capp_unit_id; int rc; - rc = calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); + rc = cxl_calc_capp_routing(dev, &chipid, &phb_index, &capp_unit_id); if (rc) return rc; @@ -1897,7 +1910,7 @@ static void cxl_pci_remove_adapter(struct cxl *adapter) #define CXL_MAX_PCIEX_PARENT 2 -static int cxl_slot_is_switched(struct pci_dev *dev) +int 
cxl_slot_is_switched(struct pci_dev *dev) { struct device_node *np; int depth = 0; diff --git a/include/misc/cxllib.h b/include/misc/cxllib.h new file mode 100644 index 0000000..d2f3358 --- /dev/null +++ b/include/misc/cxllib.h @@ -0,0 +1,132 @@ +/* + * Copyright 2016 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef _MISC_CXLLIB_H +#define _MISC_CXLLIB_H + +#include <linux/pci.h> +#include <asm/reg.h> + +/* + * cxl driver exports a in-kernel 'library' API which can be called by + * other drivers to help interacting with an IBM XSL. + */ + +/* + * tells whether capi is supported on the PCIe slot where the + * device is seated + * + * Input: + * dev: device whose slot needs to be checked + * flags: 0 for the time being + */ +bool cxllib_slot_is_supported(struct pci_dev *dev, unsigned long flags); + + +/* + * Returns the configuration parameters to be used by the XSL or device + * + * Input: + * dev: device, used to find PHB + * Output: + * struct cxllib_xsl_config: + * version + * capi BAR address, i.e. 0x2000000000000-0x2FFFFFFFFFFFF + * capi BAR size + * data send control (XSL_DSNCTL) + * dummy read address (XSL_DRA) + */ +#define CXL_XSL_CONFIG_VERSION1 1 +struct cxllib_xsl_config { + u32 version; /* format version for register encoding */ + u32 log_bar_size;/* log size of the capi_window */ + u64 bar_addr; /* address of the start of capi window */ + u64 dsnctl; /* matches definition of XSL_DSNCTL */ + u64 dra; /* real address that can be used for dummy read */ +}; + +int cxllib_get_xsl_config(struct pci_dev *dev, struct cxllib_xsl_config *cfg); + + +/* + * Activate capi for the pci host bridge associated with the device. + * Can be extended to deactivate once we know how to do it. 
+ * Device must be ready to accept messages from the CAPP unit and + * respond accordingly (TLB invalidates, ...) + * + * PHB is switched to capi mode through calls to skiboot. + * CAPP snooping is activated + * + * Input: + * dev: device whose PHB should switch mode + * mode: mode to switch to i.e. CAPI or PCI + * flags: options related to the mode + */ +enum cxllib_mode { + CXL_MODE_CXL, + CXL_MODE_PCI, +}; + +#define CXL_MODE_NO_DMA 0 +#define CXL_MODE_DMA_TVT0 1 +#define CXL_MODE_DMA_TVT1 2 + +int cxllib_switch_phb_mode(struct pci_dev *dev, enum cxllib_mode mode, + unsigned long flags); + + +/* + * Set the device for capi DMA. + * Define its dma_ops and dma offset so that allocations will be using TVT#1 + * + * Input: + * dev: device to set + * flags: options. CXL_MODE_DMA_TVT1 should be used + */ +int cxllib_set_device_dma(struct pci_dev *dev, unsigned long flags); + + + +/* + * Get the Process Element structure for the given thread + * + * Input: + * pid: points the struct pid for the given thread (i.e. linux pid) + * translation_mode: whether addresses should be translated + */ +struct cxllib_pe_attributes { + u64 sr; + u32 lpid; + u32 tid; + u32 pid; +}; +#define CXL_TRANSLATED_MODE 0 +#define CXL_REAL_MODE 1 + +int cxllib_get_PE_attributes(struct task_struct *task, + unsigned long translation_mode, struct cxllib_pe_attributes *attr); + + +/* + * Handle memory fault. + * Fault in all the pages of the specified buffer for the permissions + * provided in ‘flags’ + * + * Shouldn't be called from interrupt context + * + * Input: + * mm: struct mm for the thread faulting the pages + * addr: base address of the buffer to page in + * size: size of the buffer to page in + * flags: permission requested (DSISR_ISSTORE...) + */ +int cxllib_handle_fault(struct mm_struct *mm, u64 addr, u64 size, u64 flags); + + +#endif /* _MISC_CXLLIB_H */
This patch exports an in-kernel 'library' API which can be called by
other drivers to help interacting with an IBM XSL on a POWER9 system.

The XSL (Translation Service Layer) is a stripped down version of the
PSL (Power Service Layer) used in some cards such as the Mellanox CX5.
Like the PSL, it implements the CAIA architecture, but has a number
of differences, mostly in its implementation dependent registers.

The XSL also uses a special DMA cxl mode, which uses a slightly
different init sequence for the CAPP and PHB.

Changelog[v2]
 - Rebase to latest upstream.
 - Return -EFAULT in case of NULL pointer in cxllib_handle_fault().
 - Reverse parameters when copro_handle_mm_fault() is called.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
This applies on top of this patch:
http://patchwork.ozlabs.org/patch/775322/
---
 arch/powerpc/include/asm/opal-api.h | 1 +
 drivers/misc/cxl/Kconfig | 5 +
 drivers/misc/cxl/Makefile | 2 +-
 drivers/misc/cxl/cxl.h | 7 ++
 drivers/misc/cxl/cxllib.c | 241 ++++++++++++++++++++++++++++++++++++
 drivers/misc/cxl/fault.c | 25 ++--
 drivers/misc/cxl/native.c | 16 ++-
 drivers/misc/cxl/pci.c | 41 +++---
 include/misc/cxllib.h | 132 ++++++++++++++++++++
 9 files changed, 439 insertions(+), 31 deletions(-)
 create mode 100644 drivers/misc/cxl/cxllib.c
 create mode 100644 include/misc/cxllib.h
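
For readers following the dummy read buffer allocation in cxllib.c, the round-up arithmetic can be checked in user space with the sketch below. It assumes the same sizes as the patch (128-byte buffer, 256-byte alignment); the sketch_ names are hypothetical, and the mask is written as `~mask` rather than the patch's `~0ull << CXL_DUMMY_READ_ALIGN`, which should be equivalent.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_READ_SIZE  128u
#define SKETCH_ALIGN_LOG2 8u	/* 1 << 8 = 256-byte boundary */

/* Round addr up to the next multiple of 2^log2 (no-op if already aligned). */
static uintptr_t sketch_align_up(uintptr_t addr, unsigned int log2)
{
	uintptr_t mask = ((uintptr_t)1 << log2) - 1;

	return (addr + mask) & ~mask;
}

/*
 * Over-allocate by one full alignment unit so that an aligned window of
 * SKETCH_READ_SIZE bytes always fits, whatever address the allocator returns.
 */
static size_t sketch_alloc_size(void)
{
	return SKETCH_READ_SIZE + ((size_t)1 << SKETCH_ALIGN_LOG2);
}
```

Rounding up moves the start by at most 255 bytes, so the aligned 128-byte window ends at most 383 bytes past the original pointer, inside the 384 bytes allocated. That is the condition the WARN() in allocate_dummy_read_buf() guards.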