[2/3] s390x/pci: fixup global refresh

Message ID 20180130094715.11578-3-zyimin@linux.vnet.ibm.com
State New
Series s390x/pci: fixup and optimize IOTLB code

Commit Message

Yi Min Zhao Jan. 30, 2018, 9:47 a.m. UTC
The VFIO common code doesn't provide the possibility to modify a
previous mapping entry in another way than unmapping and mapping again
with new properties.

To avoid -EEXIST DMA mapping error, this we introduce a GHashTable to
store S390IOTLBEntry instances in order to cache the mapped entries.
When intercepting rpcit instruction, ignore the identical mapped
entries to avoid doing map operations multiple times and do unmap and
re-map operations for the case of updating the valid entries. To
achieve that goal, we also export the DMA walking function and
optimize the code handling errors in rpcit handler.

Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
---
 hw/s390x/s390-pci-bus.c  | 28 +++++++++-----
 hw/s390x/s390-pci-bus.h  |  3 ++
 hw/s390x/s390-pci-inst.c | 95 ++++++++++++++++++++++++++++++++++--------------
 3 files changed, 90 insertions(+), 36 deletions(-)

Comments

Cornelia Huck Jan. 31, 2018, 11:35 a.m. UTC | #1
On Tue, 30 Jan 2018 10:47:14 +0100
Yi Min Zhao <zyimin@linux.vnet.ibm.com> wrote:

> The VFIO common code doesn't provide the possibility to modify a
> previous mapping entry in another way than unmapping and mapping again
> with new properties.

I'm wondering why other architectures don't need that. Is this
refreshing a unique ability of s390 (due to that instruction)?

> 
> To avoid -EEXIST DMA mapping error, this we introduce a GHashTable to

s/this//

> store S390IOTLBEntry instances in order to cache the mapped entries.
> When intercepting rpcit instruction, ignore the identical mapped
> entries to avoid doing map operations multiple times and do unmap and
> re-map operations for the case of updating the valid entries. To
> achieve that goal, we also export the DMA walking function and
> optimize the code handling errors in rpcit handler.

How often does such a thing happen in practice?

> 
> Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
> ---
>  hw/s390x/s390-pci-bus.c  | 28 +++++++++-----
>  hw/s390x/s390-pci-bus.h  |  3 ++
>  hw/s390x/s390-pci-inst.c | 95 ++++++++++++++++++++++++++++++++++--------------
>  3 files changed, 90 insertions(+), 36 deletions(-)

Can't really review the rest due to -ENODOC, sorry.
Pierre Morel Feb. 1, 2018, 12:55 p.m. UTC | #2
On 31/01/2018 12:35, Cornelia Huck wrote:
> On Tue, 30 Jan 2018 10:47:14 +0100
> Yi Min Zhao <zyimin@linux.vnet.ibm.com> wrote:
>
>> The VFIO common code doesn't provide the possibility to modify a
>> previous mapping entry in another way than unmapping and mapping again
>> with new properties.
> I'm wondering why other architectures don't need that. Is this
> refreshing an unique ability on s390 (due to that instruction)?

It is not specific to s390 but to iommu_type1.

Maybe we should have a different VFIO_IOMMU backend for mediated devices
and architectures with SR-IOV capabilities to lift this restriction.

Other architectures using vfio_iommu_spapr_tce do not seem to have
the problem (AFAIU the code).

>
>> To avoid -EEXIST DMA mapping error, this we introduce a GHashTable to
> s/this//
>
>> store S390IOTLBEntry instances in order to cache the mapped entries.
>> When intercepting rpcit instruction, ignore the identical mapped
>> entries to avoid doing map operations multiple times and do unmap and
>> re-map operations for the case of updating the valid entries. To
>> achieve that goal, we also export the DMA walking function and
>> optimize the code handling errors in rpcit handler.
> How often does such a thing happen in practice?

It depends on the size of the guest memory and on the guest OS.
Linux allows an IOMMU IOVA space up to the minimum of high_memory, the
iommu aperture, and the maximal size of an RT IOMMU entry.
A Linux guest uses lazy TLB flushing and waits until the allowed IOMMU
IOVA space is exhausted before flushing the TLB.
When the flush is done, it is a global flush, not a global invalidation;
it follows that some entries in the IOTLB are still valid and must be
kept, while others are to be invalidated.

Since we have no way to know which entries of the IOMMU TLB
have been modified, we must check all entries.

When does the -EEXIST error happen?
- Linux before lazy flush (c60d1ae4e, 2014; we did not have zPCI at
  that time): never
- Linux with lazy flush and the new IOMMU_TYPE1: always, as soon as a
  global flush occurs, which depends on the allowed IOMMU IOVA size and
  the number of mapped entries in the IOTLB

When is the implemented cache used, and how does it avoid the errors?
- an entry exists in the cache for every mapped IOTLB entry
- a miss -> add the entry and issue a DMA_MAP to the host
- a hit for IOMMU_NONE -> issue a DMA_UNMAP and remove the entry
- a hit for a valid, unmodified entry -> do nothing (already mapped)
- a hit for a valid but modified entry -> issue UNMAP/MAP and update the entry

Regards,

Pierre
Patch

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index e349d73abe..b75af26db7 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -461,8 +461,8 @@  out:
     return nto;
 }
 
-static uint16_t s390_guest_io_table_walk(uint64_t g_iota, hwaddr addr,
-                                         S390IOTLBEntry *entry)
+uint16_t s390_guest_io_table_walk(uint64_t g_iota, hwaddr addr,
+                                  S390IOTLBEntry *entry)
 {
     uint64_t to = s390_pci_get_table_origin(g_iota);
     int8_t ett = 1;
@@ -487,7 +487,8 @@  static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
                                           IOMMUAccessFlags flag)
 {
     S390PCIIOMMU *iommu = container_of(mr, S390PCIIOMMU, iommu_mr);
-    S390IOTLBEntry entry;
+    S390IOTLBEntry *entry;
+    uint64_t iova = addr & PAGE_MASK;
     uint16_t error = 0;
     IOMMUTLBEntry ret = {
         .target_as = &address_space_memory,
@@ -515,12 +516,17 @@  static IOMMUTLBEntry s390_translate_iommu(IOMMUMemoryRegion *mr, hwaddr addr,
         goto err;
     }
 
-    error = s390_guest_io_table_walk(iommu->g_iota, addr, &entry);
-
-    ret.iova = entry.iova;
-    ret.translated_addr = entry.translated_addr;
-    ret.addr_mask = entry.len - 1;
-    ret.perm = entry.perm;
+    entry = g_hash_table_lookup(iommu->iotlb, &iova);
+    if (entry) {
+        ret.iova = entry->iova;
+        ret.translated_addr = entry->translated_addr;
+        ret.addr_mask = entry->len - 1;
+        ret.perm = entry->perm;
+    } else {
+        ret.iova = iova;
+        ret.addr_mask = ~PAGE_MASK;
+        ret.perm = IOMMU_NONE;
+    }
 
     if ((flag != IOMMU_NONE) && !(flag & ret.perm)) {
         error = ERR_EVENT_TPROTE;
@@ -572,6 +578,8 @@  static S390PCIIOMMU *s390_pci_get_iommu(S390pciState *s, PCIBus *bus,
                                         PCI_FUNC(devfn));
         memory_region_init(&iommu->mr, OBJECT(iommu), mr_name, UINT64_MAX);
         address_space_init(&iommu->as, &iommu->mr, as_name);
+        iommu->iotlb = g_hash_table_new_full(g_int64_hash, g_int64_equal,
+                                             NULL, g_free);
         table->iommu[PCI_SLOT(devfn)] = iommu;
 
         g_free(mr_name);
@@ -661,6 +669,7 @@  void s390_pci_iommu_enable(S390PCIIOMMU *iommu)
 void s390_pci_iommu_disable(S390PCIIOMMU *iommu)
 {
     iommu->enabled = false;
+    g_hash_table_remove_all(iommu->iotlb);
     memory_region_del_subregion(&iommu->mr, MEMORY_REGION(&iommu->iommu_mr));
     object_unparent(OBJECT(&iommu->iommu_mr));
 }
@@ -676,6 +685,7 @@  static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
     }
 
     table->iommu[PCI_SLOT(devfn)] = NULL;
+    g_hash_table_destroy(iommu->iotlb);
     address_space_destroy(&iommu->as);
     object_unparent(OBJECT(&iommu->mr));
     object_unparent(OBJECT(iommu));
diff --git a/hw/s390x/s390-pci-bus.h b/hw/s390x/s390-pci-bus.h
index ca22ef393b..395bbf0e13 100644
--- a/hw/s390x/s390-pci-bus.h
+++ b/hw/s390x/s390-pci-bus.h
@@ -274,6 +274,7 @@  typedef struct S390PCIIOMMU {
     uint64_t g_iota;
     uint64_t pba;
     uint64_t pal;
+    GHashTable *iotlb;
 } S390PCIIOMMU;
 
 typedef struct S390PCIIOMMUTable {
@@ -330,6 +331,8 @@  void s390_pci_iommu_enable(S390PCIIOMMU *iommu);
 void s390_pci_iommu_disable(S390PCIIOMMU *iommu);
 void s390_pci_generate_error_event(uint16_t pec, uint32_t fh, uint32_t fid,
                                    uint64_t faddr, uint32_t e);
+uint16_t s390_guest_io_table_walk(uint64_t g_iota, hwaddr addr,
+                                  S390IOTLBEntry *entry);
 S390PCIBusDevice *s390_pci_find_dev_by_idx(S390pciState *s, uint32_t idx);
 S390PCIBusDevice *s390_pci_find_dev_by_fh(S390pciState *s, uint32_t fh);
 S390PCIBusDevice *s390_pci_find_dev_by_fid(S390pciState *s, uint32_t fid);
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 63fa06fb97..997a9cc2e9 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -571,27 +571,65 @@  int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
     return 0;
 }
 
+static void s390_pci_update_iotlb(S390PCIIOMMU *iommu, S390IOTLBEntry *entry)
+{
+    S390IOTLBEntry *cache = g_hash_table_lookup(iommu->iotlb, &entry->iova);
+    IOMMUTLBEntry notify = {
+        .target_as = &address_space_memory,
+        .iova = entry->iova,
+        .translated_addr = entry->translated_addr,
+        .perm = entry->perm,
+        .addr_mask = ~PAGE_MASK,
+    };
+
+    if (entry->perm == IOMMU_NONE) {
+        if (!cache) {
+            return;
+        }
+        g_hash_table_remove(iommu->iotlb, &entry->iova);
+    } else {
+        if (cache) {
+            if (cache->perm == entry->perm &&
+                cache->translated_addr == entry->translated_addr) {
+                return;
+            }
+
+            notify.perm = IOMMU_NONE;
+            memory_region_notify_iommu(&iommu->iommu_mr, notify);
+            notify.perm = entry->perm;
+        }
+
+        cache = g_new(S390IOTLBEntry, 1);
+        cache->iova = entry->iova;
+        cache->translated_addr = entry->translated_addr;
+        cache->len = PAGE_SIZE;
+        cache->perm = entry->perm;
+        g_hash_table_replace(iommu->iotlb, &cache->iova, cache);
+    }
+
+    memory_region_notify_iommu(&iommu->iommu_mr, notify);
+}
+
 int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
 {
     CPUS390XState *env = &cpu->env;
     uint32_t fh;
+    uint16_t error = 0;
     S390PCIBusDevice *pbdev;
     S390PCIIOMMU *iommu;
+    S390IOTLBEntry entry;
     hwaddr start, end;
-    IOMMUTLBEntry entry;
-    IOMMUMemoryRegion *iommu_mr;
-    IOMMUMemoryRegionClass *imrc;
 
     cpu_synchronize_state(CPU(cpu));
 
     if (env->psw.mask & PSW_MASK_PSTATE) {
         s390_program_interrupt(env, PGM_PRIVILEGED, 4, ra);
-        goto out;
+        return 0;
     }
 
     if (r2 & 0x1) {
         s390_program_interrupt(env, PGM_SPECIFICATION, 4, ra);
-        goto out;
+        return 0;
     }
 
     fh = env->regs[r1] >> 32;
@@ -602,7 +640,7 @@  int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
     if (!pbdev) {
         DPRINTF("rpcit no pci dev\n");
         setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
-        goto out;
+        return 0;
     }
 
     switch (pbdev->state) {
@@ -622,34 +660,37 @@  int rpcit_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
 
     iommu = pbdev->iommu;
     if (!iommu->g_iota) {
-        pbdev->state = ZPCI_FS_ERROR;
-        setcc(cpu, ZPCI_PCI_LS_ERR);
-        s390_set_status_code(env, r1, ZPCI_PCI_ST_INSUF_RES);
-        s390_pci_generate_error_event(ERR_EVENT_INVALAS, pbdev->fh, pbdev->fid,
-                                      start, 0);
-        goto out;
+        error = ERR_EVENT_INVALAS;
+        goto err;
     }
 
     if (end < iommu->pba || start > iommu->pal) {
-        pbdev->state = ZPCI_FS_ERROR;
-        setcc(cpu, ZPCI_PCI_LS_ERR);
-        s390_set_status_code(env, r1, ZPCI_PCI_ST_INSUF_RES);
-        s390_pci_generate_error_event(ERR_EVENT_OORANGE, pbdev->fh, pbdev->fid,
-                                      start, 0);
-        goto out;
+        error = ERR_EVENT_OORANGE;
+        goto err;
     }
 
-    iommu_mr = &iommu->iommu_mr;
-    imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
-
     while (start < end) {
-        entry = imrc->translate(iommu_mr, start, IOMMU_NONE);
-        memory_region_notify_iommu(iommu_mr, entry);
-        start += entry.addr_mask + 1;
-    }
+        error = s390_guest_io_table_walk(iommu->g_iota, start, &entry);
+        if (error) {
+            break;
+        }
 
-    setcc(cpu, ZPCI_PCI_LS_OK);
-out:
+        start += entry.len;
+        while (entry.iova < start && entry.iova < end) {
+            s390_pci_update_iotlb(iommu, &entry);
+            entry.iova += PAGE_SIZE;
+            entry.translated_addr += PAGE_SIZE;
+        }
+    }
+err:
+    if (error) {
+        pbdev->state = ZPCI_FS_ERROR;
+        setcc(cpu, ZPCI_PCI_LS_ERR);
+        s390_set_status_code(env, r1, ZPCI_PCI_ST_FUNC_IN_ERR);
+        s390_pci_generate_error_event(error, pbdev->fh, pbdev->fid, start, 0);
+    } else {
+        setcc(cpu, ZPCI_PCI_LS_OK);
+    }
     return 0;
 }