diff mbox series

[v8,1/5] i386/tcg: implement x2APIC registers MSR access

Message ID 20230926160637.27995-2-minhquangbui99@gmail.com
State New
Headers show
Series [v8,1/5] i386/tcg: implement x2APIC registers MSR access | expand

Commit Message

Bui Quang Minh Sept. 26, 2023, 4:06 p.m. UTC
This commit refactors apic_mem_read/write to support both MMIO access in
xAPIC and MSR access in x2APIC.

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/intc/apic.c                       | 79 ++++++++++++++++++----------
 hw/intc/trace-events                 |  4 +-
 include/hw/i386/apic.h               |  3 ++
 target/i386/cpu.h                    |  3 ++
 target/i386/tcg/sysemu/misc_helper.c | 27 ++++++++++
 5 files changed, 86 insertions(+), 30 deletions(-)

Comments

Phil Dennis-Jordan Oct. 22, 2023, 1:59 p.m. UTC | #1
I can confirm that this works. The build issue obviously needs fixing,
but once that's fixed, this improves on the status quo.

I've tested this and patch 2/5 with x2apic CPUID bit enabled with the
hvf backend on macOS. To make it work in hvf mode, I used the attached
additional minimal patch to wire it up, but with that in place it
noticeably improves guest OS performance. (This patch doesn't yet
implement raising exceptions or checking for x2apic mode, more on that
in my comments below.)

Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>

On Tue, 26 Sept 2023 at 18:08, Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> @@ -455,6 +469,19 @@ void helper_rdmsr(CPUX86State *env)
>          val = (cs->nr_threads * cs->nr_cores) | (cs->nr_cores << 16);
>          break;
>      }
> +    case MSR_APIC_START ... MSR_APIC_END: {
> +        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
> +
> +        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
> +            raise_exception_ra(env, EXCP0D_GPF, GETPC());
> +        }
> +
> +        qemu_mutex_lock_iothread();
> +        val = apic_register_read(index);
> +        qemu_mutex_unlock_iothread();

Shouldn't the x2apic mode check technically be inside the lock?
Furthermore, we need the mode check logic in each accelerator whose
MSR read and write we wire up. Finally, there's the exception raising
issue which Michael noted.

So my suggestion would be to wrap the x2apic mode check and the call
to the lower level apic_register_read into a standalone
apic_x2apic_msr_read() or similar, and the equivalent for writes.
These functions should then also return success or failure, the latter
indicating an exception should be raised. Raising the exception can
then also be implemented for each accelerator at the relevant call
site. That contains the raise_exception_ra call in the TCG specific
code, and I can do the equivalent on the hvf side.

It may also be cleaner to only implement the shared xAPIC and x2APIC
functionality in the apic_register_{read|write} functions, and put the
code that's specific to MMIO and MSR paths in the
apic_mem_{read|write} and apic_x2apic_msr_{read|write} wrappers,
respectively? Not sure.
Bui Quang Minh Oct. 24, 2023, 3:27 p.m. UTC | #2
On 10/22/23 20:59, Phil Dennis-Jordan wrote:
> I can confirm that this works. The build issue obviously needs fixing,
> but once that's fixed, this improves on the status quo.
> 
> I've tested this and patch 2/5 with x2apic CPUID bit enabled with the
> hvf backend on macOS. To make it work in hvf mode, I used the attached
> additional minimal patch to wire it up, but with that in place it
> noticeably improves guest OS performance. (This patch doesn't yet
> implement raising exceptions or checking for x2apic mode, more on that
> in my comments below.)
> 
> Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>
> 
> On Tue, 26 Sept 2023 at 18:08, Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>> @@ -455,6 +469,19 @@ void helper_rdmsr(CPUX86State *env)
>>           val = (cs->nr_threads * cs->nr_cores) | (cs->nr_cores << 16);
>>           break;
>>       }
>> +    case MSR_APIC_START ... MSR_APIC_END: {
>> +        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
>> +
>> +        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
>> +            raise_exception_ra(env, EXCP0D_GPF, GETPC());
>> +        }
>> +
>> +        qemu_mutex_lock_iothread();
>> +        val = apic_register_read(index);
>> +        qemu_mutex_unlock_iothread();
> 
> Shouldn't the x2apic mode check technically be inside the lock?
> Furthermore, we need the mode check logic in each accelerator whose
> MSR read and write we wire up. Finally, there's the exception raising
> issue which Michael noted.
> 
> So my suggestion would be to wrap the x2apic mode check and the call
> to the lower level apic_register_read into a standalone
> apic_x2apic_msr_read() or similar, and the equivalent for writes.
> These functions should then also return success or failure, the latter
> indicating an exception should be raised. Raising the exception can
> then also be implemented for each accelerator at the relevant call
> site. That contains the raise_exception_ra call in the TCG specific
> code, and I can do the equivalent on the hvf side.

Thanks a lot for your suggestion, I've taken this approach and 
implemented an apic_msr_read/write wrapper as you suggested in version 9 
(https://lore.kernel.org/qemu-devel/20231024152105.35942-1-minhquangbui99@gmail.com/)

Thank you,
Quang Minh.
diff mbox series

Patch

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index ac3d47d231..cb8c20de93 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -288,6 +288,13 @@  void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
+bool is_x2apic_mode(DeviceState *dev)
+{
+    APICCommonState *s = APIC(dev);
+
+    return s->apicbase & MSR_IA32_APICBASE_EXTD;
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
     s->apicbase = (val & 0xfffff000) |
@@ -636,16 +643,11 @@  static void apic_timer(void *opaque)
     apic_timer_update(s, s->next_time);
 }
 
-static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+uint64_t apic_register_read(int index)
 {
     DeviceState *dev;
     APICCommonState *s;
-    uint32_t val;
-    int index;
-
-    if (size < 4) {
-        return 0;
-    }
+    uint64_t val;
 
     dev = cpu_get_current_apic();
     if (!dev) {
@@ -653,7 +655,6 @@  static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
     }
     s = APIC(dev);
 
-    index = (addr >> 4) & 0xff;
     switch(index) {
     case 0x02: /* id */
         val = s->id << 24;
@@ -720,7 +721,23 @@  static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
         val = 0;
         break;
     }
-    trace_apic_mem_readl(addr, val);
+
+    trace_apic_register_read(index, val);
+    return val;
+}
+
+static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+    uint32_t val;
+    int index;
+
+    if (size < 4) {
+        return 0;
+    }
+
+    index = (addr >> 4) & 0xff;
+    val = (uint32_t)apic_register_read(index);
+
     return val;
 }
 
@@ -737,27 +754,10 @@  static void apic_send_msi(MSIMessage *msi)
     apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
 }
 
-static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
-                           unsigned size)
+void apic_register_write(int index, uint64_t val)
 {
     DeviceState *dev;
     APICCommonState *s;
-    int index = (addr >> 4) & 0xff;
-
-    if (size < 4) {
-        return;
-    }
-
-    if (addr > 0xfff || !index) {
-        /* MSI and MMIO APIC are at the same memory location,
-         * but actually not on the global bus: MSI is on PCI bus
-         * APIC is connected directly to the CPU.
-         * Mapping them on the global bus happens to work because
-         * MSI registers are reserved in APIC MMIO and vice versa. */
-        MSIMessage msi = { .address = addr, .data = val };
-        apic_send_msi(&msi);
-        return;
-    }
 
     dev = cpu_get_current_apic();
     if (!dev) {
@@ -765,7 +765,7 @@  static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
     }
     s = APIC(dev);
 
-    trace_apic_mem_writel(addr, val);
+    trace_apic_register_write(index, val);
 
     switch(index) {
     case 0x02:
@@ -843,6 +843,29 @@  static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
     }
 }
 
+static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
+                           unsigned size)
+{
+    int index = (addr >> 4) & 0xff;
+
+    if (size < 4) {
+        return;
+    }
+
+    if (addr > 0xfff || !index) {
+        /* MSI and MMIO APIC are at the same memory location,
+         * but actually not on the global bus: MSI is on PCI bus
+         * APIC is connected directly to the CPU.
+         * Mapping them on the global bus happens to work because
+         * MSI registers are reserved in APIC MMIO and vice versa. */
+        MSIMessage msi = { .address = addr, .data = val };
+        apic_send_msi(&msi);
+        return;
+    }
+
+    apic_register_write(index, val);
+}
+
 static void apic_pre_save(APICCommonState *s)
 {
     apic_sync_vapic(s, SYNC_FROM_VAPIC);
diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 36ff71f947..1ef29d0256 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -14,8 +14,8 @@  cpu_get_apic_base(uint64_t val) "0x%016"PRIx64
 # apic.c
 apic_local_deliver(int vector, uint32_t lvt) "vector %d delivery mode %d"
 apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t trigger_mode) "dest %d dest_mode %d delivery_mode %d vector %d trigger_mode %d"
-apic_mem_readl(uint64_t addr, uint32_t val)  "0x%"PRIx64" = 0x%08x"
-apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x"
+apic_register_read(uint8_t reg, uint64_t val) "register 0x%02x = 0x%"PRIx64
+apic_register_write(uint8_t reg, uint64_t val) "register 0x%02x = 0x%"PRIx64
 
 # ioapic.c
 ioapic_set_remote_irr(int n) "set remote irr for pin %d"
diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
index bdc15a7a73..2cebeb4faf 100644
--- a/include/hw/i386/apic.h
+++ b/include/hw/i386/apic.h
@@ -18,6 +18,9 @@  void apic_sipi(DeviceState *s);
 void apic_poll_irq(DeviceState *d);
 void apic_designate_bsp(DeviceState *d, bool bsp);
 int apic_get_highest_priority_irr(DeviceState *dev);
+uint64_t apic_register_read(int index);
+void apic_register_write(int index, uint64_t val);
+bool is_x2apic_mode(DeviceState *d);
 
 /* pc.c */
 DeviceState *cpu_get_current_apic(void);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d3f377d48a..78489378ad 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -545,6 +545,9 @@  typedef enum X86Seg {
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS     0x00000490
 #define MSR_IA32_VMX_VMFUNC             0x00000491
 
+#define MSR_APIC_START                  0x00000800
+#define MSR_APIC_END                    0x000008ff
+
 #define XSTATE_FP_BIT                   0
 #define XSTATE_SSE_BIT                  1
 #define XSTATE_YMM_BIT                  2
diff --git a/target/i386/tcg/sysemu/misc_helper.c b/target/i386/tcg/sysemu/misc_helper.c
index e1528b7f80..1fce2076a3 100644
--- a/target/i386/tcg/sysemu/misc_helper.c
+++ b/target/i386/tcg/sysemu/misc_helper.c
@@ -25,6 +25,7 @@ 
 #include "exec/address-spaces.h"
 #include "exec/exec-all.h"
 #include "tcg/helper-tcg.h"
+#include "hw/i386/apic.h"
 
 void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
 {
@@ -289,6 +290,19 @@  void helper_wrmsr(CPUX86State *env)
         env->msr_bndcfgs = val;
         cpu_sync_bndcs_hflags(env);
         break;
+    case MSR_APIC_START ... MSR_APIC_END: {
+        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
+
+        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
+            goto error;
+        }
+
+        qemu_mutex_lock_iothread();
+        apic_register_write(index, val);
+        qemu_mutex_unlock_iothread();
+
+        break;
+    }
     default:
         if ((uint32_t)env->regs[R_ECX] >= MSR_MC0_CTL
             && (uint32_t)env->regs[R_ECX] < MSR_MC0_CTL +
@@ -455,6 +469,19 @@  void helper_rdmsr(CPUX86State *env)
         val = (cs->nr_threads * cs->nr_cores) | (cs->nr_cores << 16);
         break;
     }
+    case MSR_APIC_START ... MSR_APIC_END: {
+        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
+
+        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
+            raise_exception_ra(env, EXCP0D_GPF, GETPC());
+        }
+
+        qemu_mutex_lock_iothread();
+        val = apic_register_read(index);
+        qemu_mutex_unlock_iothread();
+
+        break;
+    }
     default:
         if ((uint32_t)env->regs[R_ECX] >= MSR_MC0_CTL
             && (uint32_t)env->regs[R_ECX] < MSR_MC0_CTL +