Message ID | 1370882347-31129-1-git-send-email-mst@redhat.com |
---|---|
State | New |
Headers | show |
"Michael S. Tsirkin" <mst@redhat.com> writes: > This adds support for device hotplug behind > pci bridges. Bridge devices themselves need > to be pre-configured on qemu command line. > > One of the goals of this project was to > demonstrate the advantages of generating > ACPI tables in QEMU. > In my opinion, it does this successfully. Since you've gone out of your way to make this difficult to actually review... > > * This saves the need to provide a new interface for firmware > to discover bus number to pci brige mapping Do you mean fw_cfg? The BIOS normally does bus numbering. I see no reason for it not to. > * No need for yet another interface to bios to detect qemu version > to check if it's safe to activate new code, > and to ship multiple table versions: We only care about supporting one version of SeaBIOS. SeaBIOS should only care about supporting one version of QEMU. We're not asking it to support multiple versions. > we generated code so we know this is new qemu > * Use of glib's GArray makes it much easier to build > up tables in code without need for iasl and code patching Adding a dynamic array to SeaBIOS isn't rocket science. > > I have also tried to address the concerns that > Anthony has raised with this approach, please see below. > > Design: > - each bus gets assigned a number 0-255 > - generated ACPI code writes this number > to a new BSEL register, then uses existing > UP/DOWN registers to probe slot status; > to eject, write number to BSEL register, > then slot into existing EJ > > This is to address the ACPI spec requirement to > avoid config cycle access to any bus except PCI roots. > > Portability: > - Non x86 (or any Linux) platforms don't need any of this code. > They can keep happily using SHPC the way > they always did. > > Things to note: > - this is on top of acpi patchset posted a month ago, > with a small patch adding a core function to walk all > pci buses, on top. > Can also be found in my git tree > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi > > - Extensive use of glib completely removes > pointer math in new code: we use > g_array_append_vals exclusively. > There's no code patching in new code. > This is in response to comments about security > concerns with adding code to qemu. > In this sense it is more secure than existing > code in hw/acpi/core.c - that can be switched to > glib if desired. > > - New code (not imported from seabios) > uses glib to reduce dependency on iasl. > With time all code can be rewritted to > use glib and avoid iasl, if desired. > > - As was the case previously, > systems that lack working iasl are detected at configure time, > pre-generated hex files in source tree are used in this case. > This addresses the concern with iasl/big-endian > systems. > > - Compile tested only. Migration is known to be > broken - there are a bunch of TODO tags in code. > It was agreed on the last qemu conf meeting that > this code is posted without looking at migration, > with understanding that it is addressed before > being merged. Please do not mistake this > limitation for a fundamental one - I have a > very good idea how to fix it, including > cross-version migration. > > - Cross version migration: when running with -M 1.5 > and older, all ACPI table generation should be disabled. > We'll present FW_CFG interface compatible with 1.5. So TL;DR, you break a bunch of stuff and introduce a mess of code. It would be less code and wouldn't break anything to add this logic to SeaBIOS. How is this even a discussion? Regards, Anthony Liguori > > Cc: Jordan Justen <jljusten@gmail.com> > Cc: Anthony Liguori <anthony@codemonkey.ws> > Cc: Laszlo Ersek <lersek@redhat.com> > Cc: Gerd Hoffmann <kraxel@redhat.com> > Cc: seabios@seabios.org > Cc: David Woodhouse <dwmw2@infradead.org> > Cc: Kevin O'Connor <kevin@koconnor.net> > Cc: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > --- > docs/specs/acpi_pci_hotplug.txt | 8 + > include/hw/i386/pc.h | 4 +- > hw/i386/acpi-dsdt.dsl | 36 +++- > hw/i386/ssdt-pcihp.dsl | 51 ----- > hw/acpi/piix4.c | 145 ++++++++++---- > hw/i386/acpi-build.c | 411 ++++++++++++++++++++++++++++++++++----- > hw/i386/Makefile.objs | 2 +- > hw/i386/ssdt-pcihp.hex.generated | 108 ---------- > 8 files changed, 510 insertions(+), 255 deletions(-) > delete mode 100644 hw/i386/ssdt-pcihp.dsl > delete mode 100644 hw/i386/ssdt-pcihp.hex.generated > > diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt > index a839434..951b99a 100644 > --- a/docs/specs/acpi_pci_hotplug.txt > +++ b/docs/specs/acpi_pci_hotplug.txt > @@ -43,3 +43,11 @@ PCI removability status (IO port 0xae0c-0xae0f, 4-byte access): > > Used by ACPI BIOS _RMV method to indicate removability status to OS. One > bit per slot. Read-only > + > +PCI bus selector (IO port 0xae10-0xae13, 4-byte access): > +----------------------------------------------- > + > +Used by ACPI BIOS methods to select the current PCI bus. > +When written, makes all the other PCI registers (0xae00 - 0xae0f) > +to refer to the appropriate bus. > +0 selects PCI root bus (default). > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h > index f8d0871..66ec787 100644 > --- a/include/hw/i386/pc.h > +++ b/include/hw/i386/pc.h > @@ -38,7 +38,9 @@ struct PcGuestInfo { > bool s3_disabled; > bool s4_disabled; > uint8_t s4_val; > - DECLARE_BITMAP(slot_hotplug_enable, PCI_SLOT_MAX); > +#define GUEST_INFO_MAX_HOTPLUG_BUS 256 > + PCIBus *hotplug_buses[GUEST_INFO_MAX_HOTPLUG_BUS]; > + DECLARE_BITMAP(hotplug_enable[GUEST_INFO_MAX_HOTPLUG_BUS], PCI_SLOT_MAX); > uint16_t sci_int; > uint8_t acpi_enable_cmd; > uint8_t acpi_disable_cmd; > diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl > index 90efce0..dc05668 100644 > --- a/hw/i386/acpi-dsdt.dsl > +++ b/hw/i386/acpi-dsdt.dsl > @@ -133,11 +133,18 @@ DefinitionBlock ( > B0EJ, 32, > } > > + OperationRegion(SEL, SystemIO, 0xae10, 0x04) > + Field(SEL, DWordAcc, NoLock, WriteAsZeros) { > + BSEL, 32, > + } > + > /* Methods called by bulk generated PCI devices below */ > + Name(BNUM, 0x00) > > /* Methods called by hotplug devices */ > - Method(PCEJ, 1, NotSerialized) { > + Method(PCEJ, 2, NotSerialized) { > // _EJ0 method - eject callback > + Store(Arg1, BSEL) > Store(ShiftLeft(1, Arg0), B0EJ) > Return (0x0) > } > @@ -147,16 +154,25 @@ DefinitionBlock ( > > /* PCI hotplug notify method */ > Method(PCNF, 0) { > - // Local0 = iterator > + // Local0 = bridge selector > Store(Zero, Local0) > - While (LLess(Local0, 31)) { > - Increment(Local0) > - If (And(PCIU, ShiftLeft(1, Local0))) { > - PCNT(Local0, 1) > - } > - If (And(PCID, ShiftLeft(1, Local0))) { > - PCNT(Local0, 3) > - } > + While (LLess(Local0, 255)) { > + Store(Local0, BSEL) > + // Local1 = slot iterator > + Store(Zero, Local1) > + > + Store(PCIU, Local2) > + Store(PCID, Local3) > + While (LLess(Local1, 31)) { > + Increment(Local1) > + If (And(Local2, ShiftLeft(1, Local1))) { > + PCNT(Add(Local1, Multiply(Local0, 32)), 1) > + } > + If (And(Local3, ShiftLeft(1, Local1))) { > + PCNT(Add(Local1, Multiply(Local0, 32)), 3) > + } > + } > + Increment(Local0) > } > } > } > diff --git a/hw/i386/ssdt-pcihp.dsl b/hw/i386/ssdt-pcihp.dsl > deleted file mode 100644 > index d29a5b9..0000000 > --- a/hw/i386/ssdt-pcihp.dsl > +++ /dev/null > @@ -1,51 +0,0 @@ > -/* > - * This program is free software; you can redistribute it and/or modify > - * it under the terms of the GNU General Public License as published by > - * the Free Software Foundation; either version 2 of the License, or > - * (at your option) any later version. > - > - * This program is distributed in the hope that it will be useful, > - * but WITHOUT ANY WARRANTY; without even the implied warranty of > - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - * GNU General Public License for more details. > - > - * You should have received a copy of the GNU General Public License along > - * with this program; if not, see <http://www.gnu.org/licenses/>. > - */ > - > -ACPI_EXTRACT_ALL_CODE ssdp_pcihp_aml > - > -DefinitionBlock ("ssdt-pcihp.aml", "SSDT", 0x01, "BXPC", "BXSSDTPCIHP", 0x1) > -{ > - > -/**************************************************************** > - * PCI hotplug > - ****************************************************************/ > - > - /* Objects supplied by DSDT */ > - External(\_SB.PCI0, DeviceObj) > - External(\_SB.PCI0.PCEJ, MethodObj) > - > - Scope(\_SB.PCI0) { > - > - /* Bulk generated PCI hotplug devices */ > - ACPI_EXTRACT_DEVICE_START ssdt_pcihp_start > - ACPI_EXTRACT_DEVICE_END ssdt_pcihp_end > - ACPI_EXTRACT_DEVICE_STRING ssdt_pcihp_name > - > - // Method _EJ0 can be patched by BIOS to EJ0_ > - // at runtime, if the slot is detected to not support hotplug. > - // Extract the offset of the address dword and the > - // _EJ0 name to allow this patching. > - Device(SAA) { > - ACPI_EXTRACT_NAME_BYTE_CONST ssdt_pcihp_id > - Name(_SUN, 0xAA) > - ACPI_EXTRACT_NAME_DWORD_CONST ssdt_pcihp_adr > - Name(_ADR, 0xAA0000) > - ACPI_EXTRACT_METHOD_STRING ssdt_pcihp_ej0 > - Method(_EJ0, 1) { > - Return (PCEJ(_SUN)) > - } > - } > - } > -} > diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c > index 19a26f5..10106fd 100644 > --- a/hw/acpi/piix4.c > +++ b/hw/acpi/piix4.c > @@ -29,6 +29,7 @@ > #include "exec/ioport.h" > #include "hw/nvram/fw_cfg.h" > #include "exec/address-spaces.h" > +#include "hw/pci/pci_bus.h" > > //#define DEBUG > > @@ -42,11 +43,12 @@ > #define GPE_LEN 4 > > #define PCI_HOTPLUG_ADDR 0xae00 > -#define PCI_HOTPLUG_SIZE 0x000f > +#define PCI_HOTPLUG_SIZE 0x0014 > #define PCI_UP_BASE 0xae00 > #define PCI_DOWN_BASE 0xae04 > #define PCI_EJ_BASE 0xae08 > #define PCI_RMV_BASE 0xae0c > +#define PCI_SEL_BASE 0xae10 > > #define PIIX4_PROC_BASE 0xaf00 > #define PIIX4_PROC_LEN 32 > @@ -57,6 +59,8 @@ > struct pci_status { > uint32_t up; /* deprecated, maintained for migration compatibility */ > uint32_t down; > + uint32_t hotplug_enable; > + uint32_t device_present; > }; > > typedef struct CPUStatus { > @@ -84,9 +88,8 @@ typedef struct PIIX4PMState { > Notifier powerdown_notifier; > > /* for pci hotplug */ > - struct pci_status pci0_status; > - uint32_t pci0_hotplug_enable; > - uint32_t pci0_slot_device_present; > + struct pci_status pci_status[GUEST_INFO_MAX_HOTPLUG_BUS]; > + uint32_t hotplug_select; > > uint8_t disable_s3; > uint8_t disable_s4; > @@ -98,6 +101,17 @@ typedef struct PIIX4PMState { > PcGuestInfo *guest_info; > } PIIX4PMState; > > +static int piix4_find_hotplug_bus(PcGuestInfo *guest_info, PCIBus *bus) > +{ > + int i; > + for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) { > + if (guest_info->hotplug_buses[i] == bus) { > + return i; > + } > + } > + return -1; > +} > + > static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, > PCIBus *bus, PIIX4PMState *s); > > @@ -183,6 +197,8 @@ static void pm_write_config(PCIDevice *d, > > static void vmstate_pci_status_pre_save(void *opaque) > { > +#if 0 > + /* TODO */ > struct pci_status *pci0_status = opaque; > PIIX4PMState *s = container_of(pci0_status, PIIX4PMState, pci0_status); > > @@ -190,6 +206,7 @@ static void vmstate_pci_status_pre_save(void *opaque) > * to a version that still does... of course these might get lost > * by an old buggy implementation, but we try. */ > pci0_status->up = s->pci0_slot_device_present & s->pci0_hotplug_enable; > +#endif > } > > static int vmstate_acpi_post_load(void *opaque, int version_id) > @@ -267,7 +284,10 @@ static int acpi_load_old(QEMUFile *f, void *opaque, int version_id) > qemu_get_be16s(f, &temp); > } > > +#if 0 > + TODO > ret = vmstate_load_state(f, &vmstate_pci_status, &s->pci0_status, 1); > +#endif > return ret; > } > > @@ -293,21 +313,28 @@ static const VMStateDescription vmstate_acpi = { > VMSTATE_TIMER(ar.tmr.timer, PIIX4PMState), > VMSTATE_INT64(ar.tmr.overflow_time, PIIX4PMState), > VMSTATE_STRUCT(ar.gpe, PIIX4PMState, 2, vmstate_gpe, ACPIGPE), > +#if 0 > VMSTATE_STRUCT(pci0_status, PIIX4PMState, 2, vmstate_pci_status, > struct pci_status), > +#endif > VMSTATE_END_OF_LIST() > } > }; > > -static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) > +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned bsel, unsigned slots) > { > BusChild *kid, *next; > - BusState *bus = qdev_get_parent_bus(&s->dev.qdev); > + BusState *bus; > int slot = ffs(slots) - 1; > bool slot_free = true; > > + if (!s->guest_info->hotplug_buses[bsel]) { > + return; > + } > + bus = &s->guest_info->hotplug_buses[bsel]->qbus; > + > /* Mark request as complete */ > - s->pci0_status.down &= ~(1U << slot); > + s->pci_status[bsel].down &= ~(1U << slot); > > QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { > DeviceState *qdev = kid->child; > @@ -322,23 +349,26 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) > } > } > if (slot_free) { > - s->pci0_slot_device_present &= ~(1U << slot); > + s->pci_status[bsel].device_present &= ~(1U << slot); > } > } > > -static void piix4_update_hotplug(PIIX4PMState *s) > +static void piix4_update_hotplug(PIIX4PMState *s, unsigned bsel) > { > - PCIDevice *dev = &s->dev; > - BusState *bus = qdev_get_parent_bus(&dev->qdev); > BusChild *kid, *next; > + BusState *bus; > > + if (!s->guest_info->hotplug_buses[bsel]) { > + return; > + } > + bus = &s->guest_info->hotplug_buses[bsel]->qbus; > /* Execute any pending removes during reset */ > - while (s->pci0_status.down) { > - acpi_piix_eject_slot(s, s->pci0_status.down); > + while (s->pci_status[bsel].down) { > + acpi_piix_eject_slot(s, bsel, s->pci_status[bsel].down); > } > > - s->pci0_hotplug_enable = ~0; > - s->pci0_slot_device_present = 0; > + s->pci_status[bsel].hotplug_enable = ~0; > + s->pci_status[bsel].device_present = 0; > > QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { > DeviceState *qdev = kid->child; > @@ -347,10 +377,10 @@ static void piix4_update_hotplug(PIIX4PMState *s) > int slot = PCI_SLOT(pdev->devfn); > > if (pc->no_hotplug) { > - s->pci0_hotplug_enable &= ~(1U << slot); > + s->pci_status[bsel].hotplug_enable &= ~(1U << slot); > } > > - s->pci0_slot_device_present |= (1U << slot); > + s->pci_status[bsel].device_present |= (1U << slot); > } > } > > @@ -358,6 +388,7 @@ static void piix4_reset(void *opaque) > { > PIIX4PMState *s = opaque; > uint8_t *pci_conf = s->dev.config; > + int i; > > pci_conf[0x58] = 0; > pci_conf[0x59] = 0; > @@ -371,7 +402,9 @@ static void piix4_reset(void *opaque) > /* Mark SMM as already inited (until KVM supports SMM). */ > pci_conf[0x5B] = 0x02; > } > - piix4_update_hotplug(s); > + for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) { > + piix4_update_hotplug(s, i); > + } > } > > static void piix4_pm_powerdown_req(Notifier *n, void *opaque) > @@ -382,14 +415,20 @@ static void piix4_pm_powerdown_req(Notifier *n, void *opaque) > acpi_pm1_evt_power_down(&s->ar); > } > > -static void piix4_update_guest_info(PIIX4PMState *s) > +static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, > + PCIHotplugState state); > + > +static void piix4_update_bus_guest_info(PCIBus *pci_bus, void *opaque) > { > - PCIDevice *dev = &s->dev; > - BusState *bus = qdev_get_parent_bus(&dev->qdev); > + PIIX4PMState *s = opaque; > + PcGuestInfo *guest_info = s->guest_info; > BusChild *kid, *next; > + BusState *bus = &pci_bus->qbus; > > - memset(s->guest_info->slot_hotplug_enable, 0xff, > - DIV_ROUND_UP(PCI_SLOT_MAX, BITS_PER_BYTE)); > + int i = piix4_find_hotplug_bus(guest_info, NULL); > + assert(i >= 0); > + > + guest_info->hotplug_buses[i] = pci_bus; > > QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { > DeviceState *qdev = kid->child; > @@ -398,9 +437,22 @@ static void piix4_update_guest_info(PIIX4PMState *s) > int slot = PCI_SLOT(pdev->devfn); > > if (pc->no_hotplug) { > - clear_bit(slot, s->guest_info->slot_hotplug_enable); > + clear_bit(slot, guest_info->hotplug_enable[i]); > } > } > + > + pci_bus_hotplug(pci_bus, piix4_device_hotplug, &s->dev.qdev); > +} > + > +static void piix4_update_guest_info(PIIX4PMState *s) > +{ > + PCIDevice *dev = &s->dev; > + > + memset(s->guest_info->hotplug_enable, 0xff, > + sizeof s->guest_info->hotplug_enable); > + > + pci_for_each_bus(dev->bus, piix4_update_bus_guest_info, > + s); > } > > static void piix4_pm_machine_ready(Notifier *n, void *opaque) > @@ -589,16 +641,22 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) > { > PIIX4PMState *s = opaque; > uint32_t val = 0; > + int bsel = s->hotplug_select; > + > + if (bsel > GUEST_INFO_MAX_HOTPLUG_BUS) { > + return 0; > + } > > switch (addr) { > case PCI_UP_BASE - PCI_HOTPLUG_ADDR: > /* Manufacture an "up" value to cause a device check on any hotplug > * slot with a device. Extra device checks are harmless. */ > - val = s->pci0_slot_device_present & s->pci0_hotplug_enable; > + val = s->pci_status[bsel].device_present & > + s->pci_status[bsel].hotplug_enable; > PIIX4_DPRINTF("pci_up_read %x\n", val); > break; > case PCI_DOWN_BASE - PCI_HOTPLUG_ADDR: > - val = s->pci0_status.down; > + val = s->pci_status[bsel].down; > PIIX4_DPRINTF("pci_down_read %x\n", val); > break; > case PCI_EJ_BASE - PCI_HOTPLUG_ADDR: > @@ -606,8 +664,10 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) > PIIX4_DPRINTF("pci_features_read %x\n", val); > break; > case PCI_RMV_BASE - PCI_HOTPLUG_ADDR: > - val = s->pci0_hotplug_enable; > + val = s->pci_status[bsel].hotplug_enable; > break; > + case PCI_SEL_BASE - PCI_HOTPLUG_ADDR: > + val = s->hotplug_select; > default: > break; > } > @@ -618,12 +678,18 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) > static void pci_write(void *opaque, hwaddr addr, uint64_t data, > unsigned int size) > { > + PIIX4PMState *s = opaque; > switch (addr) { > case PCI_EJ_BASE - PCI_HOTPLUG_ADDR: > - acpi_piix_eject_slot(opaque, (uint32_t)data); > + if (s->hotplug_select >= GUEST_INFO_MAX_HOTPLUG_BUS) { > + break; > + } > + acpi_piix_eject_slot(s, s->hotplug_select, data); > PIIX4_DPRINTF("pciej write %" HWADDR_PRIx " <== % " PRIu64 "\n", > addr, data); > break; > + case PCI_SEL_BASE - PCI_HOTPLUG_ADDR: > + s->hotplug_select = data; > default: > break; > } > @@ -706,9 +772,6 @@ static void piix4_init_cpu_status(CPUState *cpu, void *data) > g->sts[id / 8] |= (1 << (id % 8)); > } > > -static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, > - PCIHotplugState state); > - > static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, > PCIBus *bus, PIIX4PMState *s) > { > @@ -720,8 +783,6 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, > PCI_HOTPLUG_SIZE); > memory_region_add_subregion(parent, PCI_HOTPLUG_ADDR, > &s->io_pci); > - pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev); > - > qemu_for_each_cpu(piix4_init_cpu_status, &s->gpe_cpu); > memory_region_init_io(&s->io_cpu, &cpu_hotplug_ops, s, "apci-cpu-hotplug", > PIIX4_PROC_LEN); > @@ -730,16 +791,16 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, > qemu_register_cpu_added_notifier(&s->cpu_added_notifier); > } > > -static void enable_device(PIIX4PMState *s, int slot) > +static void enable_device(PIIX4PMState *s, unsigned bsel, int slot) > { > s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS; > - s->pci0_slot_device_present |= (1U << slot); > + s->pci_status[bsel].device_present |= (1U << slot); > } > > -static void disable_device(PIIX4PMState *s, int slot) > +static void disable_device(PIIX4PMState *s, unsigned bsel, int slot) > { > s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS; > - s->pci0_status.down |= (1U << slot); > + s->pci_status[bsel].down |= (1U << slot); > } > > static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, > @@ -748,19 +809,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, > int slot = PCI_SLOT(dev->devfn); > PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, > PCI_DEVICE(qdev)); > + int bsel = piix4_find_hotplug_bus(s->guest_info, dev->bus); > + if (bsel < 0) { > + return -1; > + } > > /* Don't send event when device is enabled during qemu machine creation: > * it is present on boot, no hotplug event is necessary. We do send an > * event when the device is disabled later. */ > if (state == PCI_COLDPLUG_ENABLED) { > - s->pci0_slot_device_present |= (1U << slot); > + s->pci_status[bsel].device_present |= (1U << slot); > return 0; > } > > if (state == PCI_HOTPLUG_ENABLED) { > - enable_device(s, slot); > + enable_device(s, bsel, slot); > } else { > - disable_device(s, slot); > + disable_device(s, bsel, slot); > } > > pm_update_sci(s); > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c > index 72606a8..43e4988 100644 > --- a/hw/i386/acpi-build.c > +++ b/hw/i386/acpi-build.c > @@ -34,6 +34,7 @@ > #include "hw/acpi/acpi.h" > #include "hw/nvram/fw_cfg.h" > #include "hw/i386/bios-linker-loader.h" > +#include "hw/pci/pci_bus.h" > > #define ACPI_BUILD_APPNAME "Bochs" > #define ACPI_BUILD_APPNAME6 "BOCHS " > @@ -258,23 +259,358 @@ acpi_encode_len(uint8_t *ssdt_ptr, int length, int bytes) > #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) > #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start) > > -/* 0x5B 0x82 DeviceOp PkgLength NameString */ > -#define ACPI_PCIHP_OFFSET_HEX (*ssdt_pcihp_name - *ssdt_pcihp_start + 1) > -#define ACPI_PCIHP_OFFSET_ID (*ssdt_pcihp_id - *ssdt_pcihp_start) > -#define ACPI_PCIHP_OFFSET_ADR (*ssdt_pcihp_adr - *ssdt_pcihp_start) > -#define ACPI_PCIHP_OFFSET_EJ0 (*ssdt_pcihp_ej0 - *ssdt_pcihp_start) > -#define ACPI_PCIHP_SIZEOF (*ssdt_pcihp_end - *ssdt_pcihp_start) > -#define ACPI_PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start) > - > #define ACPI_SSDT_SIGNATURE 0x54445353 /* SSDT */ > #define ACPI_SSDT_HEADER_LENGTH 36 > > #include "hw/i386/ssdt-misc.hex" > -#include "hw/i386/ssdt-pcihp.hex" > + > +static inline GArray *build_alloc_array(void) > +{ > + return g_array_new(false, true /* clear */, 1); > +} > + > +static inline void build_free_array(GArray *array) > +{ > + g_array_free(array, true); > +} > + > +static inline void build_prepend_byte(GArray *array, uint8_t val) > +{ > + g_array_prepend_val(array, val); > +} > + > +static inline void build_append_byte(GArray *array, uint8_t val) > +{ > + g_array_append_val(array, val); > +} > + > +static inline void build_prepend_array(GArray *array, GArray *val) > +{ > + g_array_prepend_vals(array, val->data, val->len); > +} > + > +static inline void build_append_array(GArray *array, GArray *val) > +{ > + g_array_append_vals(array, val->data, val->len); > +} > + > +static void build_append_nameseg(GArray *array, const char *format, ...) > +{ > + GString *s = g_string_new(""); > + va_list args; > + > + va_start(args, format); > + g_string_vprintf(s, format, args); > + va_end(args); > + > + assert(s->len == 4); > + g_array_append_vals(array, s->str, s->len); > + g_string_free(s, true); > +} > + > +static void build_prepend_nameseg(GArray *array, const char *format, ...) > +{ > + GString *s = g_string_new(""); > + va_list args; > + > + va_start(args, format); > + g_string_vprintf(s, format, args); > + va_end(args); > + > + assert(s->len == 4); > + g_array_prepend_vals(array, s->str, s->len); > + g_string_free(s, true); > +} > + > +static void build_prepend_devfn(GArray *name, uint8_t devfn) > +{ > + build_prepend_nameseg(name, "S%0.02x_", devfn); > +} > + > +static void build_append_devfn(GArray *name, uint8_t devfn) > +{ > + GArray *array = build_alloc_array(); > + build_prepend_devfn(array, devfn); > + build_append_array(name, array); > + build_free_array(array); > +} > + > +static void build_prepend_name_string(GArray *name, PCIBus *bus) > +{ > + int segcount; > + > + for (;!pci_bus_is_root(bus); bus = bus->parent_dev->bus) { > + build_prepend_devfn(name, bus->parent_dev->devfn); > + } > + g_array_prepend_vals(name, "PCI0", 4); > + g_array_prepend_vals(name, "_SB_", 4); > + /* Name segment length is a multiple of 4 */ > + assert(name->len % 4 == 0); > + segcount = name->len / 4; > + build_prepend_byte(name, segcount); > + build_prepend_byte(name, 0x2f); /* MultiNamePrefix */ > + build_prepend_byte(name, '\\'); /* RootChar */ > +} > + > +static void build_append_name_string(GArray *name, PCIBus *bus) > +{ > + GArray *append = build_alloc_array(); > + build_prepend_name_string(append, bus); > + build_append_array(name, append); > + build_free_array(append); > +} > + > +/* 5.4 Definition Block Encoding */ > +enum { > + PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */ > + PACKAGE_LENGTH_2BYTE_SHIFT = 4, > + PACKAGE_LENGTH_3BYTE_SHIFT = 12, > + PACKAGE_LENGTH_4BYTE_SHIFT = 20, > +}; > + > +static void build_prepend_package_length(GArray *package) > +{ > + unsigned bytes; > + uint8_t byte; > + unsigned package_len = package->len; > + unsigned length = package_len; > + > + if (length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT)) { > + byte = length; > + build_prepend_byte(package, byte); > + return; > + } > + > + if (length + 4 > (1 << PACKAGE_LENGTH_4BYTE_SHIFT)) { > + byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT; > + build_prepend_byte(package, byte); > + length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1; > + } > + if (length + 3 > (1 << PACKAGE_LENGTH_3BYTE_SHIFT)) { > + byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT; > + build_prepend_byte(package, byte); > + length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1; > + } > + if (length + 2 > (1 << PACKAGE_LENGTH_2BYTE_SHIFT)) { > + byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT; > + build_prepend_byte(package, byte); > + length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1; > + } > + > + bytes = package->len - package_len; > + byte = (bytes << PACKAGE_LENGTH_1BYTE_SHIFT) | length; > + build_prepend_byte(package, byte); > +} > + > +static void build_package(GArray *package, uint8_t op) > +{ > + build_prepend_package_length(package); > + build_prepend_byte(package, op); > +} > + > +static void build_extop_package(GArray *package, uint8_t op) > +{ > + build_package(package, op); > + build_prepend_byte(package, 0x5B); /* ExtOpPrefix */ > +} > + > +static void build_append_notify(GArray *method, GArray *target_name, > + uint16_t value) > +{ > + GArray *notify = build_alloc_array(); > + uint8_t op = 0xA0; /* IfOp */ > + > + build_append_byte(notify, 0x93); /* LEqualOp */ > + build_append_byte(notify, 0x68); /* Arg0Op */ > + build_append_byte(notify, 0x0A); /* BytePrefix */ > + build_append_byte(notify, value); > + build_append_byte(notify, value >> 8); > + build_append_byte(notify, 0x86); /* NotifyOp */ > + build_append_array(notify, target_name); > + build_append_byte(notify, 0x69); /* Arg1Op */ > + > + /* Pack it up */ > + build_package(notify, op); > + > + build_append_array(method, notify); > + > + build_free_array(notify); > +} > + > +static void > +build_append_notify_slots(GArray *method, PCIBus *bus, > + unsigned bus_select, > + DECLARE_BITMAP(hotplug_enable, PCI_SLOT_MAX)) > +{ > + int i; > + > + if (!bus) { > + return; > + } > + > + for (i = 0; i < 32; i++) { > + GArray *target; > + if (!test_bit(i, hotplug_enable)) { > + continue; > + } > + target = build_alloc_array(); > + build_prepend_devfn(target, i); > + build_prepend_name_string(target, bus); > + /* *32 matches Multiply in PCNF */ > + build_append_notify(method, target, bus_select * 32 + i); > + build_free_array(target); > + } > +} > + > +static void build_append_notify_buses(GArray *table, PcGuestInfo *guest_info) > +{ > + int i; > + GArray *method = build_alloc_array(); > + > + uint8_t op = 0x14; /* MethodOp */ > + > + build_append_nameseg(method, "PCNT"); > + build_append_byte(method, 0x02); /* MethodFlags: ArgCount */ > + for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) { > + build_append_notify_slots(method, guest_info->hotplug_buses[i], i, > + guest_info->hotplug_enable[i]); > + } > + build_package(method, op); > + > + build_append_array(table, method); > + build_free_array(method); > +} > + > +static void build_append_device(GArray *table, unsigned devfn) > +{ > + uint8_t op, ej0_op; > + > + GArray *device = build_alloc_array(); > + GArray *ej0 = build_alloc_array(); > + > + op = 0x82; /* DeviceOp */ > + > + build_append_devfn(device, devfn); > + > + build_append_byte(device, 0x08); /* NameOp */ > + build_append_nameseg(device, "_SUN"); > + build_append_byte(device, 0x0A); /* BytePrefix */ > + build_append_byte(device, PCI_SLOT(devfn)); > + build_append_byte(device, 0x08); /* NameOp */ > + build_append_nameseg(device, "_ADR"); > + build_append_byte(device, 0x0C); /* DWordPrefix */ > + build_append_byte(device, PCI_FUNC(devfn)); > + build_append_byte(device, 0x00); > + build_append_byte(device, PCI_SLOT(devfn)); > + build_append_byte(device, 0x00); > + > + ej0_op = 0x14; /* MethodOp */ > + build_append_nameseg(ej0, "_EJ0"); > + build_append_byte(ej0, 0x01); /* MethodFlags: ArgCount */ > + build_append_byte(ej0, 0xA4); /* ReturnOp */ > + build_append_nameseg(ej0, "PCEJ"); > + build_append_nameseg(ej0, "_SUN"); > + build_append_nameseg(ej0, "BNUM"); > + build_package(ej0, ej0_op); > + build_append_array(device, ej0); > + > + build_extop_package(device, op); > + > + build_append_array(table, device); > + build_free_array(device); > + build_free_array(ej0); > +} > + > +static bool build_find_device_bus(PCIBus *buses[GUEST_INFO_MAX_HOTPLUG_BUS], > + PCIBus *bus, unsigned devfn) > +{ > + int i; > + for (i = 0; i < ARRAY_SIZE(buses); ++i) { > + if (buses[i] && buses[i]->parent_dev && > + buses[i]->parent_dev->bus == bus && > + buses[i]->parent_dev->devfn == devfn) { > + return true; > + } > + } > + return false; > +} > + > +static void > +build_append_bus(GArray *table, int bnum, PcGuestInfo *guest_info) > +{ > + uint8_t op; > + int slot; > + unsigned devfn; > + GArray *device; > + PCIBus *bus = guest_info->hotplug_buses[bnum]; > + > + if (!bus) { > + return; > + } > + > + device = build_alloc_array(); > + > + /* Root device described in DSDT */ > + if (bnum) { > + devfn = bus->parent_dev->devfn; > + > + op = 0x82; /* DeviceOp */ > + > + build_append_name_string(device, bus); > + > + build_append_byte(device, 0x08); /* NameOp */ > + build_append_nameseg(device, "_SUN"); > + build_append_byte(device, 0x0A); /* BytePrefix */ > + build_append_byte(device, PCI_SLOT(devfn)); > + build_append_byte(device, 0x08); /* NameOp */ > + build_append_nameseg(device, "BNUM"); > + build_append_byte(device, 0x0A); /* BytePrefix */ > + build_append_byte(device, bnum); > + build_append_byte(device, 0x08); /* NameOp */ > + build_append_nameseg(device, "_ADR"); > + build_append_byte(device, 0x0C); /* DWordPrefix */ > + build_append_byte(device, PCI_FUNC(devfn)); > + build_append_byte(device, 0x00); > + build_append_byte(device, PCI_SLOT(devfn)); > + build_append_byte(device, 0x00); > + } else { > + op = 0x10; /* ScopeOp */ > + build_append_nameseg(device, "PCI0"); > + } > + > + for (slot = 0; slot < PCI_SLOT_MAX; ++slot) { > + if (!test_bit(slot, guest_info->hotplug_enable[bnum])) { > + continue; > + } > + /* Bus device? We'll add it later */ > + if (build_find_device_bus(guest_info->hotplug_buses, > + bus, PCI_DEVFN(slot, 0))) { > + continue; > + } > + build_append_device(device, PCI_DEVFN(slot, 0)); > + } > + > + if (bnum) { > + build_extop_package(device, op); > + } else { > + build_package(device, op); > + } > + > + build_append_array(table, device); > + build_free_array(device); > +} > + > +static void build_append_device_buses(GArray *table, PcGuestInfo *guest_info) > +{ > + int i; > + for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) { > + build_append_bus(table, i, guest_info); > + } > +} > > static uint8_t* > build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count, > - const char *target, int ofs) > + const char *target, int ofs, int step) > { > int i; > > @@ -295,31 +631,14 @@ build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count, > *(ssdt_ptr++) = i; > *(ssdt_ptr++) = 0x86; /* NotifyOp */ > memcpy(ssdt_ptr, target, 4); > - ssdt_ptr[ofs] = acpi_get_hex(i >> 4); > - ssdt_ptr[ofs + 1] = acpi_get_hex(i); > + ssdt_ptr[ofs] = acpi_get_hex((i * step) >> 4); > + ssdt_ptr[ofs + 1] = acpi_get_hex(i * step); > ssdt_ptr += 4; > *(ssdt_ptr++) = 0x69; /* Arg1Op */ > } > return ssdt_ptr; > } > > -static void patch_pcihp(int slot, uint8_t *ssdt_ptr, uint32_t eject) > -{ > - ssdt_ptr[ACPI_PCIHP_OFFSET_HEX] = acpi_get_hex(slot >> 4); > - ssdt_ptr[ACPI_PCIHP_OFFSET_HEX+1] = acpi_get_hex(slot); > - ssdt_ptr[ACPI_PCIHP_OFFSET_ID] = slot; > - ssdt_ptr[ACPI_PCIHP_OFFSET_ADR + 2] = slot; > - > - /* Runtime patching of ACPI_EJ0: to disable hotplug for a slot, > - * replace the method name: _EJ0 by ACPI_EJ0_. */ > - /* Sanity check */ > - assert (!memcmp(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "_EJ0", 4)); > - > - if (!eject) { > - memcpy(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "EJ0_", 4); > - } > -} > - > static void > build_ssdt(GArray *table_data, GArray *linker, > FWCfgState *fw_cfg, PcGuestInfo *guest_info) > @@ -330,11 +649,20 @@ build_ssdt(GArray *table_data, GArray *linker, > + (acpi_cpus * ACPI_PROC_SIZEOF) /* procs */ > + (1+2+5+(12*acpi_cpus)) /* NTFY */ > + (6+2+1+(1*acpi_cpus)) /* CPON */ > - + (1+3+4) /* Scope(PCI0) */ > - + ((PCI_SLOT_MAX - 1) * ACPI_PCIHP_SIZEOF) /* slots */ > - + (1+2+5+(12*(PCI_SLOT_MAX - 1)))); /* PCNT */ > - uint8_t *ssdt = acpi_data_push(table_data, length); > - uint8_t *ssdt_ptr = ssdt; > + ); > + uint8_t *ssdt; > + uint8_t *ssdt_ptr; > + GArray *devices = build_alloc_array(); > + GArray *notify = build_alloc_array(); > + > + /* TODO: rewrite this function using glib */ > + build_append_device_buses(devices, guest_info); > + length += devices->len; > + build_append_notify_buses(notify, guest_info); > + length += notify->len; > + > + ssdt = acpi_data_push(table_data, length); > + ssdt_ptr = ssdt; > > /* Copy header and encode fwcfg values in the S3_ / S4_ / S5_ packages */ > memcpy(ssdt_ptr, ssdp_misc_aml, sizeof(ssdp_misc_aml)); > @@ -389,7 +717,7 @@ build_ssdt(GArray *table_data, GArray *linker, > > /* build "Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...}" */ > /* Arg0 = Processor ID = APIC ID */ > - ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2); > + ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2, 1); > > /* build "Name(CPON, Package() { One, One, ..., Zero, Zero, ... })" */ > *(ssdt_ptr++) = 0x08; /* NameOp */ > @@ -411,15 +739,10 @@ build_ssdt(GArray *table_data, GArray *linker, > *(ssdt_ptr++) = 'I'; > *(ssdt_ptr++) = '0'; > > - /* build Device object for each slot */ > - for (i = 1; i < PCI_SLOT_MAX; i++) { > - bool eject = test_bit(i, guest_info->slot_hotplug_enable); > - memcpy(ssdt_ptr, ACPI_PCIHP_AML, ACPI_PCIHP_SIZEOF); > - patch_pcihp(i, ssdt_ptr, eject); > - ssdt_ptr += ACPI_PCIHP_SIZEOF; > - } > - > - ssdt_ptr = build_notify(ssdt_ptr, "PCNT", 1, PCI_SLOT_MAX, "S00_", 1); > + memcpy(ssdt_ptr, devices->data, devices->len); > + ssdt_ptr += devices->len; > + memcpy(ssdt_ptr, notify->data, notify->len); > + ssdt_ptr += notify->len; > > build_header(linker, table_data, > (void*)ssdt, ACPI_SSDT_SIGNATURE, ssdt_ptr - ssdt, 1); > diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs > index 2ab2572..c2e74e9 100644 > --- a/hw/i386/Makefile.objs > +++ b/hw/i386/Makefile.objs > @@ -6,7 +6,7 @@ obj-$(CONFIG_XEN) += xen_domainbuild.o xen_machine_pv.o > obj-y += kvmvapic.o > obj-y += acpi-build.o > obj-y += bios-linker-loader.o > -hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-pcihp.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex > +hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex > hw/i386/pc_piix.o: hw/i386/pc_piix.c hw/i386/acpi-dsdt.hex > hw/i386/pc_q35.o: hw/i386/pc_q35.c hw/i386/q35-acpi-dsdt.hex > > diff --git a/hw/i386/ssdt-pcihp.hex.generated b/hw/i386/ssdt-pcihp.hex.generated > deleted file mode 100644 > index 0d32a27..0000000 > --- a/hw/i386/ssdt-pcihp.hex.generated > +++ /dev/null > @@ -1,108 +0,0 @@ > -static unsigned char ssdt_pcihp_name[] = { > -0x33 > -}; > -static unsigned char ssdt_pcihp_adr[] = { > -0x44 > -}; > -static unsigned char ssdt_pcihp_end[] = { > -0x58 > -}; > -static unsigned char ssdp_pcihp_aml[] = { > -0x53, > -0x53, > -0x44, > -0x54, > -0x58, > -0x0, > -0x0, > -0x0, > -0x1, > -0x77, > -0x42, > -0x58, > -0x50, > -0x43, > -0x0, > -0x0, > -0x42, > -0x58, > -0x53, > -0x53, > -0x44, > -0x54, > -0x50, > -0x43, > -0x1, > -0x0, > -0x0, > -0x0, > -0x49, > -0x4e, > -0x54, > -0x4c, > -0x28, > -0x5, > -0x10, > -0x20, > -0x10, > -0x33, > -0x5c, > -0x2e, > -0x5f, > -0x53, > -0x42, > -0x5f, > -0x50, > -0x43, > -0x49, > -0x30, > -0x5b, > -0x82, > -0x26, > -0x53, > -0x41, > -0x41, > -0x5f, > -0x8, > -0x5f, > -0x53, > -0x55, > -0x4e, > -0xa, > -0xaa, > -0x8, > -0x5f, > -0x41, > -0x44, > -0x52, > -0xc, > -0x0, > -0x0, > -0xaa, > -0x0, > -0x14, > -0xf, > -0x5f, > -0x45, > -0x4a, > -0x30, > -0x1, > -0xa4, > -0x50, > -0x43, > -0x45, > -0x4a, > -0x5f, > -0x53, > -0x55, > -0x4e > -}; > -static unsigned char ssdt_pcihp_start[] = { > -0x30 > -}; > -static unsigned char ssdt_pcihp_id[] = { > -0x3d > -}; > -static unsigned char ssdt_pcihp_ej0[] = { > -0x4a > -}; > -- > MST
On 10 June 2013 19:58, Anthony Liguori <anthony@codemonkey.ws> wrote: > We only care about supporting one version of SeaBIOS. SeaBIOS should > only care about supporting one version of QEMU. We're not asking it to > support multiple versions. I'm confused, how does this work in cross-version migration? The BIOS is part of the guest, so I'd expect QEMU would have to support whatever it happens to be. (Plus how do you enforce the version-dependency?) thanks -- PMM
On 06/10/13 20:58, Anthony Liguori wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > >> This adds support for device hotplug behind >> pci bridges. Bridge devices themselves need >> to be pre-configured on qemu command line. >> >> One of the goals of this project was to >> demonstrate the advantages of generating >> ACPI tables in QEMU. >> In my opinion, it does this successfully. > > Since you've gone out of your way to make this difficult to actually > review... I haven't tried to review the patch yet, but why would you say such a thing? Michael wants to convince you. There's no way he could sneak past you in this patch. You surely won't say "it's too hard to review, I'll yield", and he's obviously aware. [snip] >> * No need for yet another interface to bios to detect qemu version >> to check if it's safe to activate new code, >> and to ship multiple table versions: > > We only care about supporting one version of SeaBIOS. SeaBIOS should > only care about supporting one version of QEMU. We're not asking it to > support multiple versions. Kevin requires cross-compat between {old, new} qemu and {old, new} SeaBIOS: http://thread.gmane.org/gmane.comp.emulators.qemu/202005/focus=202072 >> we generated code so we know this is new qemu >> * Use of glib's GArray makes it much easier to build >> up tables in code without need for iasl and code patching > > Adding a dynamic array to SeaBIOS isn't rocket science. Please take a look at my series for OVMF that adds basic support for SMBIOS tables. It took me three days of basically uninterrupted coding and two approaches to throw away until I got something submittable (with default tables for only type 0 and type 1). http://thread.gmane.org/gmane.comp.bios.tianocore.devel/3037 I don't want to invite accusations like "this is a strawman argument, noone was speaking about SMBIOS", but the selective, dynamic patching is somewhat similar between AML and SMBIOS in any given boot firmware. You got a big bunch of named offsets that must be mangled individually. Allowing the firmware to only install blobs verbatim, or maybe patch them but only in a completely programmatic manner (ie. without specific field names & offsets in the firmware, which I did try for SMBIOS in OVMF, and failed, (*)), would be a big help. ((*) because the fw_cfg format of individual SMBIOS fields doesn't distinguish formatted fields from strings in the unformatted area) Not rocket science, just churn and busywork. [snip] > So TL;DR, you break a bunch of stuff and introduce a mess of code. It > would be less code and wouldn't break anything to add this logic to > SeaBIOS. > > How is this even a discussion? Well it isn't for me any more; count me out. I'll go with the flow. Cheers, Laszlo
Peter Maydell <peter.maydell@linaro.org> writes: > On 10 June 2013 19:58, Anthony Liguori <anthony@codemonkey.ws> wrote: >> We only care about supporting one version of SeaBIOS. SeaBIOS should >> only care about supporting one version of QEMU. We're not asking it to >> support multiple versions. > > I'm confused, how does this work in cross-version migration? > The BIOS is part of the guest, so I'd expect QEMU would > have to support whatever it happens to be. (Plus how do > you enforce the version-dependency?) The BIOS is migrated but reset upon reboot. Regards, Anthony Liguori > > thanks > -- PMM
Laszlo Ersek <lersek@redhat.com> writes: > On 06/10/13 20:58, Anthony Liguori wrote: >> "Michael S. Tsirkin" <mst@redhat.com> writes: >> >>> This adds support for device hotplug behind >>> pci bridges. Bridge devices themselves need >>> to be pre-configured on qemu command line. >>> >>> One of the goals of this project was to >>> demonstrate the advantages of generating >>> ACPI tables in QEMU. >>> In my opinion, it does this successfully. >> >> Since you've gone out of your way to make this difficult to actually >> review... > > I haven't tried to review the patch yet, but why would you say such a > thing? Michael wants to convince you. There's no way he could sneak past > you in this patch. You surely won't say "it's too hard to review, I'll > yield", and he's obviously aware. The patch is very large and depends on another large series. It's not split up as a logical series of changes and more importantly doesn't solve significant problems (like live migration). It makes it very difficult to properly evaluate the approach. But even taking those sort cuts, it looks like a lot of code in QEMU to avoid what appears to be a significantly smaller amount of code in SeaBIOS. There isn't deep dependency on information only available in QEMU. AFAICT the only advantage of being in QEMU is being able to use a couple data types from glib. > [snip] > >>> * No need for yet another interface to bios to detect qemu version >>> to check if it's safe to activate new code, >>> and to ship multiple table versions: >> >> We only care about supporting one version of SeaBIOS. SeaBIOS should >> only care about supporting one version of QEMU. We're not asking it to >> support multiple versions. > > Kevin requires cross-compat between {old, new} qemu and {old, new} SeaBIOS: > > http://thread.gmane.org/gmane.comp.emulators.qemu/202005/focus=202072 > >>> we generated code so we know this is new qemu >>> * Use of glib's GArray makes it much easier to build >>> up tables in code without need for iasl and code patching >> >> Adding a dynamic array to SeaBIOS isn't rocket science. > > Please take a look at my series for OVMF that adds basic support for > SMBIOS tables. It took me three days of basically uninterrupted coding > and two approaches to throw away until I got something submittable (with > default tables for only type 0 and type 1). > > http://thread.gmane.org/gmane.comp.bios.tianocore.devel/3037 > > I don't want to invite accusations like "this is a strawman argument, > noone was speaking about SMBIOS", but the selective, dynamic patching is > somewhat similar between AML and SMBIOS in any given boot firmware. You > got a big bunch of named offsets that must be mangled individually. > > Allowing the firmware to only install blobs verbatim, or maybe patch > them but only in a completely programmatic manner (ie. without specific > field names & offsets in the firmware, which I did try for SMBIOS in > OVMF, and failed, (*)), would be a big help. > > ((*) because the fw_cfg format of individual SMBIOS fields doesn't > distinguish formatted fields from strings in the unformatted area) I don't understand this piece. Other than TianoCore being a weird environment, what made this more difficult than it is to generate the tables in QEMU? Regards, Anthony Liguori > > Not rocket science, just churn and busywork. > > [snip] > >> So TL;DR, you break a bunch of stuff and introduce a mess of code. It >> would be less code and wouldn't break anything to add this logic to >> SeaBIOS. >> >> How is this even a discussion? > > Well it isn't for me any more; count me out. I'll go with the flow. > > Cheers, > Laszlo
On 06/10/13 21:57, Anthony Liguori wrote: > Laszlo Ersek <lersek@redhat.com> writes: >> Please take a look at my series for OVMF that adds basic support for >> SMBIOS tables. It took me three days of basically uninterrupted coding >> and two approaches to throw away until I got something submittable (with >> default tables for only type 0 and type 1). >> >> http://thread.gmane.org/gmane.comp.bios.tianocore.devel/3037 >> >> I don't want to invite accusations like "this is a strawman argument, >> noone was speaking about SMBIOS", but the selective, dynamic patching is >> somewhat similar between AML and SMBIOS in any given boot firmware. You >> got a big bunch of named offsets that must be mangled individually. >> >> Allowing the firmware to only install blobs verbatim, or maybe patch >> them but only in a completely programmatic manner (ie. without specific >> field names & offsets in the firmware, which I did try for SMBIOS in >> OVMF, and failed, (*)), would be a big help. >> >> ((*) because the fw_cfg format of individual SMBIOS fields doesn't >> distinguish formatted fields from strings in the unformatted area) > > I don't understand this piece. > > Other than TianoCore being a weird environment, what made this more > difficult than it is to generate the tables in QEMU? QEMU can pass down two kinds of SMBIOS config items over fw_cfg: verbatim tables and individual fields. The verbatim table kind means "install this table as-is". I'm not sure how SeaBIOS solves it, in OVMF you call Smbios->Add(). Regarding the "field kind" config item: for each table type required by the SMBIOS spec, if qemu doesn't provide such a table in whole / in verbatim, then the firmware must install a default table for that type, *and* patch it with any referring field kind config items. Given the current format of the "field kind" config item, it is currently not possible to just iterate over the "field kind" config items, and patch whatever field in whatever table they refer to. The information contained in the "field kind" config item is insufficient to do this. The "field kind" config item designates a (table, field offset) pair. If that offset points at a string-typed SMBIOS field (basically string serial number into the unformatted area), one must update the unformatted string area after the actual table fields. (In OVMF one does this with a different call, Smbios->UpdateStrings(), and at that point the table even must have been installed already.) If the offset points at a non-string-typed (= in-formatted-area) field, the field must be overwritten in-place. (In OVMF this must happen *before* table installation, ie. before Smbios->Add().) So, the current fw_cfg representation is insufficient, and the firmware must extend (or qemu should have extended) the info contents with this distinction between formatted and unformatted. Again this differs from field to field and the origin of that classification is the SMBIOS spec. Until this point in my email there has been no clear indication whether qemu or the firmware should do this. *Some* side will have to suffer shoveling the field/offset info, and a table template for each type, from the SMBIOS spec into code. I described the above solely as background, to set the landscape. (a) The first thing that tips the scale is: of qemu there is one, and of firmware, there exist at least two, *wildly different* implementations. (Apologies, I forgot about coreboot; not sure if it's affected.) It's not a design purity argument, just what's cheaper. (b) The second thing that further unbalances the scale: suppose you decide to add further field-kind overrides to qemu. For example, currently Type 3 (chassis information, table required by SMBIOS spec) cannot be overridden field-wise. If a user is unhappy with her firmware's default Type 3 table, she's allowed to hex-edit a Type 3 blob, and pass it with "-smbios file=type3.blob" on the qemu command line. If you as developer want to provide her with more flexibility, you need to (i) patch qemu (of course), to recognize individual fields on the command line, and to populate fw_cfg with them, (ii) patch *every single firmware* supported by qemu to *look for* those (type=3,offset) field-kind config entries, and to effectuate them. If you pass down only verbatim tables, then (ii) falls away. You don't have to update SeaBIOS, nor OVMF. As a bonus, you don't have to care about versions: age-old firmware that only knows how to install verbatim tables will immediately benefit from qemu's new-fangled Type 3 flexibility. Case in point: <https://bugzilla.redhat.com/show_bug.cgi?id=788142>. (Non-public bug, sorry about that. I think I can disclose that it is about supporting the "field kind" config entry for Type 3 tables.) For ACPI tables, especially for tables containing AML, multiply the cost of (ii) by twenty. Many more offsets to patch and vastly more dependencies via fw_cfg. Laszlo
Sorry about replying to myself...
On 06/10/13 22:43, Laszlo Ersek wrote:
> It's not a design purity argument, just what's cheaper.
You probably consider SMBIOS/ACPI table generation "guest code" that has
no place in the machine emulator. You're likely willing to expose the
needed bits on a case-by-case basis, in their barest representation.
I must consider the entire stack, from the libvirt top, down to the
guest kernel or root shell, and see where it is cheapest to effect
changes. The higher I can hoist something the better.
Modulo, it depends on what element of the stack people think of as
central / indispensable -- that element should support the feature directly.
(Returning to my earlier example, it would be theoretically viable *I
guess* to equip libvirt to pre-format a Type 3 SMBIOS blob in a file,
and pass it with "-smbios file=...", but many people don't use libvirt.
It also might not apply to other emulators supported by libvirt.)
Thanks,
Laszlo
Laszlo Ersek <lersek@redhat.com> writes: > On 06/10/13 21:57, Anthony Liguori wrote: >> Laszlo Ersek <lersek@redhat.com> writes: > >> >> I don't understand this piece. >> >> Other than TianoCore being a weird environment, what made this more >> difficult than it is to generate the tables in QEMU? > > QEMU can pass down two kinds of SMBIOS config items over fw_cfg: > verbatim tables and individual fields. > > The verbatim table kind means "install this table as-is". I'm not sure > how SeaBIOS solves it, in OVMF you call Smbios->Add(). > > Regarding the "field kind" config item: for each table type required by > the SMBIOS spec, if qemu doesn't provide such a table in whole / in > verbatim, then the firmware must install a default table for that type, > *and* patch it with any referring field kind config items. > > Given the current format of the "field kind" config item, it is > currently not possible to just iterate over the "field kind" config > items, and patch whatever field in whatever table they refer to. The > information contained in the "field kind" config item is insufficient to > do this. > > The "field kind" config item designates a (table, field offset) pair. If > that offset points at a string-typed SMBIOS field (basically string > serial number into the unformatted area), one must update the > unformatted string area after the actual table fields. (In OVMF one does > this with a different call, Smbios->UpdateStrings(), and at that point > the table even must have been installed already.) > > If the offset points at a non-string-typed (= in-formatted-area) field, > the field must be overwritten in-place. (In OVMF this must happen > *before* table installation, ie. before Smbios->Add().) Okay, I understand the problem here. Thanks for the explanation. I had assumed that the smbios_field structure was the binary representation of the table. I didn't realize it was our own format used to pass information over. > So, the current fw_cfg representation is insufficient, and the firmware > must extend (or qemu should have extended) the info contents with this > distinction between formatted and unformatted. Again this differs from > field to field and the origin of that classification is the SMBIOS spec. > > Until this point in my email there has been no clear indication whether > qemu or the firmware should do this. *Some* side will have to suffer > shoveling the field/offset info, and a table template for each type, > from the SMBIOS spec into code. I described the above solely as > background, to set the landscape. > > (a) The first thing that tips the scale is: of qemu there is one, and of > firmware, there exist at least two, *wildly different* implementations. > (Apologies, I forgot about coreboot; not sure if it's affected.) It's > not a design purity argument, just what's cheaper. The only reason there are two implementations is a licensing problem that must be solved anyway. > (b) The second thing that further unbalances the scale: suppose you > decide to add further field-kind overrides to qemu. For example, > currently Type 3 (chassis information, table required by SMBIOS spec) > cannot be overridden field-wise. The reason we expose blobs in SMBIOS is that blobs can be vendor specific and there was a strong desire to allow vendor specific blobs to be passed through. There really is no choice but to expose a blob interface for that. > If a user is unhappy with her firmware's default Type 3 table, she's > allowed to hex-edit a Type 3 blob, and pass it with "-smbios > file=type3.blob" on the qemu command line. If you as developer want to > provide her with more flexibility, you need to > (i) patch qemu (of course), to recognize individual fields on the > command line, and to populate fw_cfg with them, > (ii) patch *every single firmware* supported by qemu to *look for* those > (type=3,offset) field-kind config entries, and to effectuate them. > > If you pass down only verbatim tables, then (ii) falls away. You don't > have to update SeaBIOS, nor OVMF. OVMF is proprietary. It is not "supported" by QEMU. Firmware is a fundamental part of our stack. I'm not really convinced that QEMU<->firmware is a GPL boundary because of how tightly the two are linked. Moving large chunks of firmware code into QEMU just to avoid solving licensing issues is a non-starter with me. > As a bonus, you don't have to care > about versions: age-old firmware that only knows how to install verbatim > tables will immediately benefit from qemu's new-fangled Type 3 > flexibility. This presume that (1) you only care about exposing this information to x86 guests (2) that the firmware will never have any need to patch these blobs. > Case in point: <https://bugzilla.redhat.com/show_bug.cgi?id=788142>. > (Non-public bug, sorry about that. I think I can disclose that it is > about supporting the "field kind" config entry for Type 3 tables.) This information is *exactly* the kind of information that you want to be exposed across architectures. It makes lots of sense to model a device to expose this type of information in a structure way. This discussion comes down to two things I think: (a) our existing firmware interface is pretty poor (b) we are duplicating work because of firmware licensing. We can fix (a) and there's lots of value in doing that in terms of improving support for other architectures. We've discussed (b) in other threads and I've stated my opinion on the direction we need to take. Regards, Anthony Liguori > > > For ACPI tables, especially for tables containing AML, multiply the cost > of (ii) by twenty. Many more offsets to patch and vastly more > dependencies via fw_cfg. > > Laszlo
On Mon, Jun 10, 2013 at 04:45:53PM -0500, Anthony Liguori wrote: > This discussion comes down to two things I think: (a) our existing > firmware interface is pretty poor (b) we are duplicating work because of > firmware licensing. > > We can fix (a) and there's lots of value in doing that in terms of > improving support for other architectures. We've discussed (b) in other > threads and I've stated my opinion on the direction we need to take. I'm not concerned about (b). I'm quite curious how you are planning to solve (a). I think it would help move this conversation forward if you could take a couple acpi tables in use today (eg, madt, srat) and describe the future format and location for each field in those tables. I think it would also be useful if you could do the same for a couple DSDT entries (eg, \_SB.PCI0, \_SB.PCI0.ISA) and also describe how you plan to have the guest build the AML from that info. -Kevin
Kevin O'Connor <kevin@koconnor.net> writes: > On Mon, Jun 10, 2013 at 04:45:53PM -0500, Anthony Liguori wrote: >> This discussion comes down to two things I think: (a) our existing >> firmware interface is pretty poor (b) we are duplicating work because of >> firmware licensing. >> >> We can fix (a) and there's lots of value in doing that in terms of >> improving support for other architectures. We've discussed (b) in other >> threads and I've stated my opinion on the direction we need to take. > > I'm not concerned about (b). > > I'm quite curious how you are planning to solve (a). I think it would > help move this conversation forward if you could take a couple acpi > tables in use today (eg, madt, srat) and describe the future format > and location for each field in those tables. Sure. Today we expose the device model through a graph that can be losely mapped to a filesystem. We expose two methods of interest as part of our management api: { 'type': 'ObjectPropertyInfo', 'data': { 'name': 'str', 'type': 'str' } } { 'command': 'qom-list', 'data': { 'path': 'str' }, 'returns': [ 'ObjectPropertyInfo' ] } { 'command': 'qom-get', 'data': { 'path': 'str', 'property': 'str' }, 'returns': 'visitor', 'gen': 'no' } I am proposing that we replace fw_cfg with a device that exports the QOM tree (read-only) via firmware. We have a utility called qom-fuse that can mount a fuse filesystem that exports this as a proper filesystem too. Here's a small example: [06:26 PM] anthony🐵 titi:~/git/qemu/QMP/tmp$ find . ./machine ./machine/i440fx ./machine/i440fx/ioapic ./machine/i440fx/ioapic/parent_bus ./machine/i440fx/ioapic/realized ./machine/i440fx/ioapic/type ./machine/i440fx/pci.0 ./machine/i440fx/pci.0/child[5] ./machine/i440fx/pci.0/child[4] ./machine/i440fx/pci.0/child[3] ./machine/i440fx/pci.0/child[2] ./machine/i440fx/pci.0/child[1] ./machine/i440fx/pci.0/child[0] ... For the MADT table, right now SeaBIOS needs the CPU count. That can be found by counting the number of CPU nodes. Today cpus are unattached so they appear in /machine/unattached but we should put them in a /machine/cpu container for clarity. Presumably it would be good to expose the apic id too. This is already in the QOM tree. QOM is the full representation of the device state so we should not ever need to add additional things to fw_cfg. More likely than not, when SeaBIOS needs more information, it's already there so added functionality will probably Just Work with older QEMUs. > I think it would also be > useful if you could do the same for a couple DSDT entries (eg, > \_SB.PCI0, \_SB.PCI0.ISA) and also describe how you plan to have the > guest build the AML from that info. Likewise the slot count should be available too. We hard code slots today but it is something we should model properly. We would model it using QOM of course. Internally within QEMU, this initial discussion started by saying that any ACPI generation within QEMU should happen strictly with QOM methods. This was the crux of my argument, if QOM is the only interface we need, everything is there for the firmware to do the same job that QEMU would be doing. Regards, Anthony Liguori > > -Kevin
On Mon, 2013-06-10 at 18:34 -0500, Anthony Liguori wrote: > Internally within QEMU, this initial discussion started by saying that > any ACPI generation within QEMU should happen strictly with QOM > methods. This was the crux of my argument, if QOM is the only > interface we need, everything is there for the firmware to do the same > job that QEMU would be doing. That's nice in theory, but I'm not sure how it works as things evolve and new things / new features get exposed. The firmware's *interpretation* of the QOM tree needs to be kept in sync with qemu. Hm, make that: The firmwares' *interpretation*… Let's take a specific, recent example. We fixed the PIIX4 code to actually handle the hard reset on port 0xcf9. We need to fix the ACPI tables to indicate a usable RESET_REG. How is that exposed in the QOM tree, and how does it all work? With qemu exposing ACPI tables in their close-to-final form, it's just fine. Boot with a recent qemu and it's all nice and shiny; boot with an old qemu and it doesn't reset properly. But if the firmware has to be updated to interpret the new feature advertised in the QOM tree and translate it into the ACPI table, then we haven't really got much of an improvement. Please explain how this is supposed to work in *practice*.
David Woodhouse <dwmw2@infradead.org> writes: > On Mon, 2013-06-10 at 18:34 -0500, Anthony Liguori wrote: >> Internally within QEMU, this initial discussion started by saying that >> any ACPI generation within QEMU should happen strictly with QOM >> methods. This was the crux of my argument, if QOM is the only >> interface we need, everything is there for the firmware to do the same >> job that QEMU would be doing. > > That's nice in theory, but I'm not sure how it works as things evolve > and new things / new features get exposed. The firmware's > *interpretation* of the QOM tree needs to be kept in sync with qemu. > > Hm, make that: The firmwares' *interpretation*… > > Let's take a specific, recent example. We fixed the PIIX4 code to > actually handle the hard reset on port 0xcf9. We need to fix the ACPI > tables to indicate a usable RESET_REG. Very good example. Normally, we try to be bug-for-bug compatible for guests such that -M pc-1.4 would behave exactly the same. In this case, we failed to introduce a property to control this behavior like we should have. If this changes the guest ACPI tables, it definitely should definitely be set based on a property. This is a good example of why this approach is good for QEMU, it helps us catch stuff like this. > How is that exposed in the QOM tree, and how does it all work? With qemu > exposing ACPI tables in their close-to-final form, it's just fine. Boot > with a recent qemu and it's all nice and shiny; boot with an old qemu > and it doesn't reset properly. If we did this right in QEMU, we'd have to introduce a QOM property anyway as that's how we trigger differences in machine behavior. And -M pc-1.4 ought to generate the same tables as qemu 1.4 did. Regards, Anthony Liguori > But if the firmware has to be updated to interpret the new feature > advertised in the QOM tree and translate it into the ACPI table, then we > haven't really got much of an improvement. > > Please explain how this is supposed to work in *practice*. > > -- > dwmw2
On Mon, Jun 10, 2013 at 06:34:29PM -0500, Anthony Liguori wrote: > Kevin O'Connor <kevin@koconnor.net> writes: > > > On Mon, Jun 10, 2013 at 04:45:53PM -0500, Anthony Liguori wrote: > >> This discussion comes down to two things I think: (a) our existing > >> firmware interface is pretty poor (b) we are duplicating work because of > >> firmware licensing. > >> > >> We can fix (a) and there's lots of value in doing that in terms of > >> improving support for other architectures. We've discussed (b) in other > >> threads and I've stated my opinion on the direction we need to take. > > > > I'm not concerned about (b). > > > > I'm quite curious how you are planning to solve (a). I think it would > > help move this conversation forward if you could take a couple acpi > > tables in use today (eg, madt, srat) and describe the future format > > and location for each field in those tables. > > Sure. Today we expose the device model through a graph that can be > losely mapped to a filesystem. We expose two methods of interest as > part of our management api: [...] > For the MADT table, right now SeaBIOS needs the CPU count. That can be > found by counting the number of CPU nodes. Today cpus are unattached so > they appear in /machine/unattached but we should put them in a > /machine/cpu container for clarity. SeaBIOS needs much more than the CPU count. The madt contains info on each interrupt - its global irq number, the legacy irq number, the acpi defined type of the irq, and the acpi defined flags for the irq. It also contains similar info on each cpu (including apic id), the io apic, and NMIs. > QOM is the full representation of the device state so we should not ever > need to add additional things to fw_cfg. More likely than not, when > SeaBIOS needs more information, it's already there so added > functionality will probably Just Work with older QEMUs. > > > I think it would also be > > useful if you could do the same for a couple DSDT entries (eg, > > \_SB.PCI0, \_SB.PCI0.ISA) and also describe how you plan to have the > > guest build the AML from that info. > > Likewise the slot count should be available too. We hard code slots > today but it is something we should model properly. We would model it > using QOM of course. > > Internally within QEMU, this initial discussion started by saying that > any ACPI generation within QEMU should happen strictly with QOM > methods. This was the crux of my argument, if QOM is the only interface > we need, everything is there for the firmware to do the same job that > QEMU would be doing. I think we keep talking past each other. You seem to be pointing out that the information seabios uses to patch its hardcoded tables can be passed in via QOM. Agreed, it can. I'm pointing out all the info that is hardcoded in seabios - I don't see how that can be passed via QOM. All the hardcoded data in seabios is a problem, because when it changes (and it frequently does) it requires changes to both QEMU and SeaBIOS and those changes have to be coordinated. The key reason for moving the tables into QEMU is not that it can better patch the tables - the advantage is that it can hardcode the tables that need patching. I can cite several recent examples of seabios change requests that require changing of the hardcoded tables: liguang wants to add an "embedded controller" hardware device which requires changes to seabios' dsdt, Corey Minyard wants to add IPMI hardware support (both smbios changes and DSDT changes), and Corey Bryant wants to add TPM hardware support. How do we solve the problem of seabios having a ton of hardcoded information about the qemu hardware, and seabios having to change with the hardware modifications that qemu makes? -Kevin
On Mon, Jun 10, 2013 at 2:45 PM, Anthony Liguori <anthony@codemonkey.ws> wrote: > OVMF is proprietary. I don't agree that not-OSI means proprietary. I agree that the FAT driver is not 'free software' and I agree that is a problem for usage with free software projects, such as QEMU. This is a big deal, but unfortunately, as an Intel employee, I think I've done as much as I can to address this. It couldn't hurt if more people that actually care about this spoke up on edk2-devel about the issue, or perhaps within a UEFI working group. Because, I know that they've stopped listening to me about it. > It is not "supported" by QEMU. No, but I've always thought that QEMU was happy to have alternative firmware projects. > I'm not really convinced that > QEMU<->firmware is a GPL boundary because of how tightly the two are > linked. Where has 'linked' in terms of the GPL ever been anything other than actual executable linking? > Moving large chunks of firmware code into QEMU just to avoid solving > licensing issues is a non-starter with me. Is this a licensing issue? I thought this was a "let's save time by doing it in one place" thing. I'm pretty ambivalent about this feature, really. I don't think it is even worth all this bickering. I'm certain OVMF has ACPI issues on QEMU, but I don't think it is a huge deal to resolve them independently of this feature. I was not a huge fan of supporting this type of thing for Xen in OVMF, but it does seem to work fine. -Jordan
Hi Kevin, On Mon, Jun 10, 2013 at 7:23 PM, Kevin O'Connor <kevin@koconnor.net> wrote: > On Mon, Jun 10, 2013 at 06:34:29PM -0500, Anthony Liguori wrote: >> Kevin O'Connor <kevin@koconnor.net> writes: >> >> For the MADT table, right now SeaBIOS needs the CPU count. That can be >> found by counting the number of CPU nodes. Today cpus are unattached so >> they appear in /machine/unattached but we should put them in a >> /machine/cpu container for clarity. > > SeaBIOS needs much more than the CPU count. The madt contains info on > each interrupt - its global irq number, the legacy irq number, the > acpi defined type of the irq, and the acpi defined flags for the irq. > It also contains similar info on each cpu (including apic id), the io > apic, and NMIs. See below. >> QOM is the full representation of the device state so we should not ever >> need to add additional things to fw_cfg. More likely than not, when >> SeaBIOS needs more information, it's already there so added >> functionality will probably Just Work with older QEMUs. >> >> > I think it would also be >> > useful if you could do the same for a couple DSDT entries (eg, >> > \_SB.PCI0, \_SB.PCI0.ISA) and also describe how you plan to have the >> > guest build the AML from that info. >> >> Likewise the slot count should be available too. We hard code slots >> today but it is something we should model properly. We would model it >> using QOM of course. >> >> Internally within QEMU, this initial discussion started by saying that >> any ACPI generation within QEMU should happen strictly with QOM >> methods. This was the crux of my argument, if QOM is the only interface >> we need, everything is there for the firmware to do the same job that >> QEMU would be doing. > > I think we keep talking past each other. You seem to be pointing out > that the information seabios uses to patch its hardcoded tables can be > passed in via QOM. Agreed, it can. I'm pointing out all the info > that is hardcoded in seabios - I don't see how that can be passed via > QOM. No, we aren't talking past each other. We both have the same goal. A lot of what is hard coded in SeaBIOS is hard coded in QEMU too. IRQ routing is a good example of that. We want to make this information dynamic. Part of the trouble is that we haven't had a need to not hard code it because this is only information consumed by the BIOS. > All the hardcoded data in seabios is a problem, because when it > changes (and it frequently does) it requires changes to both QEMU and > SeaBIOS and those changes have to be coordinated. The key reason for > moving the tables into QEMU is not that it can better patch the tables > - the advantage is that it can hardcode the tables that need patching. So let's fix it. It's very easy to add read-only properties to QOM so we can hard code the bits that are in SeaBIOS now as read-only properties. For the MADT table, the per-CPU and IOAPIC info can simply be properties of those devices. We already represent irqs in QOM as integers so I suspect much of the information SeaBIOS needs is already there. > I can cite several recent examples of seabios change requests that > require changing of the hardcoded tables: liguang wants to add an > "embedded controller" hardware device which requires changes to > seabios' dsdt, Corey Minyard wants to add IPMI hardware support (both > smbios changes and DSDT changes), and Corey Bryant wants to add TPM > hardware support. > > How do we solve the problem of seabios having a ton of hardcoded > information about the qemu hardware, and seabios having to change with > the hardware modifications that qemu makes? I think that we can pretty much touch a table once pulling all of the info from QOM and then from a SeaBIOS point of view, never have to touch it again. The benefit is that solving this problem for x86 solves it for other architectures too and also lays the ground work to let a user actually control these bits too. Regards, Anthony Liguori > -Kevin
Hi Jordan, On Mon, Jun 10, 2013 at 7:28 PM, Jordan Justen <jljusten@gmail.com> wrote: > On Mon, Jun 10, 2013 at 2:45 PM, Anthony Liguori <anthony@codemonkey.ws> wrote: >> OVMF is proprietary. > > I don't agree that not-OSI means proprietary. I call it blue if that makes you all feel better :-) But no matter what the word we use it, the point still stands, we need a UEFI firmware that has a OSI-approved license. Forgetting everything else, the distros can't ship OVMF in its current form. > I agree that the FAT driver is not 'free software' and I agree that is > a problem for usage with free software projects, such as QEMU. This is > a big deal, but unfortunately, as an Intel employee, I think I've done > as much as I can to address this. > > It couldn't hurt if more people that actually care about this spoke up > on edk2-devel about the issue, or perhaps within a UEFI working group. > Because, I know that they've stopped listening to me about it. Is this useful? I can try to make noise. I assume since folks like you who have much more credibility and familiarity in this space have given up, it's a lost cause. >> It is not "supported" by QEMU. > > No, but I've always thought that QEMU was happy to have alternative > firmware projects. We are and we're happy to accept patches to enable things even if its proprietary. But that's all assuming we're improving hardware emulation. What we're talking about here is doing something that's very un-hardware like. >> I'm not really convinced that >> QEMU<->firmware is a GPL boundary because of how tightly the two are >> linked. > > Where has 'linked' in terms of the GPL ever been anything other than > actual executable linking? I should not have even brought this up as it's not worth debating. If you're curious, http://www.gnu.org/licenses/gpl-faq.html#MereAggregation >> Moving large chunks of firmware code into QEMU just to avoid solving >> licensing issues is a non-starter with me. > > Is this a licensing issue? I thought this was a "let's save time by > doing it in one place" thing. I'm pretty ambivalent about this > feature, really. I don't think it is even worth all this bickering. > > I'm certain OVMF has ACPI issues on QEMU, but I don't think it is a > huge deal to resolve them independently of this feature. Doing it one place could just mean share code. Code can't be shared today, that's why licensing has come up. Regards, Anthony Liguori > I was not a huge fan of supporting this type of thing for Xen in OVMF, > but it does seem to work fine. > > -Jordan
On Mon, Jun 10, 2013 at 07:51:55PM -0500, Anthony Liguori wrote: > I think that we can pretty much touch a table once pulling all of the > info from QOM and then from a SeaBIOS point of view, never have to > touch it again. Thanks. I do think it would help if you could go through the details of a couple of tables and a couple of DSDT entries and walk through where each field will come from. And also describe how seabios will build the resulting AML. From my previous email: > I'm quite curious how you are planning to solve (a). I think it would > help move this conversation forward if you could take a couple acpi > tables in use today (eg, madt, srat) and describe the future format > and location for each field in those tables. I think it would also be > useful if you could do the same for a couple DSDT entries (eg, > \_SB.PCI0, \_SB.PCI0.ISA) and also describe how you plan to have the > guest build the AML from that info. I do understand your desire to pass this stuff as parameters, but I really don't see it as feasible. I'm hoping that if you can write up some examples with specifics you'll either enlighten me or you'll see the difficulties I'm trying to describe. -Kevin
Hi Kevin, On Mon, Jun 10, 2013 at 8:19 PM, Kevin O'Connor <kevin@koconnor.net> wrote: > On Mon, Jun 10, 2013 at 07:51:55PM -0500, Anthony Liguori wrote: >> I think that we can pretty much touch a table once pulling all of the >> info from QOM and then from a SeaBIOS point of view, never have to >> touch it again. > > Thanks. I do think it would help if you could go through the details > of a couple of tables and a couple of DSDT entries and walk through > where each field will come from. And also describe how seabios will > build the resulting AML. From my previous email: > >> I'm quite curious how you are planning to solve (a). I think it would >> help move this conversation forward if you could take a couple acpi >> tables in use today (eg, madt, srat) and describe the future format >> and location for each field in those tables. I think it would also be >> useful if you could do the same for a couple DSDT entries (eg, >> \_SB.PCI0, \_SB.PCI0.ISA) and also describe how you plan to have the >> guest build the AML from that info. > > I do understand your desire to pass this stuff as parameters, but I > really don't see it as feasible. I'm hoping that if you can write up > some examples with specifics you'll either enlighten me or you'll see > the difficulties I'm trying to describe. Is your basic argument that generating these tables is hard, QEMU can and should just hard code them and change them whenever it feels like without involving SeaBIOS? If that's what it boils down to, I'll look more closely at it. Regards, Anthony Liguori > -Kevin
On Mon, Jun 10, 2013 at 6:03 PM, Anthony Liguori <anthony@codemonkey.ws> wrote: > On Mon, Jun 10, 2013 at 7:28 PM, Jordan Justen <jljusten@gmail.com> wrote: >> It couldn't hurt if more people that actually care about this spoke up >> on edk2-devel about the issue, or perhaps within a UEFI working group. >> Because, I know that they've stopped listening to me about it. > > Is this useful? I can try to make noise. Possibly. But, I don't want to give you false hope. :) > I assume since folks like > you who have much more credibility and familiarity in this space have > given up, it's a lost cause. I think customers/partners reporting an issue directly can sometime be more persuasive than a lone engineer saying this somewhat obscure licensing thing will be an issue for free software projects. >> No, but I've always thought that QEMU was happy to have alternative >> firmware projects. > > We are and we're happy to accept patches to enable things even if its > proprietary. But that's all assuming we're improving hardware > emulation. > > What we're talking about here is doing something that's very un-hardware like. Fair enough. I know that it's been done by other VMs, but you certainly have a point in that real systems don't do this. -Jordan
On Mon, Jun 10, 2013 at 08:25:15PM -0500, Anthony Liguori wrote: > On Mon, Jun 10, 2013 at 8:19 PM, Kevin O'Connor <kevin@koconnor.net> wrote: > > I do understand your desire to pass this stuff as parameters, but I > > really don't see it as feasible. I'm hoping that if you can write up > > some examples with specifics you'll either enlighten me or you'll see > > the difficulties I'm trying to describe. > > Is your basic argument that generating these tables is hard, QEMU can > and should just hard code them and change them whenever it feels like > without involving SeaBIOS? Yes - that's my basic argument. Generating the content in a way that is future proof (or future flexible) is very hard. The DSDT is particularly painful. I'm also afraid of introducing a lot of complexity when simply hardcoding the content (in the right place) is the better option. > If that's what it boils down to, I'll look more closely at it. Thanks, -Kevin
On Mon, Jun 10, 2013 at 01:58:41PM -0500, Anthony Liguori wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > > This adds support for device hotplug behind > > pci bridges. Bridge devices themselves need > > to be pre-configured on qemu command line. > > > > One of the goals of this project was to > > demonstrate the advantages of generating > > ACPI tables in QEMU. > > In my opinion, it does this successfully. > > Since you've gone out of your way to make this difficult to actually > review... What's wrong? > > > > * This saves the need to provide a new interface for firmware > > to discover bus number to pci brige mapping > > Do you mean fw_cfg? The BIOS normally does bus numbering. I see no > reason for it not to. I did my best to explain this. ACPI spec explicitly states config access to devices not on bus 0 is forbidden. This is to allow OS to renumber the bus at any time. So you have pci bus number but can't use it. > > * No need for yet another interface to bios to detect qemu version > > to check if it's safe to activate new code, > > and to ship multiple table versions: > > We only care about supporting one version of SeaBIOS. SeaBIOS should > only care about supporting one version of QEMU. We're not asking it to > support multiple versions. We do e.g. when we run with -M1.2 and migrate to old QEMU early in boot process. > > we generated code so we know this is new qemu > > * Use of glib's GArray makes it much easier to build > > up tables in code without need for iasl and code patching > > Adding a dynamic array to SeaBIOS isn't rocket science. Glib reimplementation in bios will be more code than using existing glib. > > > > I have also tried to address the concerns that > > Anthony has raised with this approach, please see below. > > > > Design: > > - each bus gets assigned a number 0-255 > > - generated ACPI code writes this number > > to a new BSEL register, then uses existing > > UP/DOWN registers to probe slot status; > > to eject, write number to BSEL register, > > then slot into existing EJ > > > > This is to address the ACPI spec requirement to > > avoid config cycle access to any bus except PCI roots. > > > > Portability: > > - Non x86 (or any Linux) platforms don't need any of this code. > > They can keep happily using SHPC the way > > they always did. > > > > Things to note: > > - this is on top of acpi patchset posted a month ago, > > with a small patch adding a core function to walk all > > pci buses, on top. > > Can also be found in my git tree > > git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi > > > > - Extensive use of glib completely removes > > pointer math in new code: we use > > g_array_append_vals exclusively. > > There's no code patching in new code. > > This is in response to comments about security > > concerns with adding code to qemu. > > In this sense it is more secure than existing > > code in hw/acpi/core.c - that can be switched to > > glib if desired. > > > > - New code (not imported from seabios) > > uses glib to reduce dependency on iasl. > > With time all code can be rewritted to > > use glib and avoid iasl, if desired. > > > > - As was the case previously, > > systems that lack working iasl are detected at configure time, > > pre-generated hex files in source tree are used in this case. > > This addresses the concern with iasl/big-endian > > systems. > > > > - Compile tested only. Migration is known to be > > broken - there are a bunch of TODO tags in code. > > It was agreed on the last qemu conf meeting that > > this code is posted without looking at migration, > > with understanding that it is addressed before > > being merged. Please do not mistake this > > limitation for a fundamental one - I have a > > very good idea how to fix it, including > > cross-version migration. > > > > - Cross version migration: when running with -M 1.5 > > and older, all ACPI table generation should be disabled. > > We'll present FW_CFG interface compatible with 1.5. > > So TL;DR, you break a bunch of stuff Confused. When you asked me to write this up during last conf call, you explicitly said not to bother with migration? Almost any change affecting guest needs migration related work. > and introduce a mess of code. What did you expect? New functionality with less code? > It > would be less code and wouldn't break anything to add this logic to > SeaBIOS. It will be much more code - we would need to add a bunch of interfaces to qemu then code using that to seabios, worry about running this seabios on old qemu, etc etc. > How is this even a discussion? Interface change is minimized. Interface to bios is not changed at all. Interface to guests - by a single register. Here we reuse existing fw cfg interface. That's the point. > Regards, > > Anthony Liguori > > > > > Cc: Jordan Justen <jljusten@gmail.com> > > Cc: Anthony Liguori <anthony@codemonkey.ws> > > Cc: Laszlo Ersek <lersek@redhat.com> > > Cc: Gerd Hoffmann <kraxel@redhat.com> > > Cc: seabios@seabios.org > > Cc: David Woodhouse <dwmw2@infradead.org> > > Cc: Kevin O'Connor <kevin@koconnor.net> > > Cc: Peter Maydell <peter.maydell@linaro.org> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > --- > > docs/specs/acpi_pci_hotplug.txt | 8 + > > include/hw/i386/pc.h | 4 +- > > hw/i386/acpi-dsdt.dsl | 36 +++- > > hw/i386/ssdt-pcihp.dsl | 51 ----- > > hw/acpi/piix4.c | 145 ++++++++++---- > > hw/i386/acpi-build.c | 411 ++++++++++++++++++++++++++++++++++----- > > hw/i386/Makefile.objs | 2 +- > > hw/i386/ssdt-pcihp.hex.generated | 108 ---------- > > 8 files changed, 510 insertions(+), 255 deletions(-) > > delete mode 100644 hw/i386/ssdt-pcihp.dsl > > delete mode 100644 hw/i386/ssdt-pcihp.hex.generated > > > > diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt > > index a839434..951b99a 100644 > > --- a/docs/specs/acpi_pci_hotplug.txt > > +++ b/docs/specs/acpi_pci_hotplug.txt > > @@ -43,3 +43,11 @@ PCI removability status (IO port 0xae0c-0xae0f, 4-byte access): > > > > Used by ACPI BIOS _RMV method to indicate removability status to OS. One > > bit per slot. Read-only > > + > > +PCI bus selector (IO port 0xae10-0xae13, 4-byte access): > > +----------------------------------------------- > > + > > +Used by ACPI BIOS methods to select the current PCI bus. > > +When written, makes all the other PCI registers (0xae00 - 0xae0f) > > +to refer to the appropriate bus. > > +0 selects PCI root bus (default). > > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h > > index f8d0871..66ec787 100644 > > --- a/include/hw/i386/pc.h > > +++ b/include/hw/i386/pc.h > > @@ -38,7 +38,9 @@ struct PcGuestInfo { > > bool s3_disabled; > > bool s4_disabled; > > uint8_t s4_val; > > - DECLARE_BITMAP(slot_hotplug_enable, PCI_SLOT_MAX); > > +#define GUEST_INFO_MAX_HOTPLUG_BUS 256 > > + PCIBus *hotplug_buses[GUEST_INFO_MAX_HOTPLUG_BUS]; > > + DECLARE_BITMAP(hotplug_enable[GUEST_INFO_MAX_HOTPLUG_BUS], PCI_SLOT_MAX); > > uint16_t sci_int; > > uint8_t acpi_enable_cmd; > > uint8_t acpi_disable_cmd; > > diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl > > index 90efce0..dc05668 100644 > > --- a/hw/i386/acpi-dsdt.dsl > > +++ b/hw/i386/acpi-dsdt.dsl > > @@ -133,11 +133,18 @@ DefinitionBlock ( > > B0EJ, 32, > > } > > > > + OperationRegion(SEL, SystemIO, 0xae10, 0x04) > > + Field(SEL, DWordAcc, NoLock, WriteAsZeros) { > > + BSEL, 32, > > + } > > + > > /* Methods called by bulk generated PCI devices below */ > > + Name(BNUM, 0x00) > > > > /* Methods called by hotplug devices */ > > - Method(PCEJ, 1, NotSerialized) { > > + Method(PCEJ, 2, NotSerialized) { > > // _EJ0 method - eject callback > > + Store(Arg1, BSEL) > > Store(ShiftLeft(1, Arg0), B0EJ) > > Return (0x0) > > } > > @@ -147,16 +154,25 @@ DefinitionBlock ( > > > > /* PCI hotplug notify method */ > > Method(PCNF, 0) { > > - // Local0 = iterator > > + // Local0 = bridge selector > > Store(Zero, Local0) > > - While (LLess(Local0, 31)) { > > - Increment(Local0) > > - If (And(PCIU, ShiftLeft(1, Local0))) { > > - PCNT(Local0, 1) > > - } > > - If (And(PCID, ShiftLeft(1, Local0))) { > > - PCNT(Local0, 3) > > - } > > + While (LLess(Local0, 255)) { > > + Store(Local0, BSEL) > > + // Local1 = slot iterator > > + Store(Zero, Local1) > > + > > + Store(PCIU, Local2) > > + Store(PCID, Local3) > > + While (LLess(Local1, 31)) { > > + Increment(Local1) > > + If (And(Local2, ShiftLeft(1, Local1))) { > > + PCNT(Add(Local1, Multiply(Local0, 32)), 1) > > + } > > + If (And(Local3, ShiftLeft(1, Local1))) { > > + PCNT(Add(Local1, Multiply(Local0, 32)), 3) > > + } > > + } > > + Increment(Local0) > > } > > } > > } > > diff --git a/hw/i386/ssdt-pcihp.dsl b/hw/i386/ssdt-pcihp.dsl > > deleted file mode 100644 > > index d29a5b9..0000000 > > --- a/hw/i386/ssdt-pcihp.dsl > > +++ /dev/null > > @@ -1,51 +0,0 @@ > > -/* > > - * This program is free software; you can redistribute it and/or modify > > - * it under the terms of the GNU General Public License as published by > > - * the Free Software Foundation; either version 2 of the License, or > > - * (at your option) any later version. > > - > > - * This program is distributed in the hope that it will be useful, > > - * but WITHOUT ANY WARRANTY; without even the implied warranty of > > - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > - * GNU General Public License for more details. > > - > > - * You should have received a copy of the GNU General Public License along > > - * with this program; if not, see <http://www.gnu.org/licenses/>. > > - */ > > - > > -ACPI_EXTRACT_ALL_CODE ssdp_pcihp_aml > > - > > -DefinitionBlock ("ssdt-pcihp.aml", "SSDT", 0x01, "BXPC", "BXSSDTPCIHP", 0x1) > > -{ > > - > > -/**************************************************************** > > - * PCI hotplug > > - ****************************************************************/ > > - > > - /* Objects supplied by DSDT */ > > - External(\_SB.PCI0, DeviceObj) > > - External(\_SB.PCI0.PCEJ, MethodObj) > > - > > - Scope(\_SB.PCI0) { > > - > > - /* Bulk generated PCI hotplug devices */ > > - ACPI_EXTRACT_DEVICE_START ssdt_pcihp_start > > - ACPI_EXTRACT_DEVICE_END ssdt_pcihp_end > > - ACPI_EXTRACT_DEVICE_STRING ssdt_pcihp_name > > - > > - // Method _EJ0 can be patched by BIOS to EJ0_ > > - // at runtime, if the slot is detected to not support hotplug. > > - // Extract the offset of the address dword and the > > - // _EJ0 name to allow this patching. > > - Device(SAA) { > > - ACPI_EXTRACT_NAME_BYTE_CONST ssdt_pcihp_id > > - Name(_SUN, 0xAA) > > - ACPI_EXTRACT_NAME_DWORD_CONST ssdt_pcihp_adr > > - Name(_ADR, 0xAA0000) > > - ACPI_EXTRACT_METHOD_STRING ssdt_pcihp_ej0 > > - Method(_EJ0, 1) { > > - Return (PCEJ(_SUN)) > > - } > > - } > > - } > > -} > > diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c > > index 19a26f5..10106fd 100644 > > --- a/hw/acpi/piix4.c > > +++ b/hw/acpi/piix4.c > > @@ -29,6 +29,7 @@ > > #include "exec/ioport.h" > > #include "hw/nvram/fw_cfg.h" > > #include "exec/address-spaces.h" > > +#include "hw/pci/pci_bus.h" > > > > //#define DEBUG > > > > @@ -42,11 +43,12 @@ > > #define GPE_LEN 4 > > > > #define PCI_HOTPLUG_ADDR 0xae00 > > -#define PCI_HOTPLUG_SIZE 0x000f > > +#define PCI_HOTPLUG_SIZE 0x0014 > > #define PCI_UP_BASE 0xae00 > > #define PCI_DOWN_BASE 0xae04 > > #define PCI_EJ_BASE 0xae08 > > #define PCI_RMV_BASE 0xae0c > > +#define PCI_SEL_BASE 0xae10 > > > > #define PIIX4_PROC_BASE 0xaf00 > > #define PIIX4_PROC_LEN 32 > > @@ -57,6 +59,8 @@ > > struct pci_status { > > uint32_t up; /* deprecated, maintained for migration compatibility */ > > uint32_t down; > > + uint32_t hotplug_enable; > > + uint32_t device_present; > > }; > > > > typedef struct CPUStatus { > > @@ -84,9 +88,8 @@ typedef struct PIIX4PMState { > > Notifier powerdown_notifier; > > > > /* for pci hotplug */ > > - struct pci_status pci0_status; > > - uint32_t pci0_hotplug_enable; > > - uint32_t pci0_slot_device_present; > > + struct pci_status pci_status[GUEST_INFO_MAX_HOTPLUG_BUS]; > > + uint32_t hotplug_select; > > > > uint8_t disable_s3; > > uint8_t disable_s4; > > @@ -98,6 +101,17 @@ typedef struct PIIX4PMState { > > PcGuestInfo *guest_info; > > } PIIX4PMState; > > > > +static int piix4_find_hotplug_bus(PcGuestInfo *guest_info, PCIBus *bus) > > +{ > > + int i; > > + for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) { > > + if (guest_info->hotplug_buses[i] == bus) { > > + return i; > > + } > > + } > > + return -1; > > +} > > + > > static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, > > PCIBus *bus, PIIX4PMState *s); > > > > @@ -183,6 +197,8 @@ static void pm_write_config(PCIDevice *d, > > > > static void vmstate_pci_status_pre_save(void *opaque) > > { > > +#if 0 > > + /* TODO */ > > struct pci_status *pci0_status = opaque; > > PIIX4PMState *s = container_of(pci0_status, PIIX4PMState, pci0_status); > > > > @@ -190,6 +206,7 @@ static void vmstate_pci_status_pre_save(void *opaque) > > * to a version that still does... of course these might get lost > > * by an old buggy implementation, but we try. */ > > pci0_status->up = s->pci0_slot_device_present & s->pci0_hotplug_enable; > > +#endif > > } > > > > static int vmstate_acpi_post_load(void *opaque, int version_id) > > @@ -267,7 +284,10 @@ static int acpi_load_old(QEMUFile *f, void *opaque, int version_id) > > qemu_get_be16s(f, &temp); > > } > > > > +#if 0 > > + TODO > > ret = vmstate_load_state(f, &vmstate_pci_status, &s->pci0_status, 1); > > +#endif > > return ret; > > } > > > > @@ -293,21 +313,28 @@ static const VMStateDescription vmstate_acpi = { > > VMSTATE_TIMER(ar.tmr.timer, PIIX4PMState), > > VMSTATE_INT64(ar.tmr.overflow_time, PIIX4PMState), > > VMSTATE_STRUCT(ar.gpe, PIIX4PMState, 2, vmstate_gpe, ACPIGPE), > > +#if 0 > > VMSTATE_STRUCT(pci0_status, PIIX4PMState, 2, vmstate_pci_status, > > struct pci_status), > > +#endif > > VMSTATE_END_OF_LIST() > > } > > }; > > > > -static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) > > +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned bsel, unsigned slots) > > { > > BusChild *kid, *next; > > - BusState *bus = qdev_get_parent_bus(&s->dev.qdev); > > + BusState *bus; > > int slot = ffs(slots) - 1; > > bool slot_free = true; > > > > + if (!s->guest_info->hotplug_buses[bsel]) { > > + return; > > + } > > + bus = &s->guest_info->hotplug_buses[bsel]->qbus; > > + > > /* Mark request as complete */ > > - s->pci0_status.down &= ~(1U << slot); > > + s->pci_status[bsel].down &= ~(1U << slot); > > > > QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { > > DeviceState *qdev = kid->child; > > @@ -322,23 +349,26 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) > > } > > } > > if (slot_free) { > > - s->pci0_slot_device_present &= ~(1U << slot); > > + s->pci_status[bsel].device_present &= ~(1U << slot); > > } > > } > > > > -static void piix4_update_hotplug(PIIX4PMState *s) > > +static void piix4_update_hotplug(PIIX4PMState *s, unsigned bsel) > > { > > - PCIDevice *dev = &s->dev; > > - BusState *bus = qdev_get_parent_bus(&dev->qdev); > > BusChild *kid, *next; > > + BusState *bus; > > > > + if (!s->guest_info->hotplug_buses[bsel]) { > > + return; > > + } > > + bus = &s->guest_info->hotplug_buses[bsel]->qbus; > > /* Execute any pending removes during reset */ > > - while (s->pci0_status.down) { > > - acpi_piix_eject_slot(s, s->pci0_status.down); > > + while (s->pci_status[bsel].down) { > > + acpi_piix_eject_slot(s, bsel, s->pci_status[bsel].down); > > } > > > > - s->pci0_hotplug_enable = ~0; > > - s->pci0_slot_device_present = 0; > > + s->pci_status[bsel].hotplug_enable = ~0; > > + s->pci_status[bsel].device_present = 0; > > > > QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { > > DeviceState *qdev = kid->child; > > @@ -347,10 +377,10 @@ static void piix4_update_hotplug(PIIX4PMState *s) > > int slot = PCI_SLOT(pdev->devfn); > > > > if (pc->no_hotplug) { > > - s->pci0_hotplug_enable &= ~(1U << slot); > > + s->pci_status[bsel].hotplug_enable &= ~(1U << slot); > > } > > > > - s->pci0_slot_device_present |= (1U << slot); > > + s->pci_status[bsel].device_present |= (1U << slot); > > } > > } > > > > @@ -358,6 +388,7 @@ static void piix4_reset(void *opaque) > > { > > PIIX4PMState *s = opaque; > > uint8_t *pci_conf = s->dev.config; > > + int i; > > > > pci_conf[0x58] = 0; > > pci_conf[0x59] = 0; > > @@ -371,7 +402,9 @@ static void piix4_reset(void *opaque) > > /* Mark SMM as already inited (until KVM supports SMM). */ > > pci_conf[0x5B] = 0x02; > > } > > - piix4_update_hotplug(s); > > + for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) { > > + piix4_update_hotplug(s, i); > > + } > > } > > > > static void piix4_pm_powerdown_req(Notifier *n, void *opaque) > > @@ -382,14 +415,20 @@ static void piix4_pm_powerdown_req(Notifier *n, void *opaque) > > acpi_pm1_evt_power_down(&s->ar); > > } > > > > -static void piix4_update_guest_info(PIIX4PMState *s) > > +static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, > > + PCIHotplugState state); > > + > > +static void piix4_update_bus_guest_info(PCIBus *pci_bus, void *opaque) > > { > > - PCIDevice *dev = &s->dev; > > - BusState *bus = qdev_get_parent_bus(&dev->qdev); > > + PIIX4PMState *s = opaque; > > + PcGuestInfo *guest_info = s->guest_info; > > BusChild *kid, *next; > > + BusState *bus = &pci_bus->qbus; > > > > - memset(s->guest_info->slot_hotplug_enable, 0xff, > > - DIV_ROUND_UP(PCI_SLOT_MAX, BITS_PER_BYTE)); > > + int i = piix4_find_hotplug_bus(guest_info, NULL); > > + assert(i >= 0); > > + > > + guest_info->hotplug_buses[i] = pci_bus; > > > > QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { > > DeviceState *qdev = kid->child; > > @@ -398,9 +437,22 @@ static void piix4_update_guest_info(PIIX4PMState *s) > > int slot = PCI_SLOT(pdev->devfn); > > > > if (pc->no_hotplug) { > > - clear_bit(slot, s->guest_info->slot_hotplug_enable); > > + clear_bit(slot, guest_info->hotplug_enable[i]); > > } > > } > > + > > + pci_bus_hotplug(pci_bus, piix4_device_hotplug, &s->dev.qdev); > > +} > > + > > +static void piix4_update_guest_info(PIIX4PMState *s) > > +{ > > + PCIDevice *dev = &s->dev; > > + > > + memset(s->guest_info->hotplug_enable, 0xff, > > + sizeof s->guest_info->hotplug_enable); > > + > > + pci_for_each_bus(dev->bus, piix4_update_bus_guest_info, > > + s); > > } > > > > static void piix4_pm_machine_ready(Notifier *n, void *opaque) > > @@ -589,16 +641,22 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) > > { > > PIIX4PMState *s = opaque; > > uint32_t val = 0; > > + int bsel = s->hotplug_select; > > + > > + if (bsel > GUEST_INFO_MAX_HOTPLUG_BUS) { > > + return 0; > > + } > > > > switch (addr) { > > case PCI_UP_BASE - PCI_HOTPLUG_ADDR: > > /* Manufacture an "up" value to cause a device check on any hotplug > > * slot with a device. Extra device checks are harmless. */ > > - val = s->pci0_slot_device_present & s->pci0_hotplug_enable; > > + val = s->pci_status[bsel].device_present & > > + s->pci_status[bsel].hotplug_enable; > > PIIX4_DPRINTF("pci_up_read %x\n", val); > > break; > > case PCI_DOWN_BASE - PCI_HOTPLUG_ADDR: > > - val = s->pci0_status.down; > > + val = s->pci_status[bsel].down; > > PIIX4_DPRINTF("pci_down_read %x\n", val); > > break; > > case PCI_EJ_BASE - PCI_HOTPLUG_ADDR: > > @@ -606,8 +664,10 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) > > PIIX4_DPRINTF("pci_features_read %x\n", val); > > break; > > case PCI_RMV_BASE - PCI_HOTPLUG_ADDR: > > - val = s->pci0_hotplug_enable; > > + val = s->pci_status[bsel].hotplug_enable; > > break; > > + case PCI_SEL_BASE - PCI_HOTPLUG_ADDR: > > + val = s->hotplug_select; > > default: > > break; > > } > > @@ -618,12 +678,18 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) > > static void pci_write(void *opaque, hwaddr addr, uint64_t data, > > unsigned int size) > > { > > + PIIX4PMState *s = opaque; > > switch (addr) { > > case PCI_EJ_BASE - PCI_HOTPLUG_ADDR: > > - acpi_piix_eject_slot(opaque, (uint32_t)data); > > + if (s->hotplug_select >= GUEST_INFO_MAX_HOTPLUG_BUS) { > > + break; > > + } > > + acpi_piix_eject_slot(s, s->hotplug_select, data); > > PIIX4_DPRINTF("pciej write %" HWADDR_PRIx " <== % " PRIu64 "\n", > > addr, data); > > break; > > + case PCI_SEL_BASE - PCI_HOTPLUG_ADDR: > > + s->hotplug_select = data; > > default: > > break; > > } > > @@ -706,9 +772,6 @@ static void piix4_init_cpu_status(CPUState *cpu, void *data) > > g->sts[id / 8] |= (1 << (id % 8)); > > } > > > > -static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, > > - PCIHotplugState state); > > - > > static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, > > PCIBus *bus, PIIX4PMState *s) > > { > > @@ -720,8 +783,6 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, > > PCI_HOTPLUG_SIZE); > > memory_region_add_subregion(parent, PCI_HOTPLUG_ADDR, > > &s->io_pci); > > - pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev); > > - > > qemu_for_each_cpu(piix4_init_cpu_status, &s->gpe_cpu); > > memory_region_init_io(&s->io_cpu, &cpu_hotplug_ops, s, "apci-cpu-hotplug", > > PIIX4_PROC_LEN); > > @@ -730,16 +791,16 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, > > qemu_register_cpu_added_notifier(&s->cpu_added_notifier); > > } > > > > -static void enable_device(PIIX4PMState *s, int slot) > > +static void enable_device(PIIX4PMState *s, unsigned bsel, int slot) > > { > > s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS; > > - s->pci0_slot_device_present |= (1U << slot); > > + s->pci_status[bsel].device_present |= (1U << slot); > > } > > > > -static void disable_device(PIIX4PMState *s, int slot) > > +static void disable_device(PIIX4PMState *s, unsigned bsel, int slot) > > { > > s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS; > > - s->pci0_status.down |= (1U << slot); > > + s->pci_status[bsel].down |= (1U << slot); > > } > > > > static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, > > @@ -748,19 +809,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, > > int slot = PCI_SLOT(dev->devfn); > > PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, > > PCI_DEVICE(qdev)); > > + int bsel = piix4_find_hotplug_bus(s->guest_info, dev->bus); > > + if (bsel < 0) { > > + return -1; > > + } > > > > /* Don't send event when device is enabled during qemu machine creation: > > * it is present on boot, no hotplug event is necessary. We do send an > > * event when the device is disabled later. */ > > if (state == PCI_COLDPLUG_ENABLED) { > > - s->pci0_slot_device_present |= (1U << slot); > > + s->pci_status[bsel].device_present |= (1U << slot); > > return 0; > > } > > > > if (state == PCI_HOTPLUG_ENABLED) { > > - enable_device(s, slot); > > + enable_device(s, bsel, slot); > > } else { > > - disable_device(s, slot); > > + disable_device(s, bsel, slot); > > } > > > > pm_update_sci(s); > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c > > index 72606a8..43e4988 100644 > > --- a/hw/i386/acpi-build.c > > +++ b/hw/i386/acpi-build.c > > @@ -34,6 +34,7 @@ > > #include "hw/acpi/acpi.h" > > #include "hw/nvram/fw_cfg.h" > > #include "hw/i386/bios-linker-loader.h" > > +#include "hw/pci/pci_bus.h" > > > > #define ACPI_BUILD_APPNAME "Bochs" > > #define ACPI_BUILD_APPNAME6 "BOCHS " > > @@ -258,23 +259,358 @@ acpi_encode_len(uint8_t *ssdt_ptr, int length, int bytes) > > #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) > > #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start) > > > > -/* 0x5B 0x82 DeviceOp PkgLength NameString */ > > -#define ACPI_PCIHP_OFFSET_HEX (*ssdt_pcihp_name - *ssdt_pcihp_start + 1) > > -#define ACPI_PCIHP_OFFSET_ID (*ssdt_pcihp_id - *ssdt_pcihp_start) > > -#define ACPI_PCIHP_OFFSET_ADR (*ssdt_pcihp_adr - *ssdt_pcihp_start) > > -#define ACPI_PCIHP_OFFSET_EJ0 (*ssdt_pcihp_ej0 - *ssdt_pcihp_start) > > -#define ACPI_PCIHP_SIZEOF (*ssdt_pcihp_end - *ssdt_pcihp_start) > > -#define ACPI_PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start) > > - > > #define ACPI_SSDT_SIGNATURE 0x54445353 /* SSDT */ > > #define ACPI_SSDT_HEADER_LENGTH 36 > > > > #include "hw/i386/ssdt-misc.hex" > > -#include "hw/i386/ssdt-pcihp.hex" > > + > > +static inline GArray *build_alloc_array(void) > > +{ > > + return g_array_new(false, true /* clear */, 1); > > +} > > + > > +static inline void build_free_array(GArray *array) > > +{ > > + g_array_free(array, true); > > +} > > + > > +static inline void build_prepend_byte(GArray *array, uint8_t val) > > +{ > > + g_array_prepend_val(array, val); > > +} > > + > > +static inline void build_append_byte(GArray *array, uint8_t val) > > +{ > > + g_array_append_val(array, val); > > +} > > + > > +static inline void build_prepend_array(GArray *array, GArray *val) > > +{ > > + g_array_prepend_vals(array, val->data, val->len); > > +} > > + > > +static inline void build_append_array(GArray *array, GArray *val) > > +{ > > + g_array_append_vals(array, val->data, val->len); > > +} > > + > > +static void build_append_nameseg(GArray *array, const char *format, ...) > > +{ > > + GString *s = g_string_new(""); > > + va_list args; > > + > > + va_start(args, format); > > + g_string_vprintf(s, format, args); > > + va_end(args); > > + > > + assert(s->len == 4); > > + g_array_append_vals(array, s->str, s->len); > > + g_string_free(s, true); > > +} > > + > > +static void build_prepend_nameseg(GArray *array, const char *format, ...) > > +{ > > + GString *s = g_string_new(""); > > + va_list args; > > + > > + va_start(args, format); > > + g_string_vprintf(s, format, args); > > + va_end(args); > > + > > + assert(s->len == 4); > > + g_array_prepend_vals(array, s->str, s->len); > > + g_string_free(s, true); > > +} > > + > > +static void build_prepend_devfn(GArray *name, uint8_t devfn) > > +{ > > + build_prepend_nameseg(name, "S%0.02x_", devfn); > > +} > > + > > +static void build_append_devfn(GArray *name, uint8_t devfn) > > +{ > > + GArray *array = build_alloc_array(); > > + build_prepend_devfn(array, devfn); > > + build_append_array(name, array); > > + build_free_array(array); > > +} > > + > > +static void build_prepend_name_string(GArray *name, PCIBus *bus) > > +{ > > + int segcount; > > + > > + for (;!pci_bus_is_root(bus); bus = bus->parent_dev->bus) { > > + build_prepend_devfn(name, bus->parent_dev->devfn); > > + } > > + g_array_prepend_vals(name, "PCI0", 4); > > + g_array_prepend_vals(name, "_SB_", 4); > > + /* Name segment length is a multiple of 4 */ > > + assert(name->len % 4 == 0); > > + segcount = name->len / 4; > > + build_prepend_byte(name, segcount); > > + build_prepend_byte(name, 0x2f); /* MultiNamePrefix */ > > + build_prepend_byte(name, '\\'); /* RootChar */ > > +} > > + > > +static void build_append_name_string(GArray *name, PCIBus *bus) > > +{ > > + GArray *append = build_alloc_array(); > > + build_prepend_name_string(append, bus); > > + build_append_array(name, append); > > + build_free_array(append); > > +} > > + > > +/* 5.4 Definition Block Encoding */ > > +enum { > > + PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */ > > + PACKAGE_LENGTH_2BYTE_SHIFT = 4, > > + PACKAGE_LENGTH_3BYTE_SHIFT = 12, > > + PACKAGE_LENGTH_4BYTE_SHIFT = 20, > > +}; > > + > > +static void build_prepend_package_length(GArray *package) > > +{ > > + unsigned bytes; > > + uint8_t byte; > > + unsigned package_len = package->len; > > + unsigned length = package_len; > > + > > + if (length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT)) { > > + byte = length; > > + build_prepend_byte(package, byte); > > + return; > > + } > > + > > + if (length + 4 > (1 << PACKAGE_LENGTH_4BYTE_SHIFT)) { > > + byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT; > > + build_prepend_byte(package, byte); > > + length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1; > > + } > > + if (length + 3 > (1 << PACKAGE_LENGTH_3BYTE_SHIFT)) { > > + byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT; > > + build_prepend_byte(package, byte); > > + length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1; > > + } > > + if (length + 2 > (1 << PACKAGE_LENGTH_2BYTE_SHIFT)) { > > + byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT; > > + build_prepend_byte(package, byte); > > + length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1; > > + } > > + > > + bytes = package->len - package_len; > > + byte = (bytes << PACKAGE_LENGTH_1BYTE_SHIFT) | length; > > + build_prepend_byte(package, byte); > > +} > > + > > +static void build_package(GArray *package, uint8_t op) > > +{ > > + build_prepend_package_length(package); > > + build_prepend_byte(package, op); > > +} > > + > > +static void build_extop_package(GArray *package, uint8_t op) > > +{ > > + build_package(package, op); > > + build_prepend_byte(package, 0x5B); /* ExtOpPrefix */ > > +} > > + > > +static void build_append_notify(GArray *method, GArray *target_name, > > + uint16_t value) > > +{ > > + GArray *notify = build_alloc_array(); > > + uint8_t op = 0xA0; /* IfOp */ > > + > > + build_append_byte(notify, 0x93); /* LEqualOp */ > > + build_append_byte(notify, 0x68); /* Arg0Op */ > > + build_append_byte(notify, 0x0A); /* BytePrefix */ > > + build_append_byte(notify, value); > > + build_append_byte(notify, value >> 8); > > + build_append_byte(notify, 0x86); /* NotifyOp */ > > + build_append_array(notify, target_name); > > + build_append_byte(notify, 0x69); /* Arg1Op */ > > + > > + /* Pack it up */ > > + build_package(notify, op); > > + > > + build_append_array(method, notify); > > + > > + build_free_array(notify); > > +} > > + > > +static void > > +build_append_notify_slots(GArray *method, PCIBus *bus, > > + unsigned bus_select, > > + DECLARE_BITMAP(hotplug_enable, PCI_SLOT_MAX)) > > +{ > > + int i; > > + > > + if (!bus) { > > + return; > > + } > > + > > + for (i = 0; i < 32; i++) { > > + GArray *target; > > + if (!test_bit(i, hotplug_enable)) { > > + continue; > > + } > > + target = build_alloc_array(); > > + build_prepend_devfn(target, i); > > + build_prepend_name_string(target, bus); > > + /* *32 matches Multiply in PCNF */ > > + build_append_notify(method, target, bus_select * 32 + i); > > + build_free_array(target); > > + } > > +} > > + > > +static void build_append_notify_buses(GArray *table, PcGuestInfo *guest_info) > > +{ > > + int i; > > + GArray *method = build_alloc_array(); > > + > > + uint8_t op = 0x14; /* MethodOp */ > > + > > + build_append_nameseg(method, "PCNT"); > > + build_append_byte(method, 0x02); /* MethodFlags: ArgCount */ > > + for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) { > > + build_append_notify_slots(method, guest_info->hotplug_buses[i], i, > > + guest_info->hotplug_enable[i]); > > + } > > + build_package(method, op); > > + > > + build_append_array(table, method); > > + build_free_array(method); > > +} > > + > > +static void build_append_device(GArray *table, unsigned devfn) > > +{ > > + uint8_t op, ej0_op; > > + > > + GArray *device = build_alloc_array(); > > + GArray *ej0 = build_alloc_array(); > > + > > + op = 0x82; /* DeviceOp */ > > + > > + build_append_devfn(device, devfn); > > + > > + build_append_byte(device, 0x08); /* NameOp */ > > + build_append_nameseg(device, "_SUN"); > > + build_append_byte(device, 0x0A); /* BytePrefix */ > > + build_append_byte(device, PCI_SLOT(devfn)); > > + build_append_byte(device, 0x08); /* NameOp */ > > + build_append_nameseg(device, "_ADR"); > > + build_append_byte(device, 0x0C); /* DWordPrefix */ > > + build_append_byte(device, PCI_FUNC(devfn)); > > + build_append_byte(device, 0x00); > > + build_append_byte(device, PCI_SLOT(devfn)); > > + build_append_byte(device, 0x00); > > + > > + ej0_op = 0x14; /* MethodOp */ > > + build_append_nameseg(ej0, "_EJ0"); > > + build_append_byte(ej0, 0x01); /* MethodFlags: ArgCount */ > > + build_append_byte(ej0, 0xA4); /* ReturnOp */ > > + build_append_nameseg(ej0, "PCEJ"); > > + build_append_nameseg(ej0, "_SUN"); > > + build_append_nameseg(ej0, "BNUM"); > > + build_package(ej0, ej0_op); > > + build_append_array(device, ej0); > > + > > + build_extop_package(device, op); > > + > > + build_append_array(table, device); > > + build_free_array(device); > > + build_free_array(ej0); > > +} > > + > > +static bool build_find_device_bus(PCIBus *buses[GUEST_INFO_MAX_HOTPLUG_BUS], > > + PCIBus *bus, unsigned devfn) > > +{ > > + int i; > > + for (i = 0; i < ARRAY_SIZE(buses); ++i) { > > + if (buses[i] && buses[i]->parent_dev && > > + buses[i]->parent_dev->bus == bus && > > + buses[i]->parent_dev->devfn == devfn) { > > + return true; > > + } > > + } > > + return false; > > +} > > + > > +static void > > +build_append_bus(GArray *table, int bnum, PcGuestInfo *guest_info) > > +{ > > + uint8_t op; > > + int slot; > > + unsigned devfn; > > + GArray *device; > > + PCIBus *bus = guest_info->hotplug_buses[bnum]; > > + > > + if (!bus) { > > + return; > > + } > > + > > + device = build_alloc_array(); > > + > > + /* Root device described in DSDT */ > > + if (bnum) { > > + devfn = bus->parent_dev->devfn; > > + > > + op = 0x82; /* DeviceOp */ > > + > > + build_append_name_string(device, bus); > > + > > + build_append_byte(device, 0x08); /* NameOp */ > > + build_append_nameseg(device, "_SUN"); > > + build_append_byte(device, 0x0A); /* BytePrefix */ > > + build_append_byte(device, PCI_SLOT(devfn)); > > + build_append_byte(device, 0x08); /* NameOp */ > > + build_append_nameseg(device, "BNUM"); > > + build_append_byte(device, 0x0A); /* BytePrefix */ > > + build_append_byte(device, bnum); > > + build_append_byte(device, 0x08); /* NameOp */ > > + build_append_nameseg(device, "_ADR"); > > + build_append_byte(device, 0x0C); /* DWordPrefix */ > > + build_append_byte(device, PCI_FUNC(devfn)); > > + build_append_byte(device, 0x00); > > + build_append_byte(device, PCI_SLOT(devfn)); > > + build_append_byte(device, 0x00); > > + } else { > > + op = 0x10; /* ScopeOp */ > > + build_append_nameseg(device, "PCI0"); > > + } > > + > > + for (slot = 0; slot < PCI_SLOT_MAX; ++slot) { > > + if (!test_bit(slot, guest_info->hotplug_enable[bnum])) { > > + continue; > > + } > > + /* Bus device? We'll add it later */ > > + if (build_find_device_bus(guest_info->hotplug_buses, > > + bus, PCI_DEVFN(slot, 0))) { > > + continue; > > + } > > + build_append_device(device, PCI_DEVFN(slot, 0)); > > + } > > + > > + if (bnum) { > > + build_extop_package(device, op); > > + } else { > > + build_package(device, op); > > + } > > + > > + build_append_array(table, device); > > + build_free_array(device); > > +} > > + > > +static void build_append_device_buses(GArray *table, PcGuestInfo *guest_info) > > +{ > > + int i; > > + for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) { > > + build_append_bus(table, i, guest_info); > > + } > > +} > > > > static uint8_t* > > build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count, > > - const char *target, int ofs) > > + const char *target, int ofs, int step) > > { > > int i; > > > > @@ -295,31 +631,14 @@ build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count, > > *(ssdt_ptr++) = i; > > *(ssdt_ptr++) = 0x86; /* NotifyOp */ > > memcpy(ssdt_ptr, target, 4); > > - ssdt_ptr[ofs] = acpi_get_hex(i >> 4); > > - ssdt_ptr[ofs + 1] = acpi_get_hex(i); > > + ssdt_ptr[ofs] = acpi_get_hex((i * step) >> 4); > > + ssdt_ptr[ofs + 1] = acpi_get_hex(i * step); > > ssdt_ptr += 4; > > *(ssdt_ptr++) = 0x69; /* Arg1Op */ > > } > > return ssdt_ptr; > > } > > > > -static void patch_pcihp(int slot, uint8_t *ssdt_ptr, uint32_t eject) > > -{ > > - ssdt_ptr[ACPI_PCIHP_OFFSET_HEX] = acpi_get_hex(slot >> 4); > > - ssdt_ptr[ACPI_PCIHP_OFFSET_HEX+1] = acpi_get_hex(slot); > > - ssdt_ptr[ACPI_PCIHP_OFFSET_ID] = slot; > > - ssdt_ptr[ACPI_PCIHP_OFFSET_ADR + 2] = slot; > > - > > - /* Runtime patching of ACPI_EJ0: to disable hotplug for a slot, > > - * replace the method name: _EJ0 by ACPI_EJ0_. */ > > - /* Sanity check */ > > - assert (!memcmp(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "_EJ0", 4)); > > - > > - if (!eject) { > > - memcpy(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "EJ0_", 4); > > - } > > -} > > - > > static void > > build_ssdt(GArray *table_data, GArray *linker, > > FWCfgState *fw_cfg, PcGuestInfo *guest_info) > > @@ -330,11 +649,20 @@ build_ssdt(GArray *table_data, GArray *linker, > > + (acpi_cpus * ACPI_PROC_SIZEOF) /* procs */ > > + (1+2+5+(12*acpi_cpus)) /* NTFY */ > > + (6+2+1+(1*acpi_cpus)) /* CPON */ > > - + (1+3+4) /* Scope(PCI0) */ > > - + ((PCI_SLOT_MAX - 1) * ACPI_PCIHP_SIZEOF) /* slots */ > > - + (1+2+5+(12*(PCI_SLOT_MAX - 1)))); /* PCNT */ > > - uint8_t *ssdt = acpi_data_push(table_data, length); > > - uint8_t *ssdt_ptr = ssdt; > > + ); > > + uint8_t *ssdt; > > + uint8_t *ssdt_ptr; > > + GArray *devices = build_alloc_array(); > > + GArray *notify = build_alloc_array(); > > + > > + /* TODO: rewrite this function using glib */ > > + build_append_device_buses(devices, guest_info); > > + length += devices->len; > > + build_append_notify_buses(notify, guest_info); > > + length += notify->len; > > + > > + ssdt = acpi_data_push(table_data, length); > > + ssdt_ptr = ssdt; > > > > /* Copy header and encode fwcfg values in the S3_ / S4_ / S5_ packages */ > > memcpy(ssdt_ptr, ssdp_misc_aml, sizeof(ssdp_misc_aml)); > > @@ -389,7 +717,7 @@ build_ssdt(GArray *table_data, GArray *linker, > > > > /* build "Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...}" */ > > /* Arg0 = Processor ID = APIC ID */ > > - ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2); > > + ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2, 1); > > > > /* build "Name(CPON, Package() { One, One, ..., Zero, Zero, ... })" */ > > *(ssdt_ptr++) = 0x08; /* NameOp */ > > @@ -411,15 +739,10 @@ build_ssdt(GArray *table_data, GArray *linker, > > *(ssdt_ptr++) = 'I'; > > *(ssdt_ptr++) = '0'; > > > > - /* build Device object for each slot */ > > - for (i = 1; i < PCI_SLOT_MAX; i++) { > > - bool eject = test_bit(i, guest_info->slot_hotplug_enable); > > - memcpy(ssdt_ptr, ACPI_PCIHP_AML, ACPI_PCIHP_SIZEOF); > > - patch_pcihp(i, ssdt_ptr, eject); > > - ssdt_ptr += ACPI_PCIHP_SIZEOF; > > - } > > - > > - ssdt_ptr = build_notify(ssdt_ptr, "PCNT", 1, PCI_SLOT_MAX, "S00_", 1); > > + memcpy(ssdt_ptr, devices->data, devices->len); > > + ssdt_ptr += devices->len; > > + memcpy(ssdt_ptr, notify->data, notify->len); > > + ssdt_ptr += notify->len; > > > > build_header(linker, table_data, > > (void*)ssdt, ACPI_SSDT_SIGNATURE, ssdt_ptr - ssdt, 1); > > diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs > > index 2ab2572..c2e74e9 100644 > > --- a/hw/i386/Makefile.objs > > +++ b/hw/i386/Makefile.objs > > @@ -6,7 +6,7 @@ obj-$(CONFIG_XEN) += xen_domainbuild.o xen_machine_pv.o > > obj-y += kvmvapic.o > > obj-y += acpi-build.o > > obj-y += bios-linker-loader.o > > -hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-pcihp.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex > > +hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex > > hw/i386/pc_piix.o: hw/i386/pc_piix.c hw/i386/acpi-dsdt.hex > > hw/i386/pc_q35.o: hw/i386/pc_q35.c hw/i386/q35-acpi-dsdt.hex > > > > diff --git a/hw/i386/ssdt-pcihp.hex.generated b/hw/i386/ssdt-pcihp.hex.generated > > deleted file mode 100644 > > index 0d32a27..0000000 > > --- a/hw/i386/ssdt-pcihp.hex.generated > > +++ /dev/null > > @@ -1,108 +0,0 @@ > > -static unsigned char ssdt_pcihp_name[] = { > > -0x33 > > -}; > > -static unsigned char ssdt_pcihp_adr[] = { > > -0x44 > > -}; > > -static unsigned char ssdt_pcihp_end[] = { > > -0x58 > > -}; > > -static unsigned char ssdp_pcihp_aml[] = { > > -0x53, > > -0x53, > > -0x44, > > -0x54, > > -0x58, > > -0x0, > > -0x0, > > -0x0, > > -0x1, > > -0x77, > > -0x42, > > -0x58, > > -0x50, > > -0x43, > > -0x0, > > -0x0, > > -0x42, > > -0x58, > > -0x53, > > -0x53, > > -0x44, > > -0x54, > > -0x50, > > -0x43, > > -0x1, > > -0x0, > > -0x0, > > -0x0, > > -0x49, > > -0x4e, > > -0x54, > > -0x4c, > > -0x28, > > -0x5, > > -0x10, > > -0x20, > > -0x10, > > -0x33, > > -0x5c, > > -0x2e, > > -0x5f, > > -0x53, > > -0x42, > > -0x5f, > > -0x50, > > -0x43, > > -0x49, > > -0x30, > > -0x5b, > > -0x82, > > -0x26, > > -0x53, > > -0x41, > > -0x41, > > -0x5f, > > -0x8, > > -0x5f, > > -0x53, > > -0x55, > > -0x4e, > > -0xa, > > -0xaa, > > -0x8, > > -0x5f, > > -0x41, > > -0x44, > > -0x52, > > -0xc, > > -0x0, > > -0x0, > > -0xaa, > > -0x0, > > -0x14, > > -0xf, > > -0x5f, > > -0x45, > > -0x4a, > > -0x30, > > -0x1, > > -0xa4, > > -0x50, > > -0x43, > > -0x45, > > -0x4a, > > -0x5f, > > -0x53, > > -0x55, > > -0x4e > > -}; > > -static unsigned char ssdt_pcihp_start[] = { > > -0x30 > > -}; > > -static unsigned char ssdt_pcihp_id[] = { > > -0x3d > > -}; > > -static unsigned char ssdt_pcihp_ej0[] = { > > -0x4a > > -}; > > -- > > MST
Hi, > * Use of glib's GArray makes it much easier to build > up tables in code without need for iasl and code patching Nice. > Design: > - each bus gets assigned a number 0-255 > - generated ACPI code writes this number > to a new BSEL register, then uses existing > UP/DOWN registers to probe slot status; > to eject, write number to BSEL register, > then slot into existing EJ > > This is to address the ACPI spec requirement to > avoid config cycle access to any bus except PCI roots. > > Portability: > - Non x86 (or any Linux) platforms don't need any of this code. > They can keep happily using SHPC the way > they always did. Hmm. Is is possible to write a SHPC driver in AML? I think it would be alot better to have one guest/host interface for pci bridge hotplug instead of two. cheers, Gerd
On Mon, Jun 10, 2013 at 08:25:15PM -0500, Anthony Liguori wrote: > > I do understand your desire to pass this stuff as parameters, but I > > really don't see it as feasible. I'm hoping that if you can write up > > some examples with specifics you'll either enlighten me or you'll see > > the difficulties I'm trying to describe. > > Is your basic argument that generating these tables is hard, QEMU can > and should just hard code them and change them whenever it feels like > without involving SeaBIOS? Basically I think it makes sense to listen to people that write drivers for PV hardware that QEMU exposes. bios is one such example. If people that are supposed to use our interfaces ask for a specific interface, I think we should listen. If people come and say that building acpi dynamically is hard for them and they want QEMU to prepare templates for acpi - and there's overwhelming concensus on this - QEMU should stop going "tough luck we know what's best for you". > If that's what it boils down to, I'll look more closely at it. So please look at the original patchset [PATCH RFC 00/13] qemu: generate acpi tables for the guest and let's move the discussion of it beyond the meta-argument "QEMU does not need to generate any ACPI tables" and to the specific interface that we can use to make life easier for the bios. Specifically several options were discussed, on and off-line for passing per-formatted tables to firmware: 1. use existing fw cfg interface from bios, qemu side make it work well with cross version migration. 2. same as 1 but find some existing hardware that is similar to fw cfg and emulate it. For example build a directory structure in some flash. 3. make qemu put tables into guest memory. bios needs to reserve this memory. 4. same as 3 but bios allocates memory and passes address to qemu. Kevin suggested 1. This is what my patchset implements. But I think bios could live with any of the others, as long as it's a generic interface so we can stop changing bios code all the time.
On Tue, Jun 11, 2013 at 07:33:02AM +0200, Gerd Hoffmann wrote: > Hi, > > > * Use of glib's GArray makes it much easier to build > > up tables in code without need for iasl and code patching > > Nice. > > > Design: > > - each bus gets assigned a number 0-255 > > - generated ACPI code writes this number > > to a new BSEL register, then uses existing > > UP/DOWN registers to probe slot status; > > to eject, write number to BSEL register, > > then slot into existing EJ > > > > This is to address the ACPI spec requirement to > > avoid config cycle access to any bus except PCI roots. > > > > Portability: > > - Non x86 (or any Linux) platforms don't need any of this code. > > They can keep happily using SHPC the way > > they always did. > > Hmm. Is is possible to write a SHPC driver in AML? I think it would be > alot better to have one guest/host interface for pci bridge hotplug > instead of two. > > cheers, > Gerd No, it's not possible, SHPC is not designed to be used from ACPI. Two reasons off the top of my head, there are likely others: 1. SHPC uses regular PCI interrupts to signal events. It does not signal GFE and SCI. 2. SHPC uses config accesses to get information from device. ACPI does not allow config access anywhere except the root bus from ACPI (This requirement is designed to give the OS freedom to reconfigure PCI in an arbitrary way).
On Mon, Jun 10, 2013 at 08:03:46PM -0500, Anthony Liguori wrote: > >> It is not "supported" by QEMU. > > > > No, but I've always thought that QEMU was happy to have alternative > > firmware projects. > > We are and we're happy to accept patches to enable things even if its > proprietary. But that's all assuming we're improving hardware > emulation. > > What we're talking about here is doing something that's very un-hardware like. Two points 1. You never explained what you mean by un-hardware like. Currently bios is in a ROM device, and it has a template for ACPI tables together with it. This simply moves the tables to a separate ROM device (FW CFG), and generalizes the template using the linker interface. One ROM is hardware-like but two is un-hardware like? ACPI tables are static so it's likely lots of hardware has at least some of them pre-formatted in flash, then tweak some things like SRAT a bit. 2. There's no hardware-like alternative. A current alternative is exposing lots of PV interfaces to bios, bios builds up ACPI tables based on them. All of this code is in practice PC only, if we ever attempt to generalize we'll see how the interfaces are not a good match. They are still PV, poorly documented and it's very hard for bios not to make any assumptions about them. When it does, result is a very obscure breakage. Yes, building tables in bios means QEMU code is simpler but we double the amount of interfaces that we need to get right: QEMU-BIOS + BIOS-OS Yes it's "not QEMU code" but it's seen as a single package by users and it does not really matter - excellent hardware with lousy firmware is useless.
Hi, >>> Portability: >>> - Non x86 (or any Linux) platforms don't need any of this code. >>> They can keep happily using SHPC the way >>> they always did. >> >> Hmm. Is is possible to write a SHPC driver in AML? I think it would be >> alot better to have one guest/host interface for pci bridge hotplug >> instead of two. >> >> cheers, >> Gerd > > No, it's not possible, SHPC is not designed to be used from ACPI. > > Two reasons off the top of my head, there are likely others: > > 1. SHPC uses regular PCI interrupts to signal events. It does not signal > GFE and SCI. > > 2. SHPC uses config accesses to get information from device. > ACPI does not allow config access anywhere except the root bus from ACPI > (This requirement is designed to give the OS freedom > to reconfigure PCI in an arbitrary way). OK, so it's designed for OSes to have native SHPC support. Linux has that? Quick googling found me Windows Vista+ has it too, correct? So that leaves Win2k + WinXP versions. Older Windows versions do not support pci hotplug at all. Win2k is EOL already. WinXP will follow soon. More users? /me wonders whenever it is worth hopping through the loops needed to support ACPI-based hotplug of devices behind bridges in the first place. cheers, Gerd
On Tue, Jun 11, 2013 at 09:42:29AM +0200, Gerd Hoffmann wrote: > Hi, > > >>> Portability: > >>> - Non x86 (or any Linux) platforms don't need any of this code. > >>> They can keep happily using SHPC the way > >>> they always did. > >> > >> Hmm. Is is possible to write a SHPC driver in AML? I think it would be > >> alot better to have one guest/host interface for pci bridge hotplug > >> instead of two. > >> > >> cheers, > >> Gerd > > > > No, it's not possible, SHPC is not designed to be used from ACPI. > > > > Two reasons off the top of my head, there are likely others: > > > > 1. SHPC uses regular PCI interrupts to signal events. It does not signal > > GFE and SCI. > > > > 2. SHPC uses config accesses to get information from device. > > ACPI does not allow config access anywhere except the root bus from ACPI > > (This requirement is designed to give the OS freedom > > to reconfigure PCI in an arbitrary way). > > OK, so it's designed for OSes to have native SHPC support. Linux has that? Yes. > Quick googling found me Windows Vista+ has it too, correct? So that > leaves Win2k + WinXP versions. Older Windows versions do not support > pci hotplug at all. Win2k is EOL already. WinXP will follow soon. > > More users? googling lead you astray. No windows version supports SHPC. > /me wonders whenever it is worth hopping through the loops needed to > support ACPI-based hotplug of devices behind bridges in the first place. > > cheers, > Gerd Yes and not just because of windows guests. ACPI spec is also very explicit that native hotplug is an optional feature. Test suites such as WHQL are known to test spec compliance.
Hi, > Yes and not just because of windows guests. > ACPI spec is also very explicit that native hotplug is an optional > feature. Test suites such as WHQL are known to test spec compliance. /me looks a bit surprised. This pretty much implies that any shpc bridge needs a second interface to the hotplug functionality which can be driven via ACPI. Or the firmware somehow handles this using smm (not sure this works for the IRQ though). Do you know how this is handled by real hardware? cheers, Gerd
On Tue, Jun 11, 2013 at 10:00:36AM +0200, Gerd Hoffmann wrote: > Hi, > > > Yes and not just because of windows guests. > > ACPI spec is also very explicit that native hotplug is an optional > > feature. Test suites such as WHQL are known to test spec compliance. > > /me looks a bit surprised. > > This pretty much implies that any shpc bridge needs a second interface > to the hotplug functionality which can be driven via ACPI. Or the > firmware somehow handles this using smm (not sure this works for the IRQ > though). > > Do you know how this is handled by real hardware? > > cheers, > Gerd SHPC is not very widely deployed on a PC. Since most hardware vendors do care about windows support, the only way is a separate interface that is driven via ACPI. You then need an ACPI specific register to switch to standard SHPC. The SHPC spec even tells you as much.
On Tue, Jun 11, 2013 at 11:18:13AM +0300, Michael S. Tsirkin wrote: > On Tue, Jun 11, 2013 at 10:00:36AM +0200, Gerd Hoffmann wrote: > > Hi, > > > > > Yes and not just because of windows guests. > > > ACPI spec is also very explicit that native hotplug is an optional > > > feature. Test suites such as WHQL are known to test spec compliance. > > > > /me looks a bit surprised. > > > > This pretty much implies that any shpc bridge needs a second interface > > to the hotplug functionality which can be driven via ACPI. Or the > > firmware somehow handles this using smm (not sure this works for the IRQ > > though). > > > > Do you know how this is handled by real hardware? > > > > cheers, > > Gerd > > SHPC is not very widely deployed on a PC. > > Since most hardware vendors do care about windows support, > the only way is a separate interface that is driven via ACPI. > You then need an ACPI specific register to switch to standard SHPC. > The SHPC spec even tells you as much. Just to give another example, interface in this patch scales trivially to multi-root configurations, if we ever want to support them. > -- > MST
On Mon, 2013-06-10 at 20:03 -0500, Anthony Liguori wrote: > > > I agree that the FAT driver is not 'free software' and I agree that is > > a problem for usage with free software projects, such as QEMU. This is > > a big deal, but unfortunately, as an Intel employee, I think I've done > > as much as I can to address this. > > > > It couldn't hurt if more people that actually care about this spoke up > > on edk2-devel about the issue, or perhaps within a UEFI working group. > > Because, I know that they've stopped listening to me about it. > > Is this useful? I can try to make noise. I assume since folks like > you who have much more credibility and familiarity in this space have > given up, it's a lost cause. I don't think it's a lost cause. It's not technically hard to implement a FAT driver. And the *bureaucracy* shouldn't be hard either. Distributions ship *lots* of FAT implementations already. The alleged patents on it, if there even *are* any left, are fraudulent. And don't apply in the Free World where I live either, to my knowledge. If we can ship FAT support in the Linux kernel, in mtools, and in various other things including BSD kernels, then we can damn well ship a UEFI module which implements FAT support. Saying that we can't do the latter, while simultaneously continuing to do the former, makes absolutely no sense.
On Mon, 2013-06-10 at 19:11 -0500, Anthony Liguori wrote: > If we did this right in QEMU, we'd have to introduce a QOM property > anyway ... and then we'd have to update each firmware implementation to know about this new property, and the how it translates to the RESET_REG field in the DSDT, etc. So we *still* end up having to update firmwares to keep up with qemu, much more than we would if we'd generated the tables on the qemu side.
Il 10/06/2013 21:03, Anthony Liguori ha scritto: >>> I'm not really convinced that >>> QEMU<->firmware is a GPL boundary because of how tightly the two are >>> linked. >> >> Where has 'linked' in terms of the GPL ever been anything other than >> actual executable linking? > > I should not have even brought this up as it's not worth debating. If > you're curious, http://www.gnu.org/licenses/gpl-faq.html#MereAggregation With the usual IANAL care, I think QOM would be much much more of a legal grey area that passing ACPI tables. If you pass ACPI tables, the ACPI tables are clearly part of QEMU, and are almost as clearly "just data" for the BIOS. But if you use QOM, you may start wondering if "the semantics of the communication are intimate enough, exchanging complex internal data structures". Probably not, as it is a generic interface and there could be in principle other consumers than the firmware, but still complex enough that raising the issue is not moot. Paolo
Il 11/06/2013 03:35, Michael S. Tsirkin ha scritto: > Two points > 1. You never explained what you mean by un-hardware like. > > Currently bios is in a ROM device, and it has a > template for ACPI tables together with it. > This simply moves the tables to a separate ROM > device (FW CFG), and generalizes the template using > the linker interface. > One ROM is hardware-like but two is un-hardware like? > > ACPI tables are static so it's likely lots of > hardware has at least some of them pre-formatted in flash, > then tweak some things like SRAT a bit. Also having a "bootstrap processor" was certainly not unheard of some decades ago. Right now we get all sort of SMM hacks instead of adding more processors, but it's certainly not un-hardware like. Maybe we should just have a bytecode interpreter and write the ACPI generator in that language. :) Paolo
On 06/14/13 01:02, Paolo Bonzini wrote: > Il 10/06/2013 21:03, Anthony Liguori ha scritto: >>>> I'm not really convinced that >>>> QEMU<->firmware is a GPL boundary because of how tightly the two are >>>> linked. >>> >>> Where has 'linked' in terms of the GPL ever been anything other than >>> actual executable linking? >> >> I should not have even brought this up as it's not worth debating. If >> you're curious, http://www.gnu.org/licenses/gpl-faq.html#MereAggregation > > With the usual IANAL care, I think QOM would be much much more of a > legal grey area that passing ACPI tables. > > If you pass ACPI tables, the ACPI tables are clearly part of QEMU, and > are almost as clearly "just data" for the BIOS. > > But if you use QOM, you may start wondering if "the semantics of the > communication are intimate enough, exchanging complex internal data > structures". Probably not, as it is a generic interface and there could > be in principle other consumers than the firmware, but still complex > enough that raising the issue is not moot. Basing the decision about derivative or not on internal or complex enough ; well I find that unusable. First, complexity: web services can use very complex XML schemas, and that clearly doesn't make the server derivative of the client, or vice versa. Second, internality: this attribute can be wiped out simply by writing documentation for the data structure (which should be done *anyway*). Once it is documented, it is not internal any longer. However existence of documentation for a wire format between A and B should have absolutely no say in whether codebase A is derivative of codebase B. IANAL of course but I find the FSF's argument biased. (Of course I'm also not buying that linking against a library makes the client application (its own source code, or its dynamically linked binary) derivative of the library. If there are two libraries (especially when released under different licenses) implementing the same API, which one is the application a derivative of?) Laszlo
Paolo Bonzini <pbonzini@redhat.com> writes: > Il 11/06/2013 03:35, Michael S. Tsirkin ha scritto: >> Two points >> 1. You never explained what you mean by un-hardware like. >> >> Currently bios is in a ROM device, and it has a >> template for ACPI tables together with it. >> This simply moves the tables to a separate ROM >> device (FW CFG), and generalizes the template using >> the linker interface. >> One ROM is hardware-like but two is un-hardware like? >> >> ACPI tables are static so it's likely lots of >> hardware has at least some of them pre-formatted in flash, >> then tweak some things like SRAT a bit. > > Also having a "bootstrap processor" was certainly not unheard of some > decades ago. Right now we get all sort of SMM hacks instead of adding > more processors, but it's certainly not un-hardware like. It's still not unheard of. This is how power systems work still. However, with PCs, the ACPI tables are generated by/included in the firmware. There's no question about that. > > Maybe we should just have a bytecode interpreter and write the ACPI > generator in that language. :) Indeed, we can even using an existing bytecode like the x86 instruction set and use this VM called KVM to execute it. I hear there are even C compilers for this bytecode ;-) Regards, Anthony Liguori > Paolo
Anthony Liguori wrote: > However, with PCs, the ACPI tables are generated by/included in the > firmware. There's no question about that. I think the key point is that the firmware is developed and delivered by the hardware vendor, and not by an independent source. The firmware is intimately tied to the hardware, and is very much an integral part of all hardware, to the point where the firmware is one of the most prominent downloads made available by all hardware vendors. Don't make the mistake of thinking that firmware for a virtual machine has less to do with the machine just because it's virtual. //Peter
On 06/14/13 02:26, Laszlo Ersek wrote: > On 06/14/13 01:02, Paolo Bonzini wrote: >> Il 10/06/2013 21:03, Anthony Liguori ha scritto: >>>>> I'm not really convinced that >>>>> QEMU<->firmware is a GPL boundary because of how tightly the two are >>>>> linked. >>>> >>>> Where has 'linked' in terms of the GPL ever been anything other than >>>> actual executable linking? >>> >>> I should not have even brought this up as it's not worth debating. If >>> you're curious, http://www.gnu.org/licenses/gpl-faq.html#MereAggregation >> >> With the usual IANAL care, I think QOM would be much much more of a >> legal grey area that passing ACPI tables. >> >> If you pass ACPI tables, the ACPI tables are clearly part of QEMU, and >> are almost as clearly "just data" for the BIOS. >> >> But if you use QOM, you may start wondering if "the semantics of the >> communication are intimate enough, exchanging complex internal data >> structures". Probably not, as it is a generic interface and there could >> be in principle other consumers than the firmware, but still complex >> enough that raising the issue is not moot. > > Basing the decision about > > derivative or not > > on > > internal > > or > > complex enough > > ; well I find that unusable. > > First, complexity: web services can use very complex XML schemas, and > that clearly doesn't make the server derivative of the client, or vice > versa. > > Second, internality: this attribute can be wiped out simply by writing > documentation for the data structure (which should be done *anyway*). > Once it is documented, it is not internal any longer. However existence > of documentation for a wire format between A and B should have > absolutely no say in whether codebase A is derivative of codebase B. > > IANAL of course but I find the FSF's argument biased. > > (Of course I'm also not buying that linking against a library makes the > client application (its own source code, or its dynamically linked > binary) derivative of the library. If there are two libraries > (especially when released under different licenses) implementing the > same API, which one is the application a derivative of?) This is my personal, private opinion, of course, which is independent of what my employer holds. Laszlo
diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt index a839434..951b99a 100644 --- a/docs/specs/acpi_pci_hotplug.txt +++ b/docs/specs/acpi_pci_hotplug.txt @@ -43,3 +43,11 @@ PCI removability status (IO port 0xae0c-0xae0f, 4-byte access): Used by ACPI BIOS _RMV method to indicate removability status to OS. One bit per slot. Read-only + +PCI bus selector (IO port 0xae10-0xae13, 4-byte access): +----------------------------------------------- + +Used by ACPI BIOS methods to select the current PCI bus. +When written, makes all the other PCI registers (0xae00 - 0xae0f) +to refer to the appropriate bus. +0 selects PCI root bus (default). diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index f8d0871..66ec787 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -38,7 +38,9 @@ struct PcGuestInfo { bool s3_disabled; bool s4_disabled; uint8_t s4_val; - DECLARE_BITMAP(slot_hotplug_enable, PCI_SLOT_MAX); +#define GUEST_INFO_MAX_HOTPLUG_BUS 256 + PCIBus *hotplug_buses[GUEST_INFO_MAX_HOTPLUG_BUS]; + DECLARE_BITMAP(hotplug_enable[GUEST_INFO_MAX_HOTPLUG_BUS], PCI_SLOT_MAX); uint16_t sci_int; uint8_t acpi_enable_cmd; uint8_t acpi_disable_cmd; diff --git a/hw/i386/acpi-dsdt.dsl b/hw/i386/acpi-dsdt.dsl index 90efce0..dc05668 100644 --- a/hw/i386/acpi-dsdt.dsl +++ b/hw/i386/acpi-dsdt.dsl @@ -133,11 +133,18 @@ DefinitionBlock ( B0EJ, 32, } + OperationRegion(SEL, SystemIO, 0xae10, 0x04) + Field(SEL, DWordAcc, NoLock, WriteAsZeros) { + BSEL, 32, + } + /* Methods called by bulk generated PCI devices below */ + Name(BNUM, 0x00) /* Methods called by hotplug devices */ - Method(PCEJ, 1, NotSerialized) { + Method(PCEJ, 2, NotSerialized) { // _EJ0 method - eject callback + Store(Arg1, BSEL) Store(ShiftLeft(1, Arg0), B0EJ) Return (0x0) } @@ -147,16 +154,25 @@ DefinitionBlock ( /* PCI hotplug notify method */ Method(PCNF, 0) { - // Local0 = iterator + // Local0 = bridge selector Store(Zero, Local0) - While (LLess(Local0, 31)) { - Increment(Local0) - If (And(PCIU, ShiftLeft(1, Local0))) { - PCNT(Local0, 1) - } - If (And(PCID, ShiftLeft(1, Local0))) { - PCNT(Local0, 3) - } + While (LLess(Local0, 255)) { + Store(Local0, BSEL) + // Local1 = slot iterator + Store(Zero, Local1) + + Store(PCIU, Local2) + Store(PCID, Local3) + While (LLess(Local1, 31)) { + Increment(Local1) + If (And(Local2, ShiftLeft(1, Local1))) { + PCNT(Add(Local1, Multiply(Local0, 32)), 1) + } + If (And(Local3, ShiftLeft(1, Local1))) { + PCNT(Add(Local1, Multiply(Local0, 32)), 3) + } + } + Increment(Local0) } } } diff --git a/hw/i386/ssdt-pcihp.dsl b/hw/i386/ssdt-pcihp.dsl deleted file mode 100644 index d29a5b9..0000000 --- a/hw/i386/ssdt-pcihp.dsl +++ /dev/null @@ -1,51 +0,0 @@ -/* - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - - * You should have received a copy of the GNU General Public License along - * with this program; if not, see <http://www.gnu.org/licenses/>. - */ - -ACPI_EXTRACT_ALL_CODE ssdp_pcihp_aml - -DefinitionBlock ("ssdt-pcihp.aml", "SSDT", 0x01, "BXPC", "BXSSDTPCIHP", 0x1) -{ - -/**************************************************************** - * PCI hotplug - ****************************************************************/ - - /* Objects supplied by DSDT */ - External(\_SB.PCI0, DeviceObj) - External(\_SB.PCI0.PCEJ, MethodObj) - - Scope(\_SB.PCI0) { - - /* Bulk generated PCI hotplug devices */ - ACPI_EXTRACT_DEVICE_START ssdt_pcihp_start - ACPI_EXTRACT_DEVICE_END ssdt_pcihp_end - ACPI_EXTRACT_DEVICE_STRING ssdt_pcihp_name - - // Method _EJ0 can be patched by BIOS to EJ0_ - // at runtime, if the slot is detected to not support hotplug. - // Extract the offset of the address dword and the - // _EJ0 name to allow this patching. - Device(SAA) { - ACPI_EXTRACT_NAME_BYTE_CONST ssdt_pcihp_id - Name(_SUN, 0xAA) - ACPI_EXTRACT_NAME_DWORD_CONST ssdt_pcihp_adr - Name(_ADR, 0xAA0000) - ACPI_EXTRACT_METHOD_STRING ssdt_pcihp_ej0 - Method(_EJ0, 1) { - Return (PCEJ(_SUN)) - } - } - } -} diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c index 19a26f5..10106fd 100644 --- a/hw/acpi/piix4.c +++ b/hw/acpi/piix4.c @@ -29,6 +29,7 @@ #include "exec/ioport.h" #include "hw/nvram/fw_cfg.h" #include "exec/address-spaces.h" +#include "hw/pci/pci_bus.h" //#define DEBUG @@ -42,11 +43,12 @@ #define GPE_LEN 4 #define PCI_HOTPLUG_ADDR 0xae00 -#define PCI_HOTPLUG_SIZE 0x000f +#define PCI_HOTPLUG_SIZE 0x0014 #define PCI_UP_BASE 0xae00 #define PCI_DOWN_BASE 0xae04 #define PCI_EJ_BASE 0xae08 #define PCI_RMV_BASE 0xae0c +#define PCI_SEL_BASE 0xae10 #define PIIX4_PROC_BASE 0xaf00 #define PIIX4_PROC_LEN 32 @@ -57,6 +59,8 @@ struct pci_status { uint32_t up; /* deprecated, maintained for migration compatibility */ uint32_t down; + uint32_t hotplug_enable; + uint32_t device_present; }; typedef struct CPUStatus { @@ -84,9 +88,8 @@ typedef struct PIIX4PMState { Notifier powerdown_notifier; /* for pci hotplug */ - struct pci_status pci0_status; - uint32_t pci0_hotplug_enable; - uint32_t pci0_slot_device_present; + struct pci_status pci_status[GUEST_INFO_MAX_HOTPLUG_BUS]; + uint32_t hotplug_select; uint8_t disable_s3; uint8_t disable_s4; @@ -98,6 +101,17 @@ typedef struct PIIX4PMState { PcGuestInfo *guest_info; } PIIX4PMState; +static int piix4_find_hotplug_bus(PcGuestInfo *guest_info, PCIBus *bus) +{ + int i; + for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) { + if (guest_info->hotplug_buses[i] == bus) { + return i; + } + } + return -1; +} + static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCIBus *bus, PIIX4PMState *s); @@ -183,6 +197,8 @@ static void pm_write_config(PCIDevice *d, static void vmstate_pci_status_pre_save(void *opaque) { +#if 0 + /* TODO */ struct pci_status *pci0_status = opaque; PIIX4PMState *s = container_of(pci0_status, PIIX4PMState, pci0_status); @@ -190,6 +206,7 @@ static void vmstate_pci_status_pre_save(void *opaque) * to a version that still does... of course these might get lost * by an old buggy implementation, but we try. */ pci0_status->up = s->pci0_slot_device_present & s->pci0_hotplug_enable; +#endif } static int vmstate_acpi_post_load(void *opaque, int version_id) @@ -267,7 +284,10 @@ static int acpi_load_old(QEMUFile *f, void *opaque, int version_id) qemu_get_be16s(f, &temp); } +#if 0 + TODO ret = vmstate_load_state(f, &vmstate_pci_status, &s->pci0_status, 1); +#endif return ret; } @@ -293,21 +313,28 @@ static const VMStateDescription vmstate_acpi = { VMSTATE_TIMER(ar.tmr.timer, PIIX4PMState), VMSTATE_INT64(ar.tmr.overflow_time, PIIX4PMState), VMSTATE_STRUCT(ar.gpe, PIIX4PMState, 2, vmstate_gpe, ACPIGPE), +#if 0 VMSTATE_STRUCT(pci0_status, PIIX4PMState, 2, vmstate_pci_status, struct pci_status), +#endif VMSTATE_END_OF_LIST() } }; -static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned bsel, unsigned slots) { BusChild *kid, *next; - BusState *bus = qdev_get_parent_bus(&s->dev.qdev); + BusState *bus; int slot = ffs(slots) - 1; bool slot_free = true; + if (!s->guest_info->hotplug_buses[bsel]) { + return; + } + bus = &s->guest_info->hotplug_buses[bsel]->qbus; + /* Mark request as complete */ - s->pci0_status.down &= ~(1U << slot); + s->pci_status[bsel].down &= ~(1U << slot); QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child; @@ -322,23 +349,26 @@ static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots) } } if (slot_free) { - s->pci0_slot_device_present &= ~(1U << slot); + s->pci_status[bsel].device_present &= ~(1U << slot); } } -static void piix4_update_hotplug(PIIX4PMState *s) +static void piix4_update_hotplug(PIIX4PMState *s, unsigned bsel) { - PCIDevice *dev = &s->dev; - BusState *bus = qdev_get_parent_bus(&dev->qdev); BusChild *kid, *next; + BusState *bus; + if (!s->guest_info->hotplug_buses[bsel]) { + return; + } + bus = &s->guest_info->hotplug_buses[bsel]->qbus; /* Execute any pending removes during reset */ - while (s->pci0_status.down) { - acpi_piix_eject_slot(s, s->pci0_status.down); + while (s->pci_status[bsel].down) { + acpi_piix_eject_slot(s, bsel, s->pci_status[bsel].down); } - s->pci0_hotplug_enable = ~0; - s->pci0_slot_device_present = 0; + s->pci_status[bsel].hotplug_enable = ~0; + s->pci_status[bsel].device_present = 0; QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child; @@ -347,10 +377,10 @@ static void piix4_update_hotplug(PIIX4PMState *s) int slot = PCI_SLOT(pdev->devfn); if (pc->no_hotplug) { - s->pci0_hotplug_enable &= ~(1U << slot); + s->pci_status[bsel].hotplug_enable &= ~(1U << slot); } - s->pci0_slot_device_present |= (1U << slot); + s->pci_status[bsel].device_present |= (1U << slot); } } @@ -358,6 +388,7 @@ static void piix4_reset(void *opaque) { PIIX4PMState *s = opaque; uint8_t *pci_conf = s->dev.config; + int i; pci_conf[0x58] = 0; pci_conf[0x59] = 0; @@ -371,7 +402,9 @@ static void piix4_reset(void *opaque) /* Mark SMM as already inited (until KVM supports SMM). */ pci_conf[0x5B] = 0x02; } - piix4_update_hotplug(s); + for (i = 0; i < GUEST_INFO_MAX_HOTPLUG_BUS; ++i) { + piix4_update_hotplug(s, i); + } } static void piix4_pm_powerdown_req(Notifier *n, void *opaque) @@ -382,14 +415,20 @@ static void piix4_pm_powerdown_req(Notifier *n, void *opaque) acpi_pm1_evt_power_down(&s->ar); } -static void piix4_update_guest_info(PIIX4PMState *s) +static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, + PCIHotplugState state); + +static void piix4_update_bus_guest_info(PCIBus *pci_bus, void *opaque) { - PCIDevice *dev = &s->dev; - BusState *bus = qdev_get_parent_bus(&dev->qdev); + PIIX4PMState *s = opaque; + PcGuestInfo *guest_info = s->guest_info; BusChild *kid, *next; + BusState *bus = &pci_bus->qbus; - memset(s->guest_info->slot_hotplug_enable, 0xff, - DIV_ROUND_UP(PCI_SLOT_MAX, BITS_PER_BYTE)); + int i = piix4_find_hotplug_bus(guest_info, NULL); + assert(i >= 0); + + guest_info->hotplug_buses[i] = pci_bus; QTAILQ_FOREACH_SAFE(kid, &bus->children, sibling, next) { DeviceState *qdev = kid->child; @@ -398,9 +437,22 @@ static void piix4_update_guest_info(PIIX4PMState *s) int slot = PCI_SLOT(pdev->devfn); if (pc->no_hotplug) { - clear_bit(slot, s->guest_info->slot_hotplug_enable); + clear_bit(slot, guest_info->hotplug_enable[i]); } } + + pci_bus_hotplug(pci_bus, piix4_device_hotplug, &s->dev.qdev); +} + +static void piix4_update_guest_info(PIIX4PMState *s) +{ + PCIDevice *dev = &s->dev; + + memset(s->guest_info->hotplug_enable, 0xff, + sizeof s->guest_info->hotplug_enable); + + pci_for_each_bus(dev->bus, piix4_update_bus_guest_info, + s); } static void piix4_pm_machine_ready(Notifier *n, void *opaque) @@ -589,16 +641,22 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) { PIIX4PMState *s = opaque; uint32_t val = 0; + int bsel = s->hotplug_select; + + if (bsel > GUEST_INFO_MAX_HOTPLUG_BUS) { + return 0; + } switch (addr) { case PCI_UP_BASE - PCI_HOTPLUG_ADDR: /* Manufacture an "up" value to cause a device check on any hotplug * slot with a device. Extra device checks are harmless. */ - val = s->pci0_slot_device_present & s->pci0_hotplug_enable; + val = s->pci_status[bsel].device_present & + s->pci_status[bsel].hotplug_enable; PIIX4_DPRINTF("pci_up_read %x\n", val); break; case PCI_DOWN_BASE - PCI_HOTPLUG_ADDR: - val = s->pci0_status.down; + val = s->pci_status[bsel].down; PIIX4_DPRINTF("pci_down_read %x\n", val); break; case PCI_EJ_BASE - PCI_HOTPLUG_ADDR: @@ -606,8 +664,10 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) PIIX4_DPRINTF("pci_features_read %x\n", val); break; case PCI_RMV_BASE - PCI_HOTPLUG_ADDR: - val = s->pci0_hotplug_enable; + val = s->pci_status[bsel].hotplug_enable; break; + case PCI_SEL_BASE - PCI_HOTPLUG_ADDR: + val = s->hotplug_select; default: break; } @@ -618,12 +678,18 @@ static uint64_t pci_read(void *opaque, hwaddr addr, unsigned int size) static void pci_write(void *opaque, hwaddr addr, uint64_t data, unsigned int size) { + PIIX4PMState *s = opaque; switch (addr) { case PCI_EJ_BASE - PCI_HOTPLUG_ADDR: - acpi_piix_eject_slot(opaque, (uint32_t)data); + if (s->hotplug_select >= GUEST_INFO_MAX_HOTPLUG_BUS) { + break; + } + acpi_piix_eject_slot(s, s->hotplug_select, data); PIIX4_DPRINTF("pciej write %" HWADDR_PRIx " <== % " PRIu64 "\n", addr, data); break; + case PCI_SEL_BASE - PCI_HOTPLUG_ADDR: + s->hotplug_select = data; default: break; } @@ -706,9 +772,6 @@ static void piix4_init_cpu_status(CPUState *cpu, void *data) g->sts[id / 8] |= (1 << (id % 8)); } -static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, - PCIHotplugState state); - static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCIBus *bus, PIIX4PMState *s) { @@ -720,8 +783,6 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, PCI_HOTPLUG_SIZE); memory_region_add_subregion(parent, PCI_HOTPLUG_ADDR, &s->io_pci); - pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev); - qemu_for_each_cpu(piix4_init_cpu_status, &s->gpe_cpu); memory_region_init_io(&s->io_cpu, &cpu_hotplug_ops, s, "apci-cpu-hotplug", PIIX4_PROC_LEN); @@ -730,16 +791,16 @@ static void piix4_acpi_system_hot_add_init(MemoryRegion *parent, qemu_register_cpu_added_notifier(&s->cpu_added_notifier); } -static void enable_device(PIIX4PMState *s, int slot) +static void enable_device(PIIX4PMState *s, unsigned bsel, int slot) { s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS; - s->pci0_slot_device_present |= (1U << slot); + s->pci_status[bsel].device_present |= (1U << slot); } -static void disable_device(PIIX4PMState *s, int slot) +static void disable_device(PIIX4PMState *s, unsigned bsel, int slot) { s->ar.gpe.sts[0] |= PIIX4_PCI_HOTPLUG_STATUS; - s->pci0_status.down |= (1U << slot); + s->pci_status[bsel].down |= (1U << slot); } static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, @@ -748,19 +809,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, int slot = PCI_SLOT(dev->devfn); PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, PCI_DEVICE(qdev)); + int bsel = piix4_find_hotplug_bus(s->guest_info, dev->bus); + if (bsel < 0) { + return -1; + } /* Don't send event when device is enabled during qemu machine creation: * it is present on boot, no hotplug event is necessary. We do send an * event when the device is disabled later. */ if (state == PCI_COLDPLUG_ENABLED) { - s->pci0_slot_device_present |= (1U << slot); + s->pci_status[bsel].device_present |= (1U << slot); return 0; } if (state == PCI_HOTPLUG_ENABLED) { - enable_device(s, slot); + enable_device(s, bsel, slot); } else { - disable_device(s, slot); + disable_device(s, bsel, slot); } pm_update_sci(s); diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 72606a8..43e4988 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -34,6 +34,7 @@ #include "hw/acpi/acpi.h" #include "hw/nvram/fw_cfg.h" #include "hw/i386/bios-linker-loader.h" +#include "hw/pci/pci_bus.h" #define ACPI_BUILD_APPNAME "Bochs" #define ACPI_BUILD_APPNAME6 "BOCHS " @@ -258,23 +259,358 @@ acpi_encode_len(uint8_t *ssdt_ptr, int length, int bytes) #define ACPI_PROC_SIZEOF (*ssdt_proc_end - *ssdt_proc_start) #define ACPI_PROC_AML (ssdp_proc_aml + *ssdt_proc_start) -/* 0x5B 0x82 DeviceOp PkgLength NameString */ -#define ACPI_PCIHP_OFFSET_HEX (*ssdt_pcihp_name - *ssdt_pcihp_start + 1) -#define ACPI_PCIHP_OFFSET_ID (*ssdt_pcihp_id - *ssdt_pcihp_start) -#define ACPI_PCIHP_OFFSET_ADR (*ssdt_pcihp_adr - *ssdt_pcihp_start) -#define ACPI_PCIHP_OFFSET_EJ0 (*ssdt_pcihp_ej0 - *ssdt_pcihp_start) -#define ACPI_PCIHP_SIZEOF (*ssdt_pcihp_end - *ssdt_pcihp_start) -#define ACPI_PCIHP_AML (ssdp_pcihp_aml + *ssdt_pcihp_start) - #define ACPI_SSDT_SIGNATURE 0x54445353 /* SSDT */ #define ACPI_SSDT_HEADER_LENGTH 36 #include "hw/i386/ssdt-misc.hex" -#include "hw/i386/ssdt-pcihp.hex" + +static inline GArray *build_alloc_array(void) +{ + return g_array_new(false, true /* clear */, 1); +} + +static inline void build_free_array(GArray *array) +{ + g_array_free(array, true); +} + +static inline void build_prepend_byte(GArray *array, uint8_t val) +{ + g_array_prepend_val(array, val); +} + +static inline void build_append_byte(GArray *array, uint8_t val) +{ + g_array_append_val(array, val); +} + +static inline void build_prepend_array(GArray *array, GArray *val) +{ + g_array_prepend_vals(array, val->data, val->len); +} + +static inline void build_append_array(GArray *array, GArray *val) +{ + g_array_append_vals(array, val->data, val->len); +} + +static void build_append_nameseg(GArray *array, const char *format, ...) +{ + GString *s = g_string_new(""); + va_list args; + + va_start(args, format); + g_string_vprintf(s, format, args); + va_end(args); + + assert(s->len == 4); + g_array_append_vals(array, s->str, s->len); + g_string_free(s, true); +} + +static void build_prepend_nameseg(GArray *array, const char *format, ...) +{ + GString *s = g_string_new(""); + va_list args; + + va_start(args, format); + g_string_vprintf(s, format, args); + va_end(args); + + assert(s->len == 4); + g_array_prepend_vals(array, s->str, s->len); + g_string_free(s, true); +} + +static void build_prepend_devfn(GArray *name, uint8_t devfn) +{ + build_prepend_nameseg(name, "S%0.02x_", devfn); +} + +static void build_append_devfn(GArray *name, uint8_t devfn) +{ + GArray *array = build_alloc_array(); + build_prepend_devfn(array, devfn); + build_append_array(name, array); + build_free_array(array); +} + +static void build_prepend_name_string(GArray *name, PCIBus *bus) +{ + int segcount; + + for (;!pci_bus_is_root(bus); bus = bus->parent_dev->bus) { + build_prepend_devfn(name, bus->parent_dev->devfn); + } + g_array_prepend_vals(name, "PCI0", 4); + g_array_prepend_vals(name, "_SB_", 4); + /* Name segment length is a multiple of 4 */ + assert(name->len % 4 == 0); + segcount = name->len / 4; + build_prepend_byte(name, segcount); + build_prepend_byte(name, 0x2f); /* MultiNamePrefix */ + build_prepend_byte(name, '\\'); /* RootChar */ +} + +static void build_append_name_string(GArray *name, PCIBus *bus) +{ + GArray *append = build_alloc_array(); + build_prepend_name_string(append, bus); + build_append_array(name, append); + build_free_array(append); +} + +/* 5.4 Definition Block Encoding */ +enum { + PACKAGE_LENGTH_1BYTE_SHIFT = 6, /* Up to 63 - use extra 2 bits. */ + PACKAGE_LENGTH_2BYTE_SHIFT = 4, + PACKAGE_LENGTH_3BYTE_SHIFT = 12, + PACKAGE_LENGTH_4BYTE_SHIFT = 20, +}; + +static void build_prepend_package_length(GArray *package) +{ + unsigned bytes; + uint8_t byte; + unsigned package_len = package->len; + unsigned length = package_len; + + if (length + 1 < (1 << PACKAGE_LENGTH_1BYTE_SHIFT)) { + byte = length; + build_prepend_byte(package, byte); + return; + } + + if (length + 4 > (1 << PACKAGE_LENGTH_4BYTE_SHIFT)) { + byte = length >> PACKAGE_LENGTH_4BYTE_SHIFT; + build_prepend_byte(package, byte); + length &= (1 << PACKAGE_LENGTH_4BYTE_SHIFT) - 1; + } + if (length + 3 > (1 << PACKAGE_LENGTH_3BYTE_SHIFT)) { + byte = length >> PACKAGE_LENGTH_3BYTE_SHIFT; + build_prepend_byte(package, byte); + length &= (1 << PACKAGE_LENGTH_3BYTE_SHIFT) - 1; + } + if (length + 2 > (1 << PACKAGE_LENGTH_2BYTE_SHIFT)) { + byte = length >> PACKAGE_LENGTH_2BYTE_SHIFT; + build_prepend_byte(package, byte); + length &= (1 << PACKAGE_LENGTH_2BYTE_SHIFT) - 1; + } + + bytes = package->len - package_len; + byte = (bytes << PACKAGE_LENGTH_1BYTE_SHIFT) | length; + build_prepend_byte(package, byte); +} + +static void build_package(GArray *package, uint8_t op) +{ + build_prepend_package_length(package); + build_prepend_byte(package, op); +} + +static void build_extop_package(GArray *package, uint8_t op) +{ + build_package(package, op); + build_prepend_byte(package, 0x5B); /* ExtOpPrefix */ +} + +static void build_append_notify(GArray *method, GArray *target_name, + uint16_t value) +{ + GArray *notify = build_alloc_array(); + uint8_t op = 0xA0; /* IfOp */ + + build_append_byte(notify, 0x93); /* LEqualOp */ + build_append_byte(notify, 0x68); /* Arg0Op */ + build_append_byte(notify, 0x0A); /* BytePrefix */ + build_append_byte(notify, value); + build_append_byte(notify, value >> 8); + build_append_byte(notify, 0x86); /* NotifyOp */ + build_append_array(notify, target_name); + build_append_byte(notify, 0x69); /* Arg1Op */ + + /* Pack it up */ + build_package(notify, op); + + build_append_array(method, notify); + + build_free_array(notify); +} + +static void +build_append_notify_slots(GArray *method, PCIBus *bus, + unsigned bus_select, + DECLARE_BITMAP(hotplug_enable, PCI_SLOT_MAX)) +{ + int i; + + if (!bus) { + return; + } + + for (i = 0; i < 32; i++) { + GArray *target; + if (!test_bit(i, hotplug_enable)) { + continue; + } + target = build_alloc_array(); + build_prepend_devfn(target, i); + build_prepend_name_string(target, bus); + /* *32 matches Multiply in PCNF */ + build_append_notify(method, target, bus_select * 32 + i); + build_free_array(target); + } +} + +static void build_append_notify_buses(GArray *table, PcGuestInfo *guest_info) +{ + int i; + GArray *method = build_alloc_array(); + + uint8_t op = 0x14; /* MethodOp */ + + build_append_nameseg(method, "PCNT"); + build_append_byte(method, 0x02); /* MethodFlags: ArgCount */ + for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) { + build_append_notify_slots(method, guest_info->hotplug_buses[i], i, + guest_info->hotplug_enable[i]); + } + build_package(method, op); + + build_append_array(table, method); + build_free_array(method); +} + +static void build_append_device(GArray *table, unsigned devfn) +{ + uint8_t op, ej0_op; + + GArray *device = build_alloc_array(); + GArray *ej0 = build_alloc_array(); + + op = 0x82; /* DeviceOp */ + + build_append_devfn(device, devfn); + + build_append_byte(device, 0x08); /* NameOp */ + build_append_nameseg(device, "_SUN"); + build_append_byte(device, 0x0A); /* BytePrefix */ + build_append_byte(device, PCI_SLOT(devfn)); + build_append_byte(device, 0x08); /* NameOp */ + build_append_nameseg(device, "_ADR"); + build_append_byte(device, 0x0C); /* DWordPrefix */ + build_append_byte(device, PCI_FUNC(devfn)); + build_append_byte(device, 0x00); + build_append_byte(device, PCI_SLOT(devfn)); + build_append_byte(device, 0x00); + + ej0_op = 0x14; /* MethodOp */ + build_append_nameseg(ej0, "_EJ0"); + build_append_byte(ej0, 0x01); /* MethodFlags: ArgCount */ + build_append_byte(ej0, 0xA4); /* ReturnOp */ + build_append_nameseg(ej0, "PCEJ"); + build_append_nameseg(ej0, "_SUN"); + build_append_nameseg(ej0, "BNUM"); + build_package(ej0, ej0_op); + build_append_array(device, ej0); + + build_extop_package(device, op); + + build_append_array(table, device); + build_free_array(device); + build_free_array(ej0); +} + +static bool build_find_device_bus(PCIBus *buses[GUEST_INFO_MAX_HOTPLUG_BUS], + PCIBus *bus, unsigned devfn) +{ + int i; + for (i = 0; i < ARRAY_SIZE(buses); ++i) { + if (buses[i] && buses[i]->parent_dev && + buses[i]->parent_dev->bus == bus && + buses[i]->parent_dev->devfn == devfn) { + return true; + } + } + return false; +} + +static void +build_append_bus(GArray *table, int bnum, PcGuestInfo *guest_info) +{ + uint8_t op; + int slot; + unsigned devfn; + GArray *device; + PCIBus *bus = guest_info->hotplug_buses[bnum]; + + if (!bus) { + return; + } + + device = build_alloc_array(); + + /* Root device described in DSDT */ + if (bnum) { + devfn = bus->parent_dev->devfn; + + op = 0x82; /* DeviceOp */ + + build_append_name_string(device, bus); + + build_append_byte(device, 0x08); /* NameOp */ + build_append_nameseg(device, "_SUN"); + build_append_byte(device, 0x0A); /* BytePrefix */ + build_append_byte(device, PCI_SLOT(devfn)); + build_append_byte(device, 0x08); /* NameOp */ + build_append_nameseg(device, "BNUM"); + build_append_byte(device, 0x0A); /* BytePrefix */ + build_append_byte(device, bnum); + build_append_byte(device, 0x08); /* NameOp */ + build_append_nameseg(device, "_ADR"); + build_append_byte(device, 0x0C); /* DWordPrefix */ + build_append_byte(device, PCI_FUNC(devfn)); + build_append_byte(device, 0x00); + build_append_byte(device, PCI_SLOT(devfn)); + build_append_byte(device, 0x00); + } else { + op = 0x10; /* ScopeOp */ + build_append_nameseg(device, "PCI0"); + } + + for (slot = 0; slot < PCI_SLOT_MAX; ++slot) { + if (!test_bit(slot, guest_info->hotplug_enable[bnum])) { + continue; + } + /* Bus device? We'll add it later */ + if (build_find_device_bus(guest_info->hotplug_buses, + bus, PCI_DEVFN(slot, 0))) { + continue; + } + build_append_device(device, PCI_DEVFN(slot, 0)); + } + + if (bnum) { + build_extop_package(device, op); + } else { + build_package(device, op); + } + + build_append_array(table, device); + build_free_array(device); +} + +static void build_append_device_buses(GArray *table, PcGuestInfo *guest_info) +{ + int i; + for (i = 0; i < ARRAY_SIZE(guest_info->hotplug_buses); ++i) { + build_append_bus(table, i, guest_info); + } +} static uint8_t* build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count, - const char *target, int ofs) + const char *target, int ofs, int step) { int i; @@ -295,31 +631,14 @@ build_notify(uint8_t *ssdt_ptr, const char *name, int skip, int count, *(ssdt_ptr++) = i; *(ssdt_ptr++) = 0x86; /* NotifyOp */ memcpy(ssdt_ptr, target, 4); - ssdt_ptr[ofs] = acpi_get_hex(i >> 4); - ssdt_ptr[ofs + 1] = acpi_get_hex(i); + ssdt_ptr[ofs] = acpi_get_hex((i * step) >> 4); + ssdt_ptr[ofs + 1] = acpi_get_hex(i * step); ssdt_ptr += 4; *(ssdt_ptr++) = 0x69; /* Arg1Op */ } return ssdt_ptr; } -static void patch_pcihp(int slot, uint8_t *ssdt_ptr, uint32_t eject) -{ - ssdt_ptr[ACPI_PCIHP_OFFSET_HEX] = acpi_get_hex(slot >> 4); - ssdt_ptr[ACPI_PCIHP_OFFSET_HEX+1] = acpi_get_hex(slot); - ssdt_ptr[ACPI_PCIHP_OFFSET_ID] = slot; - ssdt_ptr[ACPI_PCIHP_OFFSET_ADR + 2] = slot; - - /* Runtime patching of ACPI_EJ0: to disable hotplug for a slot, - * replace the method name: _EJ0 by ACPI_EJ0_. */ - /* Sanity check */ - assert (!memcmp(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "_EJ0", 4)); - - if (!eject) { - memcpy(ssdt_ptr + ACPI_PCIHP_OFFSET_EJ0, "EJ0_", 4); - } -} - static void build_ssdt(GArray *table_data, GArray *linker, FWCfgState *fw_cfg, PcGuestInfo *guest_info) @@ -330,11 +649,20 @@ build_ssdt(GArray *table_data, GArray *linker, + (acpi_cpus * ACPI_PROC_SIZEOF) /* procs */ + (1+2+5+(12*acpi_cpus)) /* NTFY */ + (6+2+1+(1*acpi_cpus)) /* CPON */ - + (1+3+4) /* Scope(PCI0) */ - + ((PCI_SLOT_MAX - 1) * ACPI_PCIHP_SIZEOF) /* slots */ - + (1+2+5+(12*(PCI_SLOT_MAX - 1)))); /* PCNT */ - uint8_t *ssdt = acpi_data_push(table_data, length); - uint8_t *ssdt_ptr = ssdt; + ); + uint8_t *ssdt; + uint8_t *ssdt_ptr; + GArray *devices = build_alloc_array(); + GArray *notify = build_alloc_array(); + + /* TODO: rewrite this function using glib */ + build_append_device_buses(devices, guest_info); + length += devices->len; + build_append_notify_buses(notify, guest_info); + length += notify->len; + + ssdt = acpi_data_push(table_data, length); + ssdt_ptr = ssdt; /* Copy header and encode fwcfg values in the S3_ / S4_ / S5_ packages */ memcpy(ssdt_ptr, ssdp_misc_aml, sizeof(ssdp_misc_aml)); @@ -389,7 +717,7 @@ build_ssdt(GArray *table_data, GArray *linker, /* build "Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...}" */ /* Arg0 = Processor ID = APIC ID */ - ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2); + ssdt_ptr = build_notify(ssdt_ptr, "NTFY", 0, acpi_cpus, "CP00", 2, 1); /* build "Name(CPON, Package() { One, One, ..., Zero, Zero, ... })" */ *(ssdt_ptr++) = 0x08; /* NameOp */ @@ -411,15 +739,10 @@ build_ssdt(GArray *table_data, GArray *linker, *(ssdt_ptr++) = 'I'; *(ssdt_ptr++) = '0'; - /* build Device object for each slot */ - for (i = 1; i < PCI_SLOT_MAX; i++) { - bool eject = test_bit(i, guest_info->slot_hotplug_enable); - memcpy(ssdt_ptr, ACPI_PCIHP_AML, ACPI_PCIHP_SIZEOF); - patch_pcihp(i, ssdt_ptr, eject); - ssdt_ptr += ACPI_PCIHP_SIZEOF; - } - - ssdt_ptr = build_notify(ssdt_ptr, "PCNT", 1, PCI_SLOT_MAX, "S00_", 1); + memcpy(ssdt_ptr, devices->data, devices->len); + ssdt_ptr += devices->len; + memcpy(ssdt_ptr, notify->data, notify->len); + ssdt_ptr += notify->len; build_header(linker, table_data, (void*)ssdt, ACPI_SSDT_SIGNATURE, ssdt_ptr - ssdt, 1); diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs index 2ab2572..c2e74e9 100644 --- a/hw/i386/Makefile.objs +++ b/hw/i386/Makefile.objs @@ -6,7 +6,7 @@ obj-$(CONFIG_XEN) += xen_domainbuild.o xen_machine_pv.o obj-y += kvmvapic.o obj-y += acpi-build.o obj-y += bios-linker-loader.o -hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-pcihp.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex +hw/i386/acpi-build.o: hw/i386/acpi-build.c hw/i386/acpi-dsdt.hex hw/i386/ssdt-proc.hex hw/i386/ssdt-misc.hex hw/i386/q35-acpi-dsdt.hex hw/i386/pc_piix.o: hw/i386/pc_piix.c hw/i386/acpi-dsdt.hex hw/i386/pc_q35.o: hw/i386/pc_q35.c hw/i386/q35-acpi-dsdt.hex diff --git a/hw/i386/ssdt-pcihp.hex.generated b/hw/i386/ssdt-pcihp.hex.generated deleted file mode 100644 index 0d32a27..0000000 --- a/hw/i386/ssdt-pcihp.hex.generated +++ /dev/null @@ -1,108 +0,0 @@ -static unsigned char ssdt_pcihp_name[] = { -0x33 -}; -static unsigned char ssdt_pcihp_adr[] = { -0x44 -}; -static unsigned char ssdt_pcihp_end[] = { -0x58 -}; -static unsigned char ssdp_pcihp_aml[] = { -0x53, -0x53, -0x44, -0x54, -0x58, -0x0, -0x0, -0x0, -0x1, -0x77, -0x42, -0x58, -0x50, -0x43, -0x0, -0x0, -0x42, -0x58, -0x53, -0x53, -0x44, -0x54, -0x50, -0x43, -0x1, -0x0, -0x0, -0x0, -0x49, -0x4e, -0x54, -0x4c, -0x28, -0x5, -0x10, -0x20, -0x10, -0x33, -0x5c, -0x2e, -0x5f, -0x53, -0x42, -0x5f, -0x50, -0x43, -0x49, -0x30, -0x5b, -0x82, -0x26, -0x53, -0x41, -0x41, -0x5f, -0x8, -0x5f, -0x53, -0x55, -0x4e, -0xa, -0xaa, -0x8, -0x5f, -0x41, -0x44, -0x52, -0xc, -0x0, -0x0, -0xaa, -0x0, -0x14, -0xf, -0x5f, -0x45, -0x4a, -0x30, -0x1, -0xa4, -0x50, -0x43, -0x45, -0x4a, -0x5f, -0x53, -0x55, -0x4e -}; -static unsigned char ssdt_pcihp_start[] = { -0x30 -}; -static unsigned char ssdt_pcihp_id[] = { -0x3d -}; -static unsigned char ssdt_pcihp_ej0[] = { -0x4a -};
This adds support for device hotplug behind pci bridges. Bridge devices themselves need to be pre-configured on qemu command line. One of the goals of this project was to demonstrate the advantages of generating ACPI tables in QEMU. In my opinion, it does this successfully. * This saves the need to provide a new interface for firmware to discover bus number to pci brige mapping * No need for yet another interface to bios to detect qemu version to check if it's safe to activate new code, and to ship multiple table versions: we generated code so we know this is new qemu * Use of glib's GArray makes it much easier to build up tables in code without need for iasl and code patching I have also tried to address the concerns that Anthony has raised with this approach, please see below. Design: - each bus gets assigned a number 0-255 - generated ACPI code writes this number to a new BSEL register, then uses existing UP/DOWN registers to probe slot status; to eject, write number to BSEL register, then slot into existing EJ This is to address the ACPI spec requirement to avoid config cycle access to any bus except PCI roots. Portability: - Non x86 (or any Linux) platforms don't need any of this code. They can keep happily using SHPC the way they always did. Things to note: - this is on top of acpi patchset posted a month ago, with a small patch adding a core function to walk all pci buses, on top. Can also be found in my git tree git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git acpi - Extensive use of glib completely removes pointer math in new code: we use g_array_append_vals exclusively. There's no code patching in new code. This is in response to comments about security concerns with adding code to qemu. In this sense it is more secure than existing code in hw/acpi/core.c - that can be switched to glib if desired. - New code (not imported from seabios) uses glib to reduce dependency on iasl. With time all code can be rewritted to use glib and avoid iasl, if desired. - As was the case previously, systems that lack working iasl are detected at configure time, pre-generated hex files in source tree are used in this case. This addresses the concern with iasl/big-endian systems. - Compile tested only. Migration is known to be broken - there are a bunch of TODO tags in code. It was agreed on the last qemu conf meeting that this code is posted without looking at migration, with understanding that it is addressed before being merged. Please do not mistake this limitation for a fundamental one - I have a very good idea how to fix it, including cross-version migration. - Cross version migration: when running with -M 1.5 and older, all ACPI table generation should be disabled. We'll present FW_CFG interface compatible with 1.5. Cc: Jordan Justen <jljusten@gmail.com> Cc: Anthony Liguori <anthony@codemonkey.ws> Cc: Laszlo Ersek <lersek@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: seabios@seabios.org Cc: David Woodhouse <dwmw2@infradead.org> Cc: Kevin O'Connor <kevin@koconnor.net> Cc: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- docs/specs/acpi_pci_hotplug.txt | 8 + include/hw/i386/pc.h | 4 +- hw/i386/acpi-dsdt.dsl | 36 +++- hw/i386/ssdt-pcihp.dsl | 51 ----- hw/acpi/piix4.c | 145 ++++++++++---- hw/i386/acpi-build.c | 411 ++++++++++++++++++++++++++++++++++----- hw/i386/Makefile.objs | 2 +- hw/i386/ssdt-pcihp.hex.generated | 108 ---------- 8 files changed, 510 insertions(+), 255 deletions(-) delete mode 100644 hw/i386/ssdt-pcihp.dsl delete mode 100644 hw/i386/ssdt-pcihp.hex.generated