From patchwork Mon Apr 1 19:57:49 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 232800 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 4318C2C011E for ; Tue, 2 Apr 2013 06:58:35 +1100 (EST) Received: from localhost ([::1]:51368 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UMksD-0006Sp-FE for incoming@patchwork.ozlabs.org; Mon, 01 Apr 2013 15:58:33 -0400 Received: from eggs.gnu.org ([208.118.235.92]:36715) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UMkrc-0006IR-S4 for qemu-devel@nongnu.org; Mon, 01 Apr 2013 15:58:00 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UMkrX-0002x4-NW for qemu-devel@nongnu.org; Mon, 01 Apr 2013 15:57:56 -0400 Received: from mx1.redhat.com ([209.132.183.28]:14086) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UMkrX-0002wj-FZ for qemu-devel@nongnu.org; Mon, 01 Apr 2013 15:57:51 -0400 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r31Jvoxv028245 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 1 Apr 2013 15:57:50 -0400 Received: from bling.home (ovpn-113-45.phx2.redhat.com [10.3.113.45]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r31Jvo6a016935; Mon, 1 Apr 2013 15:57:50 -0400 To: aliguori@us.ibm.com, qemu-devel@nongnu.org From: Alex Williamson Date: Mon, 01 Apr 2013 13:57:49 -0600 Message-ID: <20130401195749.17115.51335.stgit@bling.home> In-Reply-To: <20130401195242.17115.51929.stgit@bling.home> References: <20130401195242.17115.51929.stgit@bling.home> User-Agent: StGit/0.16 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 209.132.183.28 Cc: kvm@vger.kernel.org Subject: [Qemu-devel] [PATCH 3/9] vfio-pci: Add PCIe capability mangling based on bus type X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Windows seems to pay particular interest to the PCIe header type of devices and will fail to load drivers if we attach Endpoint devices or Legacy Endpoint devices to the Root Complex. We can use pci_bus_is_express and pci_bus_is_root to determine the bus type and mangle the type appropriately: * Legacy PCI * No change, capability is unmodified for compatibility. * PCI Express * Integrated Root Complex Endpoint -> Endpoint * PCI Express Root Complex * Endpoint -> Integrated Root Complex Endpoint * Legacy Endpoint -> none, capability hidden We also take this opportunity to explicitly limit supported devices to Endpoints, Legacy Endpoints, and Root Complex Integrated Endpoints. We don't currently have support for other types and users often cause themselves problems by assigning them. Signed-off-by: Alex Williamson --- hw/vfio_pci.c | 129 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 128 insertions(+), 1 deletion(-) diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index a3bae7b..0f74dbb 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -1506,6 +1506,124 @@ static uint8_t vfio_std_cap_max_size(PCIDevice *pdev, uint8_t pos) return next - pos; } +static void vfio_set_word_bits(uint8_t *buf, uint16_t val, uint16_t mask) +{ + pci_set_word(buf, (pci_get_word(buf) & ~mask) | val); +} + +static void vfio_add_emulated_word(VFIODevice *vdev, int pos, + uint16_t val, uint16_t mask) +{ + vfio_set_word_bits(vdev->pdev.config + pos, val, mask); + vfio_set_word_bits(vdev->pdev.wmask + pos, ~mask, mask); + vfio_set_word_bits(vdev->emulated_config_bits + pos, mask, mask); +} + +static void vfio_set_long_bits(uint8_t *buf, uint32_t val, uint32_t mask) +{ + pci_set_long(buf, (pci_get_long(buf) & ~mask) | val); +} + +static void vfio_add_emulated_long(VFIODevice *vdev, int pos, + uint32_t val, uint32_t mask) +{ + vfio_set_long_bits(vdev->pdev.config + pos, val, mask); + vfio_set_long_bits(vdev->pdev.wmask + pos, ~mask, mask); + vfio_set_long_bits(vdev->emulated_config_bits + pos, mask, mask); +} + +static int vfio_setup_pcie_cap(VFIODevice *vdev, int pos, uint8_t size) +{ + uint16_t flags; + uint8_t type; + + flags = pci_get_word(vdev->pdev.config + pos + PCI_CAP_FLAGS); + type = (flags & PCI_EXP_FLAGS_TYPE) >> 4; + + if (type != PCI_EXP_TYPE_ENDPOINT && + type != PCI_EXP_TYPE_LEG_END && + type != PCI_EXP_TYPE_RC_END) { + + error_report("vfio: Assignment of PCIe type 0x%x " + "devices is not currently supported", type); + return -EINVAL; + } + + if (!pci_bus_is_express(vdev->pdev.bus)) { + /* + * Use express capability as-is on PCI bus. It doesn't make much + * sense to even expose, but some drivers (ex. tg3) depend on it + * and guests don't seem to be particular about it. We'll need + * to revist this or force express devices to express buses if we + * ever expose an IOMMU to the guest. + */ + } else if (pci_bus_is_root(vdev->pdev.bus)) { + /* + * On a Root Complex bus Endpoints become Root Complex Integrated + * Endpoints, which changes the type and clears the LNK & LNK2 fields. + */ + if (type == PCI_EXP_TYPE_ENDPOINT) { + vfio_add_emulated_word(vdev, pos + PCI_CAP_FLAGS, + PCI_EXP_TYPE_RC_END << 4, + PCI_EXP_FLAGS_TYPE); + + /* Link Capabilities, Status, and Control goes away */ + if (size > PCI_EXP_LNKCTL) { + vfio_add_emulated_long(vdev, pos + PCI_EXP_LNKCAP, 0, ~0); + vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKCTL, 0, ~0); + vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKSTA, 0, ~0); + +#ifndef PCI_EXP_LNKCAP2 +#define PCI_EXP_LNKCAP2 44 +#endif +#ifndef PCI_EXP_LNKSTA2 +#define PCI_EXP_LNKSTA2 50 +#endif + /* Link 2 Capabilities, Status, and Control goes away */ + if (size > PCI_EXP_LNKCAP2) { + vfio_add_emulated_long(vdev, pos + PCI_EXP_LNKCAP2, 0, ~0); + vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKCTL2, 0, ~0); + vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKSTA2, 0, ~0); + } + } + + } else if (type == PCI_EXP_TYPE_LEG_END) { + /* + * Legacy endpoints don't belong on the root complex. Windows + * seems to be happier with devices if we skip the capability. + */ + return 0; + } + + } else { + /* + * Convert Root Complex Integrated Endpoints to regular endpoints. + * These devices don't support LNK/LNK2 capabilities, so make them up. + */ + if (type == PCI_EXP_TYPE_RC_END) { + vfio_add_emulated_word(vdev, pos + PCI_CAP_FLAGS, + PCI_EXP_TYPE_ENDPOINT << 4, + PCI_EXP_FLAGS_TYPE); + vfio_add_emulated_long(vdev, pos + PCI_EXP_LNKCAP, + PCI_EXP_LNK_MLW_1 | PCI_EXP_LNK_LS_25, ~0); + vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKCTL, 0, ~0); + } + + /* Mark the Link Status bits as emulated to allow virtual negotiation */ + vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKSTA, + pci_get_word(vdev->pdev.config + pos + + PCI_EXP_LNKSTA), + PCI_EXP_LNKCAP_MLW | PCI_EXP_LNKCAP_SLS); + } + + pos = pci_add_capability(&vdev->pdev, PCI_CAP_ID_EXP, pos, size); + if (pos >= 0) { + vdev->pdev.exp.exp_cap = pos; + } + + return pos; +} + static int vfio_add_std_cap(VFIODevice *vdev, uint8_t pos) { PCIDevice *pdev = &vdev->pdev; @@ -1536,13 +1654,22 @@ static int vfio_add_std_cap(VFIODevice *vdev, uint8_t pos) return ret; } } else { - pdev->config[PCI_CAPABILITY_LIST] = 0; /* Begin the rebuild */ + /* Begin the rebuild, use QEMU emulated list bits */ + pdev->config[PCI_CAPABILITY_LIST] = 0; + vdev->emulated_config_bits[PCI_CAPABILITY_LIST] = 0xff; + vdev->emulated_config_bits[PCI_STATUS] |= PCI_STATUS_CAP_LIST; } + /* Use emulated next pointer to allow dropping caps */ + pci_set_byte(vdev->emulated_config_bits + pos + 1, 0xff); + switch (cap_id) { case PCI_CAP_ID_MSI: ret = vfio_setup_msi(vdev, pos); break; + case PCI_CAP_ID_EXP: + ret = vfio_setup_pcie_cap(vdev, pos, size); + break; case PCI_CAP_ID_MSIX: ret = vfio_setup_msix(vdev, pos); break;