From patchwork Wed Feb 22 03:08:51 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 730890 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3vSj883R4xz9s7h for ; Wed, 22 Feb 2017 14:11:52 +1100 (AEDT) Received: from localhost ([::1]:49564 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cgNLJ-0002Oh-Rg for incoming@patchwork.ozlabs.org; Tue, 21 Feb 2017 22:11:49 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38451) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cgNIa-0000N2-32 for qemu-devel@nongnu.org; Tue, 21 Feb 2017 22:09:01 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cgNIW-0001FA-RQ for qemu-devel@nongnu.org; Tue, 21 Feb 2017 22:09:00 -0500 Received: from mx1.redhat.com ([209.132.183.28]:34928) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cgNIW-0001F6-JY for qemu-devel@nongnu.org; Tue, 21 Feb 2017 22:08:56 -0500 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 24C9C80F94; Wed, 22 Feb 2017 03:08:56 +0000 (UTC) Received: from pxdev.xzpeter.org (ovpn-8-39.pek2.redhat.com [10.72.8.39]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id v1M38qjG007566 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 21 Feb 2017 22:08:53 -0500 Date: Wed, 22 Feb 2017 11:08:51 +0800 From: Peter Xu To: Alex Williamson Message-ID: <20170222030851.GA17314@pxdev.xzpeter.org> References: <20170221214228.12515.79751.stgit@gimli.home> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20170221214228.12515.79751.stgit@gimli.home> User-Agent: Mutt/1.5.24 (2015-08-30) X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Wed, 22 Feb 2017 03:08:56 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: Re: [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Jintack Lim , qemu-devel@nongnu.org, "Michael S. Tsirkin" Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" [cc Jintack] On Tue, Feb 21, 2017 at 02:43:03PM -0700, Alex Williamson wrote: > Since commit 4bb571d857d9 ("pci/pcie: don't assume cap id 0 is > reserved") removes the internal use of extended capability ID 0, the > comment here becomes invalid. However, peeling back the onion, the > code is still correct and we still can't seed the capability chain > with ID 0, unless we want to muck with using the version number to > force the header to be non-zero, which is much uglier to deal with. > The comment also now covers some of the subtleties of using cap ID 0, > such as transparently indicating absence of capabilities if none are > added. This doesn't detract from the correctness of the referenced > commit as vfio in the kernel also uses capability ID zero to mask > capabilties. In fact, we should skip zero capabilities precisely > because the kernel might also expose such a capability at the head > position and re-introduce the problem. > > Signed-off-by: Alex Williamson > Cc: Peter Xu > Cc: Michael S. Tsirkin > --- > hw/vfio/pci.c | 31 +++++++++++++++++++++---------- > 1 file changed, 21 insertions(+), 10 deletions(-) > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > index f2ba9b6cfafc..03a3d0154976 100644 > --- a/hw/vfio/pci.c > +++ b/hw/vfio/pci.c > @@ -1880,16 +1880,26 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) > /* > * Extended capabilities are chained with each pointing to the next, so we > * can drop anything other than the head of the chain simply by modifying > - * the previous next pointer. For the head of the chain, we can modify the > - * capability ID to something that cannot match a valid capability. ID > - * 0 is reserved for this since absence of capabilities is indicated by > - * 0 for the ID, version, AND next pointer. However, pcie_add_capability() > - * uses ID 0 as reserved for list management and will incorrectly match and > - * assert if we attempt to pre-load the head of the chain with this ID. > - * Use ID 0xFFFF temporarily since it is also seems to be reserved in > - * part for identifying absence of capabilities in a root complex register > - * block. If the ID still exists after adding capabilities, switch back to > - * zero. We'll mark this entire first dword as emulated for this purpose. > + * the previous next pointer. Seed the head of the chain here such that > + * we can simply skip any capabilities we want to drop below, regardless > + * of their position in the chain. If this stub capability still exists > + * after we add the capabilities we want to expose, update the capability > + * ID to zero. Note that we cannot seed with the capability header being > + * zero as this conflicts with definition of an absent capability chain > + * and prevents capabilities beyond the head of the list from being added. > + * By replacing the dummy capability ID with zero after walking the device > + * chain, we also transparently mark extended capabilities as absent if > + * no capabilities were added. Note that the PCIe spec defines an absence > + * of extended capabilities to be determined by a value of zero for the > + * capability ID, version, AND next pointer. A non-zero next pointer > + * should be sufficient to indicate additional capabilities are present, > + * which will occur if we call pcie_add_capability() below. The entire > + * first dword is emulated to support this. > + * > + * NB. The kernel side does similar masking, so be prepared that our > + * view of the device may also contain a capability ID zero in the head > + * of the chain. Skip it for the same reason that we cannot seed the > + * chain with a zero capability. > */ > pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE, > PCI_EXT_CAP(0xFFFF, 0, 0)); > @@ -1915,6 +1925,7 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) > PCI_EXT_CAP_NEXT_MASK); > > switch (cap_id) { > + case 0: /* kernel masked capability */ > case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */ > case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */ > trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next); > Reviewed-by: Peter Xu Since this bug is originally reported by Jintack, maybe we can also add: Reported-by: Jintack Lim Jintack, if you want to test it and provide your tested-by, it would be nice as well. ;) Actually I just found that the bug still exist after Michael's fix (I thought it was fixed). So we definitely need this patch or equivalent. However, I would still slightly prefer removing the wrapping hack since after all we need to touch it (and I do feel like that's error prone...). So, Alex, do you like below one instead? --------8<---------- ------->8-------- The new patch will keep all the dropped ecaps (so we may see more than one cap_id=0x0000 field), which I don't know whether would be a drawback. OTOH, it provides benefits like: - we can remove the wrapping hack, so the code is much readable and less error prone imho - we can avoid using the assumption that 0xffff cap_id is reserved I can live with both patches though. :-) Thanks! -- peterx Tested-by: Peter Xu Tested-by: Jintack Lim diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 332f41d..6942c1d 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) */ config = g_memdup(pdev->config, vdev->config_size); - /* - * Extended capabilities are chained with each pointing to the next, so we - * can drop anything other than the head of the chain simply by modifying - * the previous next pointer. For the head of the chain, we can modify the - * capability ID to something that cannot match a valid capability. ID - * 0 is reserved for this since absence of capabilities is indicated by - * 0 for the ID, version, AND next pointer. However, pcie_add_capability() - * uses ID 0 as reserved for list management and will incorrectly match and - * assert if we attempt to pre-load the head of the chain with this ID. - * Use ID 0xFFFF temporarily since it is also seems to be reserved in - * part for identifying absence of capabilities in a root complex register - * block. If the ID still exists after adding capabilities, switch back to - * zero. We'll mark this entire first dword as emulated for this purpose. - */ - pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE, - PCI_EXT_CAP(0xFFFF, 0, 0)); - pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0); - pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0); - for (next = PCI_CONFIG_SPACE_SIZE; next; next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) { header = pci_get_long(config + next); @@ -1910,24 +1891,33 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev) */ size = vfio_ext_cap_max_size(config, next); - /* Use emulated next pointer to allow dropping extended caps */ - pci_long_test_and_set_mask(vdev->emulated_config_bits + next, - PCI_EXT_CAP_NEXT_MASK); + /* Use emulated header to allow dropping extended caps */ + pci_set_long(vdev->emulated_config_bits + next, 0xffffffffULL); switch (cap_id) { case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */ case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */ + case PCI_EXT_CAP_ID_VC: + /* + * For dropped capabilities, we keep their slot but + * replace them with a header containing cap_id=0 && + * cap_ver=1. We do this reservation mostly to make sure + * the head ecap (at offset 0x100) will always be there. + * Anyway it won't hurt if we keep the rest of the dropped + * ones as well. + * + * Here we use non-zero cap_ver because we want to "mark" + * this ecap as "available" - from PCIe spec (chap 7.9.1), + * it marked out that cap_id=cap_ver=next=0 means empty + * ecap, and we really don't want it to be an "empty" slot + * especially for the head ecap (we need a head, always!). + */ + pcie_add_capability(pdev, 0, 1, next, size); trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next); break; default: pcie_add_capability(pdev, cap_id, cap_ver, next, size); } - - } - - /* Cleanup chain head ID if necessary */ - if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) { - pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0); } g_free(config);