From patchwork Sat Dec 31 09:13:07 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cao jin
X-Patchwork-Id: 709912
From: Cao jin
Date: Sat, 31 Dec 2016 17:13:07 +0800
Message-ID: <1483175588-17006-4-git-send-email-caoj.fnst@cn.fujitsu.com>
X-Mailer: git-send-email 2.1.0
In-Reply-To: <1483175588-17006-1-git-send-email-caoj.fnst@cn.fujitsu.com>
References: <1483175588-17006-1-git-send-email-caoj.fnst@cn.fujitsu.com>
Subject: [Qemu-devel] [PATCH RFC v11 3/4] vfio-pci: pass the aer error to guest
Cc: Chen Fan, izumi.taku@jp.fujitsu.com, alex.williamson@redhat.com,
 Dou Liyang, mst@redhat.com

From: Chen Fan

When a physical device hits an uncorrectable error, the vfio_pci driver
signals the value of the uncorrectable error status register to the
corresponding QEMU vfio-pci device through the eventfd registered by
that device. The vfio-pci error eventfd handler is then invoked in the
event loop. The handler constructs an AER message and passes it to the
root port, the root port raises an interrupt to signal the guest, and
the guest driver then performs the recovery.

Note: only recovery from non-fatal errors is supported for now. A fatal
error still results in a VM stop.

Signed-off-by: Chen Fan
Signed-off-by: Dou Liyang
Signed-off-by: Cao jin
---
 hw/vfio/pci.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 76a8ac3..9861f72 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2470,21 +2470,55 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
 static void vfio_err_notifier_handler(void *opaque)
 {
     VFIOPCIDevice *vdev = opaque;
+    PCIDevice *dev = &vdev->pdev;
+    PCIEAERMsg msg = {
+        .severity = 0,
+        .source_id = (pci_bus_num(dev->bus) << 8) | dev->devfn,
+    };
+    int len;
+    uint64_t uncor_status;
+
+    /* Read uncorrectable error status from driver */
+    len = read(vdev->err_notifier.rfd, &uncor_status, sizeof(uncor_status));
+    if (len != sizeof(uncor_status)) {
+        error_report("vfio-pci: uncor error status reading returns"
+                     " invalid number of bytes: %d", len);
+        return; //Or goto stop?
+    }
+
+    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER)) {
+        goto stop;
+    }
+
+    /* Populate the aer msg and send it to root port */
+    if (dev->exp.aer_cap) {
+        uint8_t *aer_cap = dev->config + dev->exp.aer_cap;
+        bool isfatal = uncor_status &
+                       pci_get_long(aer_cap + PCI_ERR_UNCOR_SEVER);
+
+        if (isfatal) {
+            goto stop;
+        }
+
+        msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
+                                 PCI_ERR_ROOT_CMD_NONFATAL_EN;
 
-    if (!event_notifier_test_and_clear(&vdev->err_notifier)) {
+        error_report("vfio-pci device %d sending AER to root port. uncor"
+                     " status = 0x%"PRIx64, dev->devfn, uncor_status);
+        pcie_aer_msg(dev, &msg);
         return;
     }
 
+stop:
     /*
-     * TBD. Retrieve the error details and decide what action
-     * needs to be taken. One of the actions could be to pass
-     * the error to the guest and have the guest driver recover
-     * from the error. This requires that PCIe capabilities be
-     * exposed to the guest. For now, we just terminate the
-     * guest to contain the error.
+     * Terminate the guest in case of
+     * 1. AER capability is not exposed to guest.
+     * 2. AER capability is exposed, but error is fatal, only non-fatal
+     * error is handled now.
      */
 
-    error_report("%s(%s) Unrecoverable error detected. Please collect any data possible and then kill the guest", __func__, vdev->vbasedev.name);
+    error_report("%s(%s) fatal error detected. Please collect any data"
+                 " possible and then kill the guest", __func__, vdev->vbasedev.name);
     vm_stop(RUN_STATE_INTERNAL_ERROR);
 }
 
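
For readers without the QEMU tree at hand, the classification step in the
handler can be sketched as standalone C. This is not part of the patch: the
helper names aer_source_id(), aer_is_fatal() and aer_severity() are
hypothetical, and the two PCI_ERR_ROOT_CMD_* values are copied from
pci_regs.h only so that the snippet compiles without QEMU headers. It
assumes, as the patch does, that an error counts as fatal when any bit set
in the status value read from the eventfd is also set in the device's
Uncorrectable Error Severity register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from pci_regs.h so the sketch builds without QEMU headers. */
#define PCI_ERR_ROOT_CMD_NONFATAL_EN 0x00000002
#define PCI_ERR_ROOT_CMD_FATAL_EN    0x00000004

/* Hypothetical helper: bus number in bits 15:8, devfn in bits 7:0,
 * mirroring the .source_id initializer in the patch. */
static uint16_t aer_source_id(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)(((unsigned)bus << 8) | devfn);
}

/* Hypothetical helper: fatal if any reported status bit is also
 * marked fatal in the severity register. */
static bool aer_is_fatal(uint64_t uncor_status, uint32_t uncor_sever)
{
    return (uncor_status & uncor_sever) != 0;
}

/* Hypothetical helper: severity value to place in the AER message. */
static uint32_t aer_severity(uint64_t uncor_status, uint32_t uncor_sever)
{
    return aer_is_fatal(uncor_status, uncor_sever)
               ? PCI_ERR_ROOT_CMD_FATAL_EN
               : PCI_ERR_ROOT_CMD_NONFATAL_EN;
}
```

In the patch itself the fatal case never reaches pcie_aer_msg(): the handler
jumps to the stop label first, so only the non-fatal severity is ever sent
to the root port.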