From patchwork Thu Jul 19 20:02:35 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Tai <thomas.tai@oracle.com>
X-Patchwork-Id: 946587
X-Patchwork-Delegate: bhelgaas@google.com
From: Thomas Tai <thomas.tai@oracle.com>
To: thomas.tai@oracle.com, bhelgaas@google.com, keith.busch@intel.com
Cc: linux-pci@vger.kernel.org, poza@codeaurora.org
Subject: [PATCH V3, 1/1] PCI/AER: fix use-after-free in pcie_do_fatal_recovery
Date: Thu, 19 Jul 2018 14:02:35 -0600
Message-Id: <1532030555-7177-2-git-send-email-thomas.tai@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1532030555-7177-1-git-send-email-thomas.tai@oracle.com>
References: <1532030555-7177-1-git-send-email-thomas.tai@oracle.com>
X-Mailing-List: linux-pci@vger.kernel.org

When a fatal error is received by a non-bridge device, the device is
removed from the PCI bus and its device structure is freed by
pci_stop_and_remove_bus_device(). The freed device structure is then
used by the subsequent pci_info() call to print a message, which
results in a corrupt printout. If slub_debug=FZP is used, it causes
the following protection fault after a fatal error is received:

general protection fault: 0000 [#1] SMP PTI
CPU: 104 PID: 1077 Comm: kworker/104:1 Not tainted 4.18.0-rc1ttai #5
Hardware name: Oracle Corporation ORACLE SERVER X5-4/ASSY,MB WITH TRAY, BIOS 36030500 11/16/2016
Workqueue: events aer_isr
RIP: 0010:__dev_printk+0x2e/0x90
Code: 00 55 49 89 d1 48 89 e5 53 48 89 fb 48 83 ec 18 48 85 f6 74 5f 4c 8b 46 50 4d 85 c0 74 2b 48 8b 86 88 00 00 00 48 85 c0 74 25 <48> 8b 08 0f be 7b 01 48 c7 c2 83 d4 71 99 31 c0 83 ef 30 e8 4a ff
RSP: 0018:ffffb6b88fa57cf8 EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX: ffffffff996ba720 RCX: 0000000000000000
RDX: ffffb6b88fa57d28 RSI: ffff8c4d7af94128 RDI: ffffffff996ba720
RBP: ffffb6b88fa57d18 R08: 6b6b6b6b6b6b6b6b R09: ffffb6b88fa57d28
R10: ffffffff99baca80 R11: 0000000000000000 R12: ffff8c4d7ae95990
R13: ffff8c2d7a840008 R14: ffff8c4d7af94088 R15: ffff8c4d7af90008
FS: 0000000000000000(0000) GS:ffff8c2d7fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22c0839000 CR3: 000000136bc0a001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? pci_bus_add_device+0x4f/0xa0
 _dev_info+0x6c/0x90
 pcie_do_fatal_recovery+0x1d5/0x230
 aer_isr+0x3e5/0x950
 ? add_timer_on+0xcc/0x160
 process_one_work+0x168/0x370
 worker_thread+0x4f/0x3d0
 kthread+0x105/0x140
 ? max_active_store+0x80/0x80
 ? kthread_bind+0x20/0x20
 ret_from_fork+0x35/0x40

To fix this issue, pci_dev_get() is used to take a reference on each
error device and keep it around. After all error devices have been
processed, pci_dev_put() is called to drop the reference held on each
of them.

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
---
 drivers/pci/pcie/aer.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a2e8838..6e5e6a5 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -657,6 +657,10 @@ void cper_print_aer(struct pci_dev *dev, int aer_severity,
 static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
 {
 	if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
+		/* increment reference count to keep the dev
+		 * around until remove_source_device()
+		 */
+		pci_dev_get(dev);
 		e_info->dev[e_info->error_dev_num] = dev;
 		e_info->error_dev_num++;
 		return 0;
@@ -665,6 +669,21 @@ static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
 }
 
 /**
+ * remove_source_device -remove error devices from the e_info
+ * @e_info: pointer to error info
+ */
+static void remove_source_device(struct aer_err_info *e_info)
+{
+	struct pci_dev *dev;
+
+	while (e_info->error_dev_num > 0) {
+		e_info->error_dev_num--;
+		dev = e_info->dev[e_info->error_dev_num];
+		pci_dev_put(dev);
+	}
+}
+
+/**
  * is_error_source - check whether the device is source of reported error
  * @dev: pointer to pci_dev to be checked
  * @e_info: pointer to reported error info
@@ -976,8 +995,10 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 			e_info->multi_error_valid = 0;
 		aer_print_port_info(pdev, e_info);
 
-		if (find_source_device(pdev, e_info))
+		if (find_source_device(pdev, e_info)) {
 			aer_process_err_devices(e_info);
+			remove_source_device(e_info);
+		}
 	}
 
 	if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
@@ -995,8 +1016,10 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
 
 		aer_print_port_info(pdev, e_info);
 
-		if (find_source_device(pdev, e_info))
+		if (find_source_device(pdev, e_info)) {
 			aer_process_err_devices(e_info);
+			remove_source_device(e_info);
+		}
 	}
 }
 