From patchwork Tue Mar 12 06:57:24 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Oliver O'Halloran
X-Patchwork-Id: 1055090
From: Oliver O'Halloran
To: skiboot@lists.ozlabs.org
Date: Tue, 12 Mar 2019 17:57:24 +1100
Message-Id: <20190312065724.28583-1-oohall@gmail.com>
X-Mailer: git-send-email 2.20.1
Subject: [Skiboot] [RFC PATCH] pci: Change bus number assignment policy
List-Id: Mailing list for skiboot development

Change the way we assign bus numbers so that rather than assigning bus
numbers densely, we spread the available bus numbers across the
downstream bridges of a given topology. For PCIe topologies with a wide
fan-out this allows hot-plugging large numbers of devices into a
downstream port near the top of the PCIe topology.

This patch changes the bus number assignment policy to evenly split the
available bus numbers across the bridges on a bus. This allows us to
support hot-adding of large numbers of devices near the root of the
topology.

Currently we only assign a handful of spare buses to each hotplug port
so if a large number of devices are added at the root we won't have
enough spare bus numbers for them.

e.g.
On a zaius system with a two-switch (HBA switch and drawer switch)
topology:

Before:

-+-[0030:00]---00.0-[01-40]----00.0-[02-40]--+-04.0-[03-1e]----00.0-[04-1e]--+-08.0-[05]----00.0
 |                                           |                               +-09.0-[06]----00.0
 |                                           |                               +-0a.0-[07-0b]--
 |                                           |                               +-0b.0-[0c-10]--
 |                                           |                               +-10.0-[11]----00.0
 |                                           |                               +-11.0-[12]----00.0
 |                                           |                               +-12.0-[13-17]--
 |                                           |                               +-13.0-[18-1c]--
 |                                           |                               +-15.0-[1d]----00.0
 |                                           |                               \-16.0-[1e]----00.0
 |                                           +-05.0-[1f-36]----00.0-[20-36]--+-04.0-[21]----00.0
 |                                           |                               +-05.0-[22]----00.0
 |                                           |                               +-06.0-[23]----00.0
 |                                           |                               +-07.0-[24]----00.0
 |                                           |                               +-0c.0-[25-29]--
 |                                           |                               +-0d.0-[2a]----00.0
 |                                           |                               +-0e.0-[2b-2f]--
 |                                           |                               +-0f.0-[30]----00.0
 |                                           |                               +-14.0-[31-35]--
 |                                           |                               \-17.0-[36]----00.0
 |                                           +-06.0-[37-3b]--
 |                                           \-07.0-[3c-40]--

After:

-+-[0030:00]---00.0-[01-fe]----00.0-[02-fd]--+-04.0-[03-41]----00.0-[04-40]--+-08.0-[05-0a]----00.0
 |                                           |                               +-09.0-[0b-10]----00.0
 |                                           |                               +-0a.0-[11-16]--
 |                                           |                               +-0b.0-[17-1c]--
 |                                           |                               +-10.0-[1d-22]----00.0
 |                                           |                               +-11.0-[23-28]----00.0
 |                                           |                               +-12.0-[29-2e]--
 |                                           |                               +-13.0-[2f-34]--
 |                                           |                               +-15.0-[35-3a]----00.0
 |                                           |                               \-16.0-[3b-40]----00.0
 |                                           +-05.0-[42-80]----00.0-[43-7f]--+-04.0-[44-49]----00.0
 |                                           |                               +-05.0-[4a-4f]--
 |                                           |                               +-06.0-[50-55]----00.0
 |                                           |                               +-07.0-[56-5b]----00.0
 |                                           |                               +-0c.0-[5c-61]--
 |                                           |                               +-0d.0-[62-67]----00.0
 |                                           |                               +-0e.0-[68-6d]--
 |                                           |                               +-0f.0-[6e-73]----00.0
 |                                           |                               +-14.0-[74-79]--
 |                                           |                               \-17.0-[7a-7f]----00.0
 |                                           +-06.0-[81-bf]--
 |                                           \-07.0-[c0-fe]--

This does however have the disadvantage that we can't really support
deep rather than wide topologies. In the above example you can see that
when we hit the lowest level there are only 5 buses available per port,
so if we had an architecture where downstream storage was daisy-chained
we would run out of bus numbers pretty quickly.

These are solvable problems, but I figure I should see what people think
before spending a lot of time on this.

Not-Signed-off-by: Oliver O'Halloran
Cc: Sergey Miroshnichenko
---
Sergey, would something like this work for the NVMe drawers you've been
working with?
I think we'll need to support bus-number reassignment eventually, but if
we could kick that can down the road a bit it'd be helpful.
---
 core/pci.c | 44 ++++++++++++++++----------------------------
 1 file changed, 16 insertions(+), 28 deletions(-)

diff --git a/core/pci.c b/core/pci.c
index 454b50102e59..d5537b6a376b 100644
--- a/core/pci.c
+++ b/core/pci.c
@@ -743,9 +743,10 @@ uint8_t pci_scan_bus(struct phb *phb, uint8_t bus, uint8_t max_bus,
 		     bool scan_downstream)
 {
 	struct pci_device *pd = NULL, *rc = NULL;
-	uint8_t dev, fn, next_bus, max_sub, save_max;
+	uint8_t dev, fn, next_bus, max_sub;
 	uint32_t scan_map;
 	bool use_max;
+	int bridges = 0, buses_per_bridge;
 
 	/* Decide what to scan */
 	scan_map = parent ? parent->scan_map : phb->scan_map;
@@ -810,7 +811,18 @@ uint8_t pci_scan_bus(struct phb *phb, uint8_t bus, uint8_t max_bus,
 
 	next_bus = bus + 1;
 	max_sub = bus;
-	save_max = max_bus;
+
+	list_for_each(list, pd, link)
+		if (pd->is_bridge)
+			bridges++;
+
+	buses_per_bridge = max_bus - next_bus - 1;
+	if (bridges)
+		buses_per_bridge /= bridges;
+
+	PCIERR(phb, pd->bdfn, "found %d [%x:%x] downstream bridges, %sscanning down, %d\n",
+		bridges, next_bus, max_bus, scan_downstream ? "" : "not ",
+		buses_per_bridge);
 
 	/* Scan down bridges */
 	list_for_each(list, pd, link) {
@@ -819,32 +831,8 @@ uint8_t pci_scan_bus(struct phb *phb, uint8_t bus, uint8_t max_bus,
 		if (!pd->is_bridge)
 			continue;
 
-		/* We need to figure out a new bus number to start from.
-		 *
-		 * This can be tricky due to our HW constraints which differ
-		 * from bridge to bridge so we are going to let the phb
-		 * driver decide what to do. This can return us a maximum
-		 * bus number to assign as well
-		 *
-		 * This function will:
-		 *
-		 * - Return the bus number to use as secondary for the
-		 *   bridge or 0 for a failure
-		 *
-		 * - "max_bus" will be adjusted to represent the max
-		 *   subordinate that can be associated with the downstream
-		 *   device
-		 *
-		 * - "use_max" will be set to true if the returned max_bus
-		 *   *must* be used as the subordinate bus number of that
-		 *   bridge (when we need to give aligned powers of two's
-		 *   on P7IOC). If is is set to false, we just adjust the
-		 *   subordinate bus number based on what we probed.
-		 *
-		 */
-		max_bus = save_max;
-		next_bus = phb->ops->choose_bus(phb, pd, next_bus,
-						&max_bus, &use_max);
+		use_max = 1;
+		max_bus = next_bus + buses_per_bridge;
 
 		/* Configure the bridge with the returned values */
 		if (next_bus <= bus) {
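
For readers who want to check the bus ranges in the "After" tree by hand, the even-split arithmetic from the patch can be pulled out into a small standalone sketch. This is not skiboot code: `struct bus_window` and `split_buses()` are hypothetical names made up for illustration. It also assumes that each bridge's window starts right after the previous bridge's subordinate bus, a step that is not visible in the hunks above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for this sketch: the bus range handed to one bridge. */
struct bus_window {
	uint8_t secondary;	/* first bus number behind the bridge */
	uint8_t subordinate;	/* last bus number behind the bridge */
};

/*
 * Hypothetical helper, not part of skiboot. Splits the bus numbers
 * between bus + 1 and max_bus across "bridges" bridges using the same
 * arithmetic as the patch, and returns buses_per_bridge.
 */
static int split_buses(uint8_t bus, uint8_t max_bus, int bridges,
		       struct bus_window *out)
{
	int next_bus = bus + 1;
	int buses_per_bridge;
	int i;

	/* Sketch-only sanity check: keep the numerator non-negative. */
	assert(max_bus >= bus + 2);

	/* Same calculation as the patch. */
	buses_per_bridge = max_bus - next_bus - 1;
	if (bridges)
		buses_per_bridge /= bridges;

	for (i = 0; i < bridges; i++) {
		out[i].secondary = next_bus;
		out[i].subordinate = next_bus + buses_per_bridge;
		/* Assumption: the next bridge starts right after this window. */
		next_bus = out[i].subordinate + 1;
	}

	return buses_per_bridge;
}
```

Called with bus 0x02, max_bus 0xfd and four bridges, this gives 62 buses per bridge and the windows [03-41], [42-80], [81-bf] and [c0-fe], which matches the tree. Note that the last window ends at 0xfe, one past the 0xfd limit of the parent, so the rounding probably needs another look if this approach is pursued.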