Message ID: 20220505135407.1352382-6-dmitry.baryshkov@linaro.org
State:      New
Series:     PCI: qcom: Fix higher MSI vectors handling
On Thu, May 05, 2022 at 04:54:05PM +0300, Dmitry Baryshkov wrote:
> On some of Qualcomm platforms each group of 32 MSI vectors is routed to
> the separate GIC interrupt. Thus to receive higher MSI vectors properly,
> add separate msi_host_init()/msi_host_deinit() handling additional host
> IRQs.

msi_host_init() has 1 user (keystone) as it doesn't use the DWC MSI
controller. But QCom does, given the access to PCIE_MSI_INTR0_STATUS, so
multiple MSI IRQ outputs must have been added in newer versions of the
DWC IP. If so, it's only a matter of time before another platform does
the same thing. Maybe someone from Synopsys could confirm?

Therefore this should all be handled in the DWC core. In general, I
don't want to see more users nor more ops if we don't have to. Let's not
create ops for what can be handled as data. AFAICT, this is just the
number of MSIs and the number of MSIs per IRQ. It seems plausible
another platform could do something similar, and supporting it in the
core code wouldn't negatively impact other platforms.

Rob
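A minimal sketch of the data-driven shape described above, in which the
core would need nothing from a platform driver beyond the total vector
count and one parent IRQ per group of 32. All names below
(sketch_split_msi, sketch_split_msi_of_get, the SKETCH_* macros) are
hypothetical and are not part of the DWC core or of this series; only
the "msi"/"msi2"/... interrupt naming mirrors the patch:

/*
 * Hypothetical sketch only: split MSI IRQs expressed purely as data,
 * roughly what "number of MSIs and number of MSIs per IRQ" could look
 * like if handled by generic code instead of a new msi_host_init() op.
 */
#include <linux/errno.h>
#include <linux/platform_device.h>

#define SKETCH_MSI_IRQS_PER_GROUP	32
#define SKETCH_MAX_MSI_GROUPS		8

struct sketch_split_msi {
	unsigned int num_vectors;			/* e.g. 256 */
	int parent_irq[SKETCH_MAX_MSI_GROUPS];		/* one GIC IRQ per 32 vectors */
};

/* Fill parent_irq[] from DT; stop at the first missing interrupt. */
static int sketch_split_msi_of_get(struct platform_device *pdev,
				   struct sketch_split_msi *msi)
{
	unsigned int i;
	char name[8];
	int irq;

	for (i = 0; i < SKETCH_MAX_MSI_GROUPS; i++) {
		/* "msi", "msi2", "msi3", ... matches the naming used in the patch */
		if (i)
			snprintf(name, sizeof(name), "msi%u", i + 1);
		else
			snprintf(name, sizeof(name), "msi");

		irq = platform_get_irq_byname_optional(pdev, name);
		if (irq < 0)
			break;

		msi->parent_irq[i] = irq;
	}

	if (!i)
		return -ENXIO;

	/* Fewer parent IRQs in DT simply means fewer usable vectors. */
	msi->num_vectors = i * SKETCH_MSI_IRQS_PER_GROUP;
	return 0;
}

Whether a lookup like this lives in the core or in the platform driver
is exactly the question being debated in this thread.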
On 06/05/2022 00:26, Rob Herring wrote:
> On Thu, May 05, 2022 at 04:54:05PM +0300, Dmitry Baryshkov wrote:
>> On some of Qualcomm platforms each group of 32 MSI vectors is routed to
>> the separate GIC interrupt. Thus to receive higher MSI vectors properly,
>> add separate msi_host_init()/msi_host_deinit() handling additional host
>> IRQs.
>
> msi_host_init() has 1 user (keystone) as it doesn't use the DWC MSI
> controller. But QCom does, given the access to PCIE_MSI_INTR0_STATUS, so
> multiple MSI IRQ outputs must have been added in newer versions of the
> DWC IP. If so, it's only a matter of time before another platform does
> the same thing. Maybe someone from Synopsys could confirm?

This is a valid question, and if you check, the first iterations of this
patchset had this in the dwc core ([1], [2]), exactly for the reason that
it might be usable for other platforms.

Then I did some research on other platforms using the DWC PCIe IP core.
For example, both Tegra Xavier and iMX6 support up to 256 MSI vectors and
use the DWC MSI IRQ controller. The iMX6 TRM explicitly describes using
different MSI groups for different endpoints. The diagram shows 8 MSI IRQ
signal lines. However, in the end the signals from all groups are OR'ed
to form a single host msi_ctrl_int. Thus I currently suppose that using
multiple MSI IRQs is a peculiarity of the Qualcomm platform.

> Therefore this should all be handled in the DWC core. In general, I
> don't want to see more users nor more ops if we don't have to. Let's not
> create ops for what can be handled as data. AFAICT, this is just the
> number of MSIs and the number of MSIs per IRQ. It seems plausible
> another platform could do something similar, and supporting it in the
> core code wouldn't negatively impact other platforms.

I wanted to balance adding additional ops against complicating the core
for other platforms, and I still think that platform specifics should go
into the platform driver. However, if you prefer [1] and [2], we can go
back to that implementation.

[1]: https://lore.kernel.org/linux-arm-msm/20220427121653.3158569-2-dmitry.baryshkov@linaro.org/
[2]: https://lore.kernel.org/linux-arm-msm/20220427121653.3158569-3-dmitry.baryshkov@linaro.org/
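To make the contrast concrete: behind a single OR'ed msi_ctrl_int the
handler cannot tell which group fired and has to poll every group's
status register, while a per-group IRQ identifies the group directly (as
the patch's qcom_chained_msi_isr() does). The loop below is a simplified
approximation of the scan-everything case; it reuses the types and
register macros already present in the patch, assumes the driver's
existing includes, and is not a verbatim copy of the generic
dw_handle_msi_irq():

/*
 * Simplified, illustrative-only approximation of the "single OR'ed IRQ"
 * case: every group's status register must be read on each interrupt.
 */
static void sketch_scan_all_msi_groups(struct dw_pcie *pci, struct pcie_port *pp)
{
	unsigned int num_groups = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;
	unsigned int i, pos;
	u32 status;

	for (i = 0; i < num_groups; i++) {
		status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
					   i * MSI_REG_CTRL_BLOCK_SIZE);

		/* Demultiplex whatever vectors are pending in group i. */
		while (status) {
			pos = __ffs(status);
			generic_handle_domain_irq(pp->irq_domain,
						  i * MAX_MSI_IRQS_PER_CTRL + pos);
			status &= ~BIT(pos);
		}
	}
}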
On Fri, May 06, 2022 at 10:40:56AM +0300, Dmitry Baryshkov wrote:
> On 06/05/2022 00:26, Rob Herring wrote:
> > On Thu, May 05, 2022 at 04:54:05PM +0300, Dmitry Baryshkov wrote:
> > > On some of Qualcomm platforms each group of 32 MSI vectors is routed
> > > to the separate GIC interrupt. Thus to receive higher MSI vectors
> > > properly, add separate msi_host_init()/msi_host_deinit() handling
> > > additional host IRQs.
> >
> > msi_host_init() has 1 user (keystone) as it doesn't use the DWC MSI
> > controller. But QCom does, given the access to PCIE_MSI_INTR0_STATUS,
> > so multiple MSI IRQ outputs must have been added in newer versions of
> > the DWC IP. If so, it's only a matter of time before another platform
> > does the same thing. Maybe someone from Synopsys could confirm?
>
> This is a valid question, and if you check, the first iterations of this
> patchset had this in the dwc core ([1], [2]), exactly for the reason
> that it might be usable for other platforms.
>
> Then I did some research on other platforms using the DWC PCIe IP core.
> For example, both Tegra Xavier and iMX6 support up to 256 MSI vectors
> and use the DWC MSI IRQ controller. The iMX6 TRM explicitly describes
> using different MSI groups for different endpoints. The diagram shows 8
> MSI IRQ signal lines. However, in the end the signals from all groups
> are OR'ed to form a single host msi_ctrl_int. Thus I currently suppose
> that using multiple MSI IRQs is a peculiarity of the Qualcomm platform.

Chip integration very often will just OR together interrupts or not. It's
completely at the whim of the SoC design, so I'd say both cases are very
likely. It seems to be a feature in newer versions of the IP, probably
requested by some misguided h/w person thinking split interrupts would be
'faster'. (Sorry, too many past discussions with h/w designers on this
topic.)

> > Therefore this should all be handled in the DWC core. In general, I
> > don't want to see more users nor more ops if we don't have to. Let's
> > not create ops for what can be handled as data. AFAICT, this is just
> > the number of MSIs and the number of MSIs per IRQ. It seems plausible
> > another platform could do something similar, and supporting it in the
> > core code wouldn't negatively impact other platforms.
>
> I wanted to balance adding additional ops against complicating the core
> for other platforms, and I still think that platform specifics should go
> into the platform driver. However, if you prefer [1] and [2], we can go
> back to that implementation.

Yes, I prefer that implementation.

Rob
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 375f27ab9403..53a7dc266cf4 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -194,6 +194,7 @@ struct qcom_pcie_ops {
 
 struct qcom_pcie_cfg {
         const struct qcom_pcie_ops *ops;
+        unsigned int has_split_msi_irqs:1;
         unsigned int pipe_clk_need_muxing:1;
         unsigned int has_tbu_clk:1;
         unsigned int has_ddrss_sf_tbu_clk:1;
@@ -209,6 +210,7 @@ struct qcom_pcie {
         struct phy *phy;
         struct gpio_desc *reset;
         const struct qcom_pcie_cfg *cfg;
+        int msi_irqs[MAX_MSI_CTRLS];
 };
 
 #define to_qcom_pcie(x) dev_get_drvdata((x)->dev)
@@ -1387,6 +1389,124 @@ static int qcom_pcie_config_sid_sm8250(struct qcom_pcie *pcie)
         return 0;
 }
 
+static void qcom_chained_msi_isr(struct irq_desc *desc)
+{
+        struct irq_chip *chip = irq_desc_get_chip(desc);
+        int irq = irq_desc_get_irq(desc);
+        struct pcie_port *pp;
+        int i, pos;
+        unsigned long val;
+        u32 status, num_ctrls;
+        struct dw_pcie *pci;
+        struct qcom_pcie *pcie;
+
+        chained_irq_enter(chip, desc);
+
+        pp = irq_desc_get_handler_data(desc);
+        pci = to_dw_pcie_from_pp(pp);
+        pcie = to_qcom_pcie(pci);
+
+        /*
+         * Unlike generic dw_handle_msi_irq(), we can determine which group of
+         * MSIs triggered the IRQ, so process just that group.
+         */
+        num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;
+
+        for (i = 0; i < num_ctrls; i++) {
+                if (pcie->msi_irqs[i] == irq)
+                        break;
+        }
+
+        if (WARN_ON_ONCE(unlikely(i == num_ctrls)))
+                goto out;
+
+        status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
+                                   (i * MSI_REG_CTRL_BLOCK_SIZE));
+        if (!status)
+                goto out;
+
+        val = status;
+        pos = 0;
+        while ((pos = find_next_bit(&val, MAX_MSI_IRQS_PER_CTRL,
+                                    pos)) != MAX_MSI_IRQS_PER_CTRL) {
+                generic_handle_domain_irq(pp->irq_domain,
+                                          (i * MAX_MSI_IRQS_PER_CTRL) +
+                                          pos);
+                pos++;
+        }
+
+out:
+        chained_irq_exit(chip, desc);
+}
+
+static int qcom_pcie_msi_host_init(struct pcie_port *pp)
+{
+        struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+        struct qcom_pcie *pcie = to_qcom_pcie(pci);
+        struct platform_device *pdev = to_platform_device(pci->dev);
+        char irq_name[] = "msiXXX";
+        unsigned int ctrl, num_ctrls;
+        int msi_irq, ret;
+
+        pp->msi_irq = -EINVAL;
+
+        /*
+         * We provide our own implementation of MSI init/deinit, but rely on
+         * using the rest of DWC MSI functionality.
+         */
+        pp->has_msi_ctrl = true;
+
+        msi_irq = platform_get_irq_byname_optional(pdev, "msi");
+        if (msi_irq < 0) {
+                msi_irq = platform_get_irq(pdev, 0);
+                if (msi_irq < 0)
+                        return msi_irq;
+        }
+
+        pcie->msi_irqs[0] = msi_irq;
+
+        for (num_ctrls = 1; num_ctrls < MAX_MSI_CTRLS; num_ctrls++) {
+                snprintf(irq_name, sizeof(irq_name), "msi%d", num_ctrls + 1);
+                msi_irq = platform_get_irq_byname_optional(pdev, irq_name);
+                if (msi_irq == -ENXIO)
+                        break;
+
+                pcie->msi_irqs[num_ctrls] = msi_irq;
+        }
+
+        pp->num_vectors = num_ctrls * MAX_MSI_IRQS_PER_CTRL;
+        dev_info(&pdev->dev, "Using %d MSI vectors\n", pp->num_vectors);
+        for (ctrl = 0; ctrl < num_ctrls; ctrl++)
+                pp->irq_mask[ctrl] = ~0;
+
+        ret = dw_pcie_allocate_msi(pp);
+        if (ret)
+                return ret;
+
+        for (ctrl = 0; ctrl < num_ctrls; ctrl++)
+                irq_set_chained_handler_and_data(pcie->msi_irqs[ctrl],
+                                                 qcom_chained_msi_isr,
+                                                 pp);
+
+        return 0;
+}
+
+static void qcom_pcie_msi_host_deinit(struct pcie_port *pp)
+{
+        struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+        struct qcom_pcie *pcie = to_qcom_pcie(pci);
+        unsigned int ctrl, num_ctrls;
+
+        num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;
+
+        for (ctrl = 0; ctrl < num_ctrls; ctrl++)
+                irq_set_chained_handler_and_data(pcie->msi_irqs[ctrl],
+                                                 NULL,
+                                                 NULL);
+
+        dw_pcie_free_msi(pp);
+}
+
 static int qcom_pcie_host_init(struct pcie_port *pp)
 {
         struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
@@ -1435,6 +1555,12 @@ static const struct dw_pcie_host_ops qcom_pcie_dw_ops = {
         .host_init = qcom_pcie_host_init,
 };
 
+static const struct dw_pcie_host_ops qcom_pcie_msi_dw_ops = {
+        .host_init = qcom_pcie_host_init,
+        .msi_host_init = qcom_pcie_msi_host_init,
+        .msi_host_deinit = qcom_pcie_msi_host_deinit,
+};
+
 /* Qcom IP rev.: 2.1.0   Synopsys IP rev.: 4.01a */
 static const struct qcom_pcie_ops ops_2_1_0 = {
         .get_resources = qcom_pcie_get_resources_2_1_0,
@@ -1508,6 +1634,7 @@ static const struct qcom_pcie_cfg ipq8064_cfg = {
 
 static const struct qcom_pcie_cfg msm8996_cfg = {
         .ops = &ops_2_3_2,
+        .has_split_msi_irqs = true,
 };
 
 static const struct qcom_pcie_cfg ipq8074_cfg = {
@@ -1520,6 +1647,7 @@ static const struct qcom_pcie_cfg ipq4019_cfg = {
 
 static const struct qcom_pcie_cfg sdm845_cfg = {
         .ops = &ops_2_7_0,
+        .has_split_msi_irqs = true,
         .has_tbu_clk = true,
 };
 
@@ -1532,12 +1660,14 @@ static const struct qcom_pcie_cfg sm8150_cfg = {
 
 static const struct qcom_pcie_cfg sm8250_cfg = {
         .ops = &ops_1_9_0,
+        .has_split_msi_irqs = true,
         .has_tbu_clk = true,
         .has_ddrss_sf_tbu_clk = true,
 };
 
 static const struct qcom_pcie_cfg sm8450_pcie0_cfg = {
         .ops = &ops_1_9_0,
+        .has_split_msi_irqs = true,
         .has_ddrss_sf_tbu_clk = true,
         .pipe_clk_need_muxing = true,
         .has_aggre0_clk = true,
@@ -1546,6 +1676,7 @@ static const struct qcom_pcie_cfg sm8450_pcie0_cfg = {
 
 static const struct qcom_pcie_cfg sm8450_pcie1_cfg = {
         .ops = &ops_1_9_0,
+        .has_split_msi_irqs = true,
         .has_ddrss_sf_tbu_clk = true,
         .pipe_clk_need_muxing = true,
         .has_aggre1_clk = true,
@@ -1553,6 +1684,7 @@ static const struct qcom_pcie_cfg sm8450_pcie1_cfg = {
 
 static const struct qcom_pcie_cfg sc7280_cfg = {
         .ops = &ops_1_9_0,
+        .has_split_msi_irqs = true,
         .has_tbu_clk = true,
         .pipe_clk_need_muxing = true,
 };
@@ -1626,7 +1758,10 @@ static int qcom_pcie_probe(struct platform_device *pdev)
         if (ret)
                 goto err_pm_runtime_put;
 
-        pp->ops = &qcom_pcie_dw_ops;
+        if (pcie->cfg->has_split_msi_irqs)
+                pp->ops = &qcom_pcie_msi_dw_ops;
+        else
+                pp->ops = &qcom_pcie_dw_ops;
 
         ret = phy_init(pcie->phy);
         if (ret) {
On some Qualcomm platforms, each group of 32 MSI vectors is routed to a
separate GIC interrupt. Thus, to receive the higher MSI vectors properly,
add separate msi_host_init()/msi_host_deinit() callbacks that handle the
additional host IRQs.

Note that if the DT doesn't list the extra MSI interrupts, the driver will
limit the number of supported MSI vectors accordingly (to 32).

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/pci/controller/dwc/pcie-qcom.c | 137 ++++++++++++++++++++++++-
 1 file changed, 136 insertions(+), 1 deletion(-)
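For reference, the interrupt names that qcom_pcie_msi_host_init() probes,
and the resulting vector limits, can be summarised as below. The array is
illustrative only and is not part of the patch; it assumes the DWC core's
usual MAX_MSI_CTRLS of 8 and MAX_MSI_IRQS_PER_CTRL of 32, matching the
snprintf() naming in the init path:

/*
 * Interrupt names probed in order by qcom_pcie_msi_host_init(); probing
 * stops at the first name missing from the DT.  With only "msi" present,
 * pp->num_vectors stays at 1 * 32 == 32; with all eight present it
 * becomes 256.
 */
static const char * const qcom_msi_irq_names[] = {
	"msi",	/* vectors   0..31  */
	"msi2",	/* vectors  32..63  */
	"msi3",	/* vectors  64..95  */
	"msi4",	/* vectors  96..127 */
	"msi5",	/* vectors 128..159 */
	"msi6",	/* vectors 160..191 */
	"msi7",	/* vectors 192..223 */
	"msi8",	/* vectors 224..255 */
};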