[v7,5/7] PCI: qcom: Handle MSIs routed to multiple GIC interrupts

Message ID 20220505135407.1352382-6-dmitry.baryshkov@linaro.org
State New
Series PCI: qcom: Fix higher MSI vectors handling

Commit Message

Dmitry Baryshkov May 5, 2022, 1:54 p.m. UTC
On some Qualcomm platforms each group of 32 MSI vectors is routed to a
separate GIC interrupt. To receive the higher MSI vectors properly, add
separate msi_host_init()/msi_host_deinit() implementations that handle the
additional host IRQs.

Note that if the DT doesn't list the extra MSI interrupts, the driver limits
the number of supported MSI vectors accordingly (to 32).
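
As an illustration (the helpers below are hypothetical and not part of this
patch), the grouping boils down to simple arithmetic on the constants already
used by the DWC core (MAX_MSI_IRQS_PER_CTRL, PCIE_MSI_INTR0_STATUS,
MSI_REG_CTRL_BLOCK_SIZE):

/*
 * Sketch only: vector N lives in group N / 32, bit N % 32, and each
 * group has its own block of MSI status registers (and, on these
 * Qualcomm SoCs, its own GIC interrupt).
 */
static inline unsigned int qcom_msi_group(unsigned int vector)
{
	return vector / MAX_MSI_IRQS_PER_CTRL;	/* 32 vectors per group */
}

static inline unsigned int qcom_msi_bit(unsigned int vector)
{
	return vector % MAX_MSI_IRQS_PER_CTRL;
}

static inline u32 qcom_msi_group_status_reg(unsigned int group)
{
	return PCIE_MSI_INTR0_STATUS + group * MSI_REG_CTRL_BLOCK_SIZE;
}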

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/pci/controller/dwc/pcie-qcom.c | 137 ++++++++++++++++++++++++-
 1 file changed, 136 insertions(+), 1 deletion(-)

Comments

Rob Herring May 5, 2022, 9:26 p.m. UTC | #1
On Thu, May 05, 2022 at 04:54:05PM +0300, Dmitry Baryshkov wrote:
> On some of Qualcomm platforms each group of 32 MSI vectors is routed to the
> separate GIC interrupt. Thus to receive higher MSI vectors properly,
> add separate msi_host_init()/msi_host_deinit() handling additional host
> IRQs.

msi_host_init() has 1 user (keystone) as it doesn't use the DWC MSI 
controller. But QCom does given the access to PCIE_MSI_INTR0_STATUS, 
so multiple MSI IRQ outputs must have been added in newer versions of the 
DWC IP. If so, it's only a matter of time for another platform to 
do the same thing. Maybe someone from Synopsys could confirm?

Therefore this should all be handled in the DWC core. In general, I 
don't want to see more users or more ops if we don't have to. Let's not 
create ops for what can be handled as data. AFAICT, this is just number 
of MSIs and # of MSIs per IRQ. It seems plausible another platform could 
do something similar and supporting it in the core code wouldn't 
negatively impact other platforms.
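
For illustration only, a rough sketch of that data-driven approach (the
helper and its name are hypothetical, not the current DWC API): the core
could look up one IRQ per group of 32 vectors and size num_vectors from
whatever the platform actually wires up:

/*
 * Hypothetical core-side helper: collect one GIC IRQ per MSI control
 * group and derive the number of usable vectors from how many IRQs
 * the platform describes.
 */
static int dw_pcie_parse_split_msi_irqs(struct platform_device *pdev,
					struct pcie_port *pp, int *msi_irqs)
{
	char name[8];
	unsigned int ctrl;
	int irq;

	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++) {
		/* first IRQ keeps the existing "msi" name, then "msi2".."msi8" */
		if (ctrl == 0)
			strscpy(name, "msi", sizeof(name));
		else
			snprintf(name, sizeof(name), "msi%u", ctrl + 1);

		irq = platform_get_irq_byname_optional(pdev, name);
		if (irq < 0)
			break;

		msi_irqs[ctrl] = irq;
	}

	if (!ctrl)
		return -ENODEV;

	pp->num_vectors = ctrl * MAX_MSI_IRQS_PER_CTRL;
	return ctrl;
}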

Rob
Dmitry Baryshkov May 6, 2022, 7:40 a.m. UTC | #2
On 06/05/2022 00:26, Rob Herring wrote:
> On Thu, May 05, 2022 at 04:54:05PM +0300, Dmitry Baryshkov wrote:
>> On some of Qualcomm platforms each group of 32 MSI vectors is routed to the
>> separate GIC interrupt. Thus to receive higher MSI vectors properly,
>> add separate msi_host_init()/msi_host_deinit() handling additional host
>> IRQs.
> 
> msi_host_init() has 1 user (keystone) as it doesn't use the DWC MSI
> controller. But QCom does given the access to PCIE_MSI_INTR0_STATUS,
> so mutiple MSI IRQ outputs must have been added in newer versions of the
> DWC IP. If so, it's only a matter of time for another platform to
> do the same thing. Maybe someone from Synopsys could confirm?

This is a valid question, and if you check, the first iterations of this 
patchset had this in the dwc core ([1], [2]), exactly because it might be 
usable for other platforms.

Then I did some research on other platforms using the DWC PCIe IP core. For 
example, both Tegra Xavier and iMX6 support up to 256 MSI vectors and use 
the DWC MSI IRQ controller. The iMX6 TRM explicitly describes using 
different MSI groups for different endpoints, and its diagram shows 8 MSI 
IRQ signal lines. However, in the end the signals from all groups are OR'ed 
to form a single host msi_ctrl_int. Thus I currently suppose that using 
multiple MSI IRQs is a peculiarity of the Qualcomm platforms.
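
For context, with a single OR'ed msi_ctrl_int the generic DWC handler has no
choice but to scan every group's status register on each interrupt, roughly
along these lines (a simplified sketch, not the verbatim dw_handle_msi_irq()
code); the split per-group IRQs are what let us read only the signalling
group instead:

/*
 * Simplified sketch of the scan-all-groups behaviour used with a
 * single OR'ed MSI interrupt.
 */
static void dwc_msi_scan_all_groups(struct pcie_port *pp)
{
	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
	u32 ctrl, num_ctrls, status;
	unsigned long val;
	unsigned int pos;

	num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;

	for (ctrl = 0; ctrl < num_ctrls; ctrl++) {
		status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
					   ctrl * MSI_REG_CTRL_BLOCK_SIZE);
		if (!status)
			continue;

		val = status;
		for_each_set_bit(pos, &val, MAX_MSI_IRQS_PER_CTRL)
			generic_handle_domain_irq(pp->irq_domain,
						  ctrl * MAX_MSI_IRQS_PER_CTRL + pos);
	}
}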


> 
> Therefore this should all be handled in the DWC core. In general, I
> don't want to see more users nor more ops if we don't have to. Let's not
> create ops for what can be handled as data. AFAICT, this is just number
> of MSIs and # of MSIs per IRQ. It seems plausible another platform could
> do something similar and supporting it in the core code wouldn't
> negatively impact other platforms.

I wanted to balance adding additional ops against complicating the core for 
other platforms, and I still think that platform specifics should go into 
the platform driver. However, if you prefer [1] and [2], we can go back to 
that implementation.


[1]: https://lore.kernel.org/linux-arm-msm/20220427121653.3158569-2-dmitry.baryshkov@linaro.org/
[2]: https://lore.kernel.org/linux-arm-msm/20220427121653.3158569-3-dmitry.baryshkov@linaro.org/
Rob Herring May 9, 2022, 9 p.m. UTC | #3
On Fri, May 06, 2022 at 10:40:56AM +0300, Dmitry Baryshkov wrote:
> On 06/05/2022 00:26, Rob Herring wrote:
> > On Thu, May 05, 2022 at 04:54:05PM +0300, Dmitry Baryshkov wrote:
> > > On some of Qualcomm platforms each group of 32 MSI vectors is routed to the
> > > separate GIC interrupt. Thus to receive higher MSI vectors properly,
> > > add separate msi_host_init()/msi_host_deinit() handling additional host
> > > IRQs.
> > 
> > msi_host_init() has 1 user (keystone) as it doesn't use the DWC MSI
> > controller. But QCom does given the access to PCIE_MSI_INTR0_STATUS,
> > so mutiple MSI IRQ outputs must have been added in newer versions of the
> > DWC IP. If so, it's only a matter of time for another platform to
> > do the same thing. Maybe someone from Synopsys could confirm?
> 
> This is a valid question, and if you check, first iterations of this
> patchset had this in the dwc core ([1], [2]). Exactly for the reason this
> might be usable for other platforms.
> 
> Then I did some research for other platforms using DWC PCIe IP core. For
> example, both Tegra Xavier and iMX6 support up to 256 MSI vectors, they use
> DWC MSI IRQ controller. The iMX6 TRM explicitly describes using different
> MSI groups for different endpoints. The diagram shows 8 MSI IRQ signal
> lines. However in the end the signals from all groups are OR'ed to form a
> single host msi_ctrl_int. Thus currently I suppose that using multiple MSI
> IRQs is a peculiarity of Qualcomm platform.

Chip integration very often will just OR interrupts together, or it won't. 
It's completely at the whim of the SoC design, so I'd say both cases are 
very likely. Seems to be a feature in newer versions of the IP. Probably 
requested by some misguided h/w person thinking split interrupts would 
be 'faster'. (Sorry, too many past discussions with h/w designers on 
this topic.)

> > Therefore this should all be handled in the DWC core. In general, I
> > don't want to see more users nor more ops if we don't have to. Let's not
> > create ops for what can be handled as data. AFAICT, this is just number
> > of MSIs and # of MSIs per IRQ. It seems plausible another platform could
> > do something similar and supporting it in the core code wouldn't
> > negatively impact other platforms.
> 
> I wanted to balance adding additional ops vs complicating the core for other
> platforms. And I still suppose that platform specifics should go to the
> platform driver. However if you prefer [1] and [2], we can go back to that
> implementation.

Yes, I prefer that implementation.

Rob

Patch

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 375f27ab9403..53a7dc266cf4 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -194,6 +194,7 @@  struct qcom_pcie_ops {
 
 struct qcom_pcie_cfg {
 	const struct qcom_pcie_ops *ops;
+	unsigned int has_split_msi_irqs:1;
 	unsigned int pipe_clk_need_muxing:1;
 	unsigned int has_tbu_clk:1;
 	unsigned int has_ddrss_sf_tbu_clk:1;
@@ -209,6 +210,7 @@  struct qcom_pcie {
 	struct phy *phy;
 	struct gpio_desc *reset;
 	const struct qcom_pcie_cfg *cfg;
+	int msi_irqs[MAX_MSI_CTRLS];
 };
 
 #define to_qcom_pcie(x)		dev_get_drvdata((x)->dev)
@@ -1387,6 +1389,124 @@  static int qcom_pcie_config_sid_sm8250(struct qcom_pcie *pcie)
 	return 0;
 }
 
+static void qcom_chained_msi_isr(struct irq_desc *desc)
+{
+	struct irq_chip *chip = irq_desc_get_chip(desc);
+	int irq = irq_desc_get_irq(desc);
+	struct pcie_port *pp;
+	int i, pos;
+	unsigned long val;
+	u32 status, num_ctrls;
+	struct dw_pcie *pci;
+	struct qcom_pcie *pcie;
+
+	chained_irq_enter(chip, desc);
+
+	pp = irq_desc_get_handler_data(desc);
+	pci = to_dw_pcie_from_pp(pp);
+	pcie = to_qcom_pcie(pci);
+
+	/*
+	 * Unlike generic dw_handle_msi_irq(), we can determine which group of
+	 * MSIs triggered the IRQ, so process just that group.
+	 */
+	num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;
+
+	for (i = 0; i < num_ctrls; i++) {
+		if (pcie->msi_irqs[i] == irq)
+			break;
+	}
+
+	if (WARN_ON_ONCE(unlikely(i == num_ctrls)))
+		goto out;
+
+	status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
+				   (i * MSI_REG_CTRL_BLOCK_SIZE));
+	if (!status)
+		goto out;
+
+	val = status;
+	pos = 0;
+	while ((pos = find_next_bit(&val, MAX_MSI_IRQS_PER_CTRL,
+				    pos)) != MAX_MSI_IRQS_PER_CTRL) {
+		generic_handle_domain_irq(pp->irq_domain,
+					  (i * MAX_MSI_IRQS_PER_CTRL) +
+					  pos);
+		pos++;
+	}
+
+out:
+	chained_irq_exit(chip, desc);
+}
+
+static int qcom_pcie_msi_host_init(struct pcie_port *pp)
+{
+	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+	struct qcom_pcie *pcie = to_qcom_pcie(pci);
+	struct platform_device *pdev = to_platform_device(pci->dev);
+	char irq_name[] = "msiXXX";
+	unsigned int ctrl, num_ctrls;
+	int msi_irq, ret;
+
+	pp->msi_irq = -EINVAL;
+
+	/*
+	 * We provide our own implementation of MSI init/deinit, but rely on
+	 * using the rest of DWC MSI functionality.
+	 */
+	pp->has_msi_ctrl = true;
+
+	msi_irq = platform_get_irq_byname_optional(pdev, "msi");
+	if (msi_irq < 0) {
+		msi_irq = platform_get_irq(pdev, 0);
+		if (msi_irq < 0)
+			return msi_irq;
+	}
+
+	pcie->msi_irqs[0] = msi_irq;
+
+	for (num_ctrls = 1; num_ctrls < MAX_MSI_CTRLS; num_ctrls++) {
+		snprintf(irq_name, sizeof(irq_name), "msi%d", num_ctrls + 1);
+		msi_irq = platform_get_irq_byname_optional(pdev, irq_name);
+		if (msi_irq == -ENXIO)
+			break;
+
+		pcie->msi_irqs[num_ctrls] = msi_irq;
+	}
+
+	pp->num_vectors = num_ctrls * MAX_MSI_IRQS_PER_CTRL;
+	dev_info(&pdev->dev, "Using %d MSI vectors\n", pp->num_vectors);
+	for (ctrl = 0; ctrl < num_ctrls; ctrl++)
+		pp->irq_mask[ctrl] = ~0;
+
+	ret = dw_pcie_allocate_msi(pp);
+	if (ret)
+		return ret;
+
+	for (ctrl = 0; ctrl < num_ctrls; ctrl++)
+		irq_set_chained_handler_and_data(pcie->msi_irqs[ctrl],
+						 qcom_chained_msi_isr,
+						 pp);
+
+	return 0;
+}
+
+static void qcom_pcie_msi_host_deinit(struct pcie_port *pp)
+{
+	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
+	struct qcom_pcie *pcie = to_qcom_pcie(pci);
+	unsigned int ctrl, num_ctrls;
+
+	num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;
+
+	for (ctrl = 0; ctrl < num_ctrls; ctrl++)
+		irq_set_chained_handler_and_data(pcie->msi_irqs[ctrl],
+						 NULL,
+						 NULL);
+
+	dw_pcie_free_msi(pp);
+}
+
 static int qcom_pcie_host_init(struct pcie_port *pp)
 {
 	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
@@ -1435,6 +1555,12 @@  static const struct dw_pcie_host_ops qcom_pcie_dw_ops = {
 	.host_init = qcom_pcie_host_init,
 };
 
+static const struct dw_pcie_host_ops qcom_pcie_msi_dw_ops = {
+	.host_init = qcom_pcie_host_init,
+	.msi_host_init = qcom_pcie_msi_host_init,
+	.msi_host_deinit = qcom_pcie_msi_host_deinit,
+};
+
 /* Qcom IP rev.: 2.1.0	Synopsys IP rev.: 4.01a */
 static const struct qcom_pcie_ops ops_2_1_0 = {
 	.get_resources = qcom_pcie_get_resources_2_1_0,
@@ -1508,6 +1634,7 @@  static const struct qcom_pcie_cfg ipq8064_cfg = {
 
 static const struct qcom_pcie_cfg msm8996_cfg = {
 	.ops = &ops_2_3_2,
+	.has_split_msi_irqs = true,
 };
 
 static const struct qcom_pcie_cfg ipq8074_cfg = {
@@ -1520,6 +1647,7 @@  static const struct qcom_pcie_cfg ipq4019_cfg = {
 
 static const struct qcom_pcie_cfg sdm845_cfg = {
 	.ops = &ops_2_7_0,
+	.has_split_msi_irqs = true,
 	.has_tbu_clk = true,
 };
 
@@ -1532,12 +1660,14 @@  static const struct qcom_pcie_cfg sm8150_cfg = {
 
 static const struct qcom_pcie_cfg sm8250_cfg = {
 	.ops = &ops_1_9_0,
+	.has_split_msi_irqs = true,
 	.has_tbu_clk = true,
 	.has_ddrss_sf_tbu_clk = true,
 };
 
 static const struct qcom_pcie_cfg sm8450_pcie0_cfg = {
 	.ops = &ops_1_9_0,
+	.has_split_msi_irqs = true,
 	.has_ddrss_sf_tbu_clk = true,
 	.pipe_clk_need_muxing = true,
 	.has_aggre0_clk = true,
@@ -1546,6 +1676,7 @@  static const struct qcom_pcie_cfg sm8450_pcie0_cfg = {
 
 static const struct qcom_pcie_cfg sm8450_pcie1_cfg = {
 	.ops = &ops_1_9_0,
+	.has_split_msi_irqs = true,
 	.has_ddrss_sf_tbu_clk = true,
 	.pipe_clk_need_muxing = true,
 	.has_aggre1_clk = true,
@@ -1553,6 +1684,7 @@  static const struct qcom_pcie_cfg sm8450_pcie1_cfg = {
 
 static const struct qcom_pcie_cfg sc7280_cfg = {
 	.ops = &ops_1_9_0,
+	.has_split_msi_irqs = true,
 	.has_tbu_clk = true,
 	.pipe_clk_need_muxing = true,
 };
@@ -1626,7 +1758,10 @@  static int qcom_pcie_probe(struct platform_device *pdev)
 	if (ret)
 		goto err_pm_runtime_put;
 
-	pp->ops = &qcom_pcie_dw_ops;
+	if (pcie->cfg->has_split_msi_irqs)
+		pp->ops = &qcom_pcie_msi_dw_ops;
+	else
+		pp->ops = &qcom_pcie_dw_ops;
 
 	ret = phy_init(pcie->phy);
 	if (ret) {