[V3,00/10] PCIe TPH and cache direct injection support

Message ID 20240717205511.2541693-1-wei.huang2@amd.com

Message

Wei Huang July 17, 2024, 8:55 p.m. UTC
Hi All,

TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices to
provide optimization hints for requests that target memory space. These hints,
in a format called steering tag (ST), are provided in the requester's TLP
headers and allow the system hardware, including the Root Complex, to
optimize the utilization of platform resources for the requests.
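To make the description above concrete, here is a minimal sketch (not part of the series) that decodes the TPH Requester Capability register, i.e. the dword at offset 0x04 of the TPH Requester Extended Capability (ID 0x17). The bit positions follow the PCIe base specification; the constant and function names are made up for illustration and are not the names used in pci_regs.h or tph.c.

```python
# Illustrative only: names below are hypothetical, bit positions are from
# the PCIe base spec's TPH Requester Capability register definition.
TPH_CAP_NO_ST_MODE = 1 << 0         # No ST Mode Supported
TPH_CAP_INT_VEC_MODE = 1 << 1       # Interrupt Vector Mode Supported
TPH_CAP_DEV_SPEC_MODE = 1 << 2      # Device Specific Mode Supported
TPH_CAP_EXT_TPH = 1 << 8            # Extended TPH Requester Supported
TPH_CAP_ST_LOC_SHIFT = 9            # ST Table Location, bits 10:9
TPH_CAP_ST_LOC_MASK = 0x3 << TPH_CAP_ST_LOC_SHIFT
TPH_CAP_ST_SIZE_SHIFT = 16          # ST Table Size, bits 26:16 (N - 1)
TPH_CAP_ST_SIZE_MASK = 0x7FF << TPH_CAP_ST_SIZE_SHIFT

ST_TABLE_LOCATIONS = {
    0: "none",            # no ST table present
    1: "tph-capability",  # table lives in the TPH capability structure
    2: "msix-table",      # table lives in the MSI-X table
    3: "reserved",
}


def decode_tph_cap(reg):
    """Split a raw 32-bit capability register value into named fields."""
    loc = (reg & TPH_CAP_ST_LOC_MASK) >> TPH_CAP_ST_LOC_SHIFT
    size_field = (reg & TPH_CAP_ST_SIZE_MASK) >> TPH_CAP_ST_SIZE_SHIFT
    # The size field is only meaningful when an ST table is implemented.
    entries = size_field + 1 if loc in (1, 2) else 0
    return {
        "no_st_mode": bool(reg & TPH_CAP_NO_ST_MODE),
        "interrupt_vector_mode": bool(reg & TPH_CAP_INT_VEC_MODE),
        "device_specific_mode": bool(reg & TPH_CAP_DEV_SPEC_MODE),
        "extended_tph": bool(reg & TPH_CAP_EXT_TPH),
        "st_table_location": ST_TABLE_LOCATIONS[loc],
        "st_table_entries": entries,
    }
```

The "interrupt_vector_mode" bit is the kind of information a check such as pcie_tph_intr_vec_supported() would presumably rely on, though how the series implements that check is not shown here.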

Upcoming AMD hardware implements a new Cache Injection feature that leverages
TPH. Cache Injection allows PCIe endpoints to inject I/O Coherent DMA writes
directly into an L2 within the CCX (core complex) closest to the CPU core that
will consume it. This technology is aimed at applications requiring high
performance and low latency, such as networking and storage applications.

This series introduces generic TPH support in Linux, allowing STs to be
retrieved from ACPI _DSM (as defined by ACPI) and used by PCIe endpoint
drivers as needed. As a demonstration, it includes an example usage in the
Broadcom BNXT driver. When running on Broadcom NICs with the appropriate
firmware, Cache Injection shows substantial memory bandwidth savings and
better network bandwidth using real-world benchmarks. This solution is
vendor-neutral, as both TPH and ACPI _DSM are industry standards.

V2->V3:
 * Rebase on top of pci/next tree (tag: pci-v6.11-changes)
 * Redefine PCI TPH registers (pci_regs.h) without breaking uapi
 * Fix commit subjects/messages for kernel options (Jonathan and Bjorn)
 * Break API functions into three individual patches for easy review
 * Rewrite lots of code in tph.c/tph.h based on feedback (Jonathan and Bjorn)

V1->V2:
 * Rebase on top of pci.git/for-linus (6.10-rc1)
 * Address mismatched data types reported by Sparse (Sparse check passed)
 * Add a new API, pcie_tph_intr_vec_supported(), for checking IRQ mode support
 * Skip bnxt affinity notifier registration if pcie_tph_intr_vec_supported()=false
 * Minor fixes in bnxt driver (i.e. warning messages)

Manoj Panicker (1):
  bnxt_en: Add TPH support in BNXT driver

Michael Chan (1):
  bnxt_en: Pass NQ ID to the FW when allocating RX/RX AGG rings

Wei Huang (8):
  PCI: Introduce PCIe TPH support framework
  PCI: Add TPH related register definition
  PCI/TPH: Add pci=notph to prevent use of TPH
  PCI/TPH: Add pci=nostmode to force No ST Mode
  PCI/TPH: Introduce API to check interrupt vector mode support
  PCI/TPH: Introduce API to retrieve TPH steering tags from ACPI
  PCI/TPH: Introduce API to update TPH steering tags in PCIe devices
  PCI/TPH: Add TPH documentation

 Documentation/PCI/index.rst                   |   1 +
 Documentation/PCI/tph.rst                     |  57 +++
 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/driver-api/pci/pci.rst          |   3 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  62 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   4 +
 drivers/pci/pci-driver.c                      |  12 +-
 drivers/pci/pci.c                             |  24 +
 drivers/pci/pci.h                             |   6 +
 drivers/pci/pcie/Kconfig                      |  11 +
 drivers/pci/pcie/Makefile                     |   1 +
 drivers/pci/pcie/tph.c                        | 443 ++++++++++++++++++
 drivers/pci/probe.c                           |   1 +
 include/linux/pci-tph.h                       |  42 ++
 include/linux/pci.h                           |   6 +
 include/uapi/linux/pci_regs.h                 |  28 +-
 16 files changed, 696 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/PCI/tph.rst
 create mode 100644 drivers/pci/pcie/tph.c
 create mode 100644 include/linux/pci-tph.h

Comments

Lukas Wunner July 20, 2024, 8:08 a.m. UTC | #1
[cc += Paul Luse, Jing Liu]

On Wed, Jul 17, 2024 at 03:55:01PM -0500, Wei Huang wrote:
> TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices to
> provide optimization hints for requests that target memory space. These hints,
> in a format called steering tag (ST), are provided in the requester's TLP
> headers and allow the system hardware, including the Root Complex, to
> optimize the utilization of platform resources for the requests.
[...]
> This series introduces generic TPH support in Linux, allowing STs to be
> retrieved from ACPI _DSM (as defined by ACPI) and used by PCIe endpoint
> drivers as needed. As a demonstration, it includes an example usage in the
> Broadcom BNXT driver. When running on Broadcom NICs with the appropriate
> firmware, Cache Injection shows substantial memory bandwidth savings and
> better network bandwidth using real-world benchmarks. This solution is
> vendor-neutral, as both TPH and ACPI _DSM are industry standards.

I think you need to add support for saving and restoring TPH registers,
otherwise the changes you make to those registers may not survive
reset recovery or system sleep.  Granted, system sleep may not be
relevant for servers (which I assume you're targeting with your patches),
but reset recovery very much is.

Paul Luse submitted a patch two years ago to save and restore
TPH registers, perhaps you can include it in your patch set?

https://lore.kernel.org/all/20220712123641.2319-1-paul.e.luse@intel.com/

Bjorn left some comments on Paul's patch:

https://lore.kernel.org/all/20220912214516.GA538566@bhelgaas/

In particular, Bjorn asked for shared infrastructure to access
TPH registers (which you're adding in your patch set) and spotted
several nits (which should be easy to address).  So I think you may
be able to integrate Paul's patch into your series without too much
effort.

However note that when writing to TPH registers through the API you're
introducing, you also need to update the saved register state so that
those changes aren't lost upon a subsequent reset recovery.
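The point about keeping the saved state in sync can be illustrated with a toy model. This is only a sketch of the idea, written in Python rather than kernel C; the class and method names are hypothetical, and it assumes for illustration that a reset clears the device's ST table.

```python
class TphDeviceModel:
    """Toy model: a device ST table plus a kernel-side saved copy."""

    def __init__(self, entries):
        self.hw_st_table = [0] * entries     # what the device currently holds
        self.saved_st_table = [0] * entries  # copy used by restore()

    def set_st(self, index, tag):
        """Write a steering tag and mirror it into the saved state."""
        self.hw_st_table[index] = tag
        # Without this line, the tag would be lost after reset + restore.
        self.saved_st_table[index] = tag

    def reset(self):
        """Assumption: a reset wipes the device-side table."""
        self.hw_st_table = [0] * len(self.hw_st_table)

    def restore(self):
        """Reprogram the device from the saved copy after a reset."""
        self.hw_st_table = list(self.saved_st_table)
```

For example, after set_st(2, 0x5A), a reset() followed by restore() leaves entry 2 at 0x5A again, which is the behaviour asked for above.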

Thanks,

Lukas
David Wei July 20, 2024, 7:25 p.m. UTC | #2
On 2024-07-17 13:55, Wei Huang wrote:
> Hi All,
> 
> TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices to
> provide optimization hints for requests that target memory space. These hints,
> in a format called steering tag (ST), are provided in the requester's TLP
> headers and allow the system hardware, including the Root Complex, to
> optimize the utilization of platform resources for the requests.
> 
> Upcoming AMD hardware implements a new Cache Injection feature that leverages
> TPH. Cache Injection allows PCIe endpoints to inject I/O Coherent DMA writes
> directly into an L2 within the CCX (core complex) closest to the CPU core that
> will consume it. This technology is aimed at applications requiring high
> performance and low latency, such as networking and storage applications.

This sounds very exciting Wei and it's good to see bnxt support. When
you say 'upcoming AMD hardware' are you able to share exactly which? I
would like to try this out.
Wei Huang July 22, 2024, 2:44 p.m. UTC | #3
On 7/20/24 03:08, Lukas Wunner wrote:
> [cc += Paul Luse, Jing Liu]
> 
> On Wed, Jul 17, 2024 at 03:55:01PM -0500, Wei Huang wrote:
>> TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices to
>> provide optimization hints for requests that target memory space. These hints,
>> in a format called steering tag (ST), are provided in the requester's TLP
>> headers and allow the system hardware, including the Root Complex, to
>> optimize the utilization of platform resources for the requests.
> [...]
>> This series introduces generic TPH support in Linux, allowing STs to be
>> retrieved from ACPI _DSM (as defined by ACPI) and used by PCIe endpoint
>> drivers as needed. As a demonstration, it includes an example usage in the
>> Broadcom BNXT driver. When running on Broadcom NICs with the appropriate
>> firmware, Cache Injection shows substantial memory bandwidth savings and
>> better network bandwidth using real-world benchmarks. This solution is
>> vendor-neutral, as both TPH and ACPI _DSM are industry standards.
> 
> I think you need to add support for saving and restoring TPH registers,
> otherwise the changes you make to those registers may not survive
> reset recovery or system sleep.  Granted, system sleep may not be
> relevant for servers (which I assume you're targeting with your patches),
> but reset recovery very much is.
> 
> Paul Luse submitted a patch two years ago to save and restore
> TPH registers, perhaps you can include it in your patch set?

Thanks for pointing them out. I skimmed through Paul's patch and it is
straightforward to integrate.

Depending on Bjorn's preference, I can either integrate it into my
patchset with full credit to Paul and Jing, or Paul can resubmit a
new version.

> 
> https://lore.kernel.org/all/20220712123641.2319-1-paul.e.luse@intel.com/
> 
> Bjorn left some comments on Paul's patch:
> 
> https://lore.kernel.org/all/20220912214516.GA538566@bhelgaas/
> 
> In particular, Bjorn asked for shared infrastructure to access
> TPH registers (which you're adding in your patch set) and spotted
> several nits (which should be easy to address).  So I think you may
> be able to integrate Paul's patch into your series without too much
> effort.

I read Bjorn's comments. Many of them have already been addressed in my
patchset (e.g. move under /pci/pcie, support _DSM and dev->tph). So, as I
said, integration is doable.

> 
> However note that when writing to TPH registers through the API you're
> introducing, you also need to update the saved register state so that
> those changes aren't lost upon a subsequent reset recovery.
> 
> Thanks,
> 
> Lukas
Lukas Wunner July 22, 2024, 2:58 p.m. UTC | #4
On Mon, Jul 22, 2024 at 09:44:32AM -0500, Wei Huang wrote:
> On 7/20/24 03:08, Lukas Wunner wrote:
> > Paul Luse submitted a patch two years ago to save and restore
> > TPH registers, perhaps you can include it in your patch set?
> 
> Thanks for pointing them out. I skimmed through Paul's patch and it is
> straightforward to integrate.
> 
> Depending on Bjorn's preference, I can either integrate it into my
> patchset with full credit to Paul and Jing, or Paul can resubmit a
> new version.

The former would likely be better as I'm not sure Paul has the time
to respin the patch.  My recollection is that TPH save/restore support
was dropped as a requirement for the Intel device this was originally
developed for, but it would be a shame to lose the time and effort
that already went into it and I think it might be useful for your
use case as well to support reset recovery.

> I read Bjorn's comments, lots of them have been addressed in my patchset
> (e.g. move under /pci/pcie, support _DSM and dev->tph).

Indeed, good job!

Thanks for taking a look!

Lukas
Wei Huang July 22, 2024, 3:38 p.m. UTC | #5
On 7/20/24 14:25, David Wei wrote:
> On 2024-07-17 13:55, Wei Huang wrote:
>> Hi All,
>>
>> TPH (TLP Processing Hints) is a PCIe feature that allows endpoint devices to
>> provide optimization hints for requests that target memory space. These hints,
>> in a format called steering tag (ST), are provided in the requester's TLP
>> headers and allow the system hardware, including the Root Complex, to
>> optimize the utilization of platform resources for the requests.
>>
>> Upcoming AMD hardware implements a new Cache Injection feature that leverages
>> TPH. Cache Injection allows PCIe endpoints to inject I/O Coherent DMA writes
>> directly into an L2 within the CCX (core complex) closest to the CPU core that
>> will consume it. This technology is aimed at applications requiring high
>> performance and low latency, such as networking and storage applications.
> 
> This sounds very exciting Wei and it's good to see bnxt support. When
> you say 'upcoming AMD hardware' are you able to share exactly which? I
> would like to try this out.

I can't say which server platforms yet. But you can check for this
feature either in the BIOS options or by decoding the ACPI DSDT table
(search for UUID e5c937d0-3553-4d7a-9117-ea4d19c3434d, Func 0x0F).
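As a rough sketch of the DSDT search mentioned above (an illustration, not something from the series): a UUID in compiled AML is normally stored as a 16-byte buffer with the first three fields in little-endian order, which is what Python's uuid.UUID.bytes_le produces. The helper name below is hypothetical.

```python
import uuid

# UUID quoted in the thread for the _DSM to look for.
PCI_DSM_UUID = uuid.UUID("e5c937d0-3553-4d7a-9117-ea4d19c3434d")


def find_uuid_offsets(aml, guid=PCI_DSM_UUID):
    """Return every byte offset at which the UUID appears in an AML blob."""
    needle = guid.bytes_le  # mixed-endian layout, as ToUUID() emits it
    offsets = []
    pos = aml.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = aml.find(needle, pos + 1)
    return offsets
```

On a typical Linux system the raw table can be read (as root) from /sys/firmware/acpi/tables/DSDT and passed to this function. A hit only shows that a _DSM with this UUID exists; confirming that function 0x0F is implemented still needs the table to be disassembled (for example with iasl -d) and read.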