mbox series

[v2,00/18] Improve PCI memory mapping API

Message ID 20240330041928.1555578-1-dlemoal@kernel.org
Headers show
Series Improve PCI memory mapping API | expand

Message

Damien Le Moal March 30, 2024, 4:19 a.m. UTC
This series introduces the new functions pci_epc_map_align(),
pci_epc_mem_map() and pci_epc_mem_unmap() to improve handling of the
PCI address mapping alignment constraints of endpoint controllers in a
controller independent manner.

The issue fixed is that the fixed alignment defined by the "align" field
of struct pci_epc_features assumes that the alignment of the endpoint
memory used to map a RC PCI address range is independent of the PCI
address being mapped. But that is not the case for the rk3399 SoC
controller: in endpoint mode, this controller uses the lower bits of the
local endpoint memory address as the lower bits for the PCI addresses
for data transfers. That is, when mapping local memory, one must take
into account the number of bits of the RC PCI address that change from
the start address of the mapping.

To fix this, the new endpoint controller method .map_align is introduced
and called from pci_epc_map_align(). This method is optional and for
controllers that do not define it, the mapping information returned
is based of the fixed alignment constraint as defined by the align
feature.

The functions pci_epc_mem_map() is a helper function which obtains
mapping information, allocates endpoint controller memory according to
the mapping size obtained and maps the memory. pci_epc_mem_map() unmaps
and frees the endpoint memory.

This series is organized as follows:
 - Patch 1 tidy up the epc core code
 - Patch 2 and 3 introduce the new map_align endpoint controller method
   and related epc functions.
 - Patch 4 to 6 modify the test endpoint driver to use these new
   functions and improve the code of this driver.
 - Finally, Patch 7 to 18 fix the rk3399 endpoint driver, defining a
   .map_align method for it and improving its overall code readability
   and features.

Changes from v1:
 - Changed pci_epc_check_func() to pci_epc_function_is_valid() in patch
   1.
 - Removed patch "PCI: endpoint: Improve pci_epc_mem_alloc_addr()"
   (former patch 2 of v1)
 - Various typos cleanups all over. Also fixed some blank space
   indentation.
 - Added review tags

Damien Le Moal (17):
  PCI: endpoint: Introduce pci_epc_function_is_valid()
  PCI: endpoint: Introduce pci_epc_map_align()
  PCI: endpoint: Introduce pci_epc_mem_map()/unmap()
  PCI: endpoint: test: Use pci_epc_mem_map/unmap()
  PCI: endpoint: test: Synchronously cancel command handler work
  PCI: endpoint: test: Implement link_down event operation
  PCI: rockchip-ep: Fix address translation unit programming
  PCI: rockchip-ep: Use a macro to define EP controller .align feature
  PCI: rockchip-ep: Improve rockchip_pcie_ep_unmap_addr()
  PCI: rockchip-ep: Improve rockchip_pcie_ep_map_addr()
  PCI: rockchip-ep: Implement the map_align endpoint controller operation
  PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() memory allocations
  PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() MSI-X hiding
  PCI: rockchip-ep: Refactor endpoint link training enable
  PCI: rockship-ep: Introduce rockchip_pcie_ep_stop()
  PCI: rockchip-ep: Improve link training
  PCI: rockchip-ep: Handle PERST# signal in endpoint mode

Wilfred Mallawa (1):
  dt-bindings: pci: rockchip,rk3399-pcie-ep: Add ep-gpios property

 .../bindings/pci/rockchip,rk3399-pcie-ep.yaml |   3 +
 drivers/pci/controller/pcie-rockchip-ep.c     | 393 ++++++++++++++----
 drivers/pci/controller/pcie-rockchip.c        |  17 +-
 drivers/pci/controller/pcie-rockchip.h        |  22 +
 drivers/pci/endpoint/functions/pci-epf-test.c | 390 +++++++++--------
 drivers/pci/endpoint/pci-epc-core.c           | 213 +++++++---
 include/linux/pci-epc.h                       |  39 ++
 7 files changed, 768 insertions(+), 309 deletions(-)

Comments

Rick Wertenbroek April 2, 2024, 12:36 p.m. UTC | #1
On Sat, Mar 30, 2024 at 5:19 AM Damien Le Moal <dlemoal@kernel.org> wrote:
>
> This series introduces the new functions pci_epc_map_align(),
> pci_epc_mem_map() and pci_epc_mem_unmap() to improve handling of the
> PCI address mapping alignment constraints of endpoint controllers in a
> controller independent manner.
>
> The issue fixed is that the fixed alignment defined by the "align" field
> of struct pci_epc_features assumes that the alignment of the endpoint
> memory used to map a RC PCI address range is independent of the PCI
> address being mapped. But that is not the case for the rk3399 SoC
> controller: in endpoint mode, this controller uses the lower bits of the
> local endpoint memory address as the lower bits for the PCI addresses
> for data transfers. That is, when mapping local memory, one must take
> into account the number of bits of the RC PCI address that change from
> the start address of the mapping.
>
> To fix this, the new endpoint controller method .map_align is introduced
> and called from pci_epc_map_align(). This method is optional and for
> controllers that do not define it, the mapping information returned
> is based of the fixed alignment constraint as defined by the align
> feature.
>
> The functions pci_epc_mem_map() is a helper function which obtains
> mapping information, allocates endpoint controller memory according to
> the mapping size obtained and maps the memory. pci_epc_mem_map() unmaps
> and frees the endpoint memory.

This way of mapping is not only useful for the RK3399 but would also
help for the addition of other future PCI endpoint controller drivers.

For example, on several FPGA PCI endpoint IPs the window mapping is
also done by passing N bits from the mapped address and M bits from
the window mapping address (where N+M=bus width, e.g., 32 or 64).
Using AND/OR masks/operations to combine the bits for the hardware
address from the mapped address and map base uses less resources than
using add/subtract to get the hardware address from an unaligned map
base and offset. So I guess that more than a few IPs, being hard or
soft IPs, use this kind of mapping (to reduce size, logic, improve max
operating frequency, improve efficiency etc.)

Two major examples come to mind :
1) The AMD/Xilinx PCIe endpoint IP. The mapping is documentented in
"AXI Bridge for PCI Express Gen3 Subsystem Product Guide (PG194)" [1]
section BAR and Address Translation (Figure AXI to PCIe Address
Translation).
2) The Intel/Altera PCIe endpoint IP. The mapping is documented in
"Multi Channel DMA Intel® FPGA IP for PCI Express* User Guide" [2]
section 3.6. Root Port Address Translation Table Enablement.

Both those IPs don't have mainline support yet as PCIe endpoint
controllers but also use a similar kind of mapping as suggested here
for the RK3399. So these changes would also make the addition of these
controller drivers easier.

The new mapping scheme also makes it much clearer in the PCI endpoint
framework. Because without it some mapping operation would fail
because of alignment requirements in the controller, this requires
extra code and checks in the drivers that implement the endpoint
functions. With the current state of the PCI endpoint controller
framework there is no good way to express that the controller does an
AND/OR mask combination to create the hardware address and therefore
requires the map to be aligned to the window size, rather than doing a
window base addition with an offset (subtraction) in the mapping. This
could benefit from further clarification in the endpoint framework.

Best regards,
Rick

[1] https://docs.amd.com/r/en-US/pg194-axi-bridge-pcie-gen3/Address-Translation
[2] https://www.intel.com/content/www/us/en/docs/programmable/683821/23-4/

>
> This series is organized as follows:
>  - Patch 1 tidy up the epc core code
>  - Patch 2 and 3 introduce the new map_align endpoint controller method
>    and related epc functions.
>  - Patch 4 to 6 modify the test endpoint driver to use these new
>    functions and improve the code of this driver.
>  - Finally, Patch 7 to 18 fix the rk3399 endpoint driver, defining a
>    .map_align method for it and improving its overall code readability
>    and features.
>
> Changes from v1:
>  - Changed pci_epc_check_func() to pci_epc_function_is_valid() in patch
>    1.
>  - Removed patch "PCI: endpoint: Improve pci_epc_mem_alloc_addr()"
>    (former patch 2 of v1)
>  - Various typos cleanups all over. Also fixed some blank space
>    indentation.
>  - Added review tags
>
> Damien Le Moal (17):
>   PCI: endpoint: Introduce pci_epc_function_is_valid()
>   PCI: endpoint: Introduce pci_epc_map_align()
>   PCI: endpoint: Introduce pci_epc_mem_map()/unmap()
>   PCI: endpoint: test: Use pci_epc_mem_map/unmap()
>   PCI: endpoint: test: Synchronously cancel command handler work
>   PCI: endpoint: test: Implement link_down event operation
>   PCI: rockchip-ep: Fix address translation unit programming
>   PCI: rockchip-ep: Use a macro to define EP controller .align feature
>   PCI: rockchip-ep: Improve rockchip_pcie_ep_unmap_addr()
>   PCI: rockchip-ep: Improve rockchip_pcie_ep_map_addr()
>   PCI: rockchip-ep: Implement the map_align endpoint controller operation
>   PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() memory allocations
>   PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() MSI-X hiding
>   PCI: rockchip-ep: Refactor endpoint link training enable
>   PCI: rockship-ep: Introduce rockchip_pcie_ep_stop()
>   PCI: rockchip-ep: Improve link training
>   PCI: rockchip-ep: Handle PERST# signal in endpoint mode
>
> Wilfred Mallawa (1):
>   dt-bindings: pci: rockchip,rk3399-pcie-ep: Add ep-gpios property
>
>  .../bindings/pci/rockchip,rk3399-pcie-ep.yaml |   3 +
>  drivers/pci/controller/pcie-rockchip-ep.c     | 393 ++++++++++++++----
>  drivers/pci/controller/pcie-rockchip.c        |  17 +-
>  drivers/pci/controller/pcie-rockchip.h        |  22 +
>  drivers/pci/endpoint/functions/pci-epf-test.c | 390 +++++++++--------
>  drivers/pci/endpoint/pci-epc-core.c           | 213 +++++++---
>  include/linux/pci-epc.h                       |  39 ++
>  7 files changed, 768 insertions(+), 309 deletions(-)
>
> --
> 2.44.0
>
Manivannan Sadhasivam April 3, 2024, 6:46 a.m. UTC | #2
On Sat, Mar 30, 2024 at 01:19:11PM +0900, Damien Le Moal wrote:
> Introduce the epc core helper function pci_epc_function_is_valid() to
> verify that an epc pointer, a physical function number and a virtual
> function number are all valid. This avoids repeating the code pattern:
> 
> if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> 	return err;
> 
> if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> 	return err;
> 
> in many functions of the endpoint controller core code.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

One nit below. With that fixed,

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

> ---
>  drivers/pci/endpoint/pci-epc-core.c | 79 +++++++++++------------------
>  1 file changed, 31 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
> index da3fc0795b0b..754afd115bbd 100644
> --- a/drivers/pci/endpoint/pci-epc-core.c
> +++ b/drivers/pci/endpoint/pci-epc-core.c
> @@ -126,6 +126,18 @@ enum pci_barno pci_epc_get_next_free_bar(const struct pci_epc_features
>  }
>  EXPORT_SYMBOL_GPL(pci_epc_get_next_free_bar);
>  
> +static inline bool pci_epc_function_is_valid(struct pci_epc *epc,
> +					     u8 func_no, u8 vfunc_no)

No need to add 'inline' keyword to function definitions in a .c file. Compiler
will handle that.

- Mani

> +{
> +	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> +		return false;
> +
> +	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +		return false;
> +
> +	return true;
> +}
> +
>  /**
>   * pci_epc_get_features() - get the features supported by EPC
>   * @epc: the features supported by *this* EPC device will be returned
> @@ -143,10 +155,7 @@ const struct pci_epc_features *pci_epc_get_features(struct pci_epc *epc,
>  {
>  	const struct pci_epc_features *epc_features;
>  
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> -		return NULL;
> -
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return NULL;
>  
>  	if (!epc->ops->get_features)
> @@ -216,10 +225,7 @@ int pci_epc_raise_irq(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  {
>  	int ret;
>  
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> -		return -EINVAL;
> -
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return -EINVAL;
>  
>  	if (!epc->ops->raise_irq)
> @@ -260,10 +266,7 @@ int pci_epc_map_msi_irq(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  {
>  	int ret;
>  
> -	if (IS_ERR_OR_NULL(epc))
> -		return -EINVAL;
> -
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return -EINVAL;
>  
>  	if (!epc->ops->map_msi_irq)
> @@ -291,10 +294,7 @@ int pci_epc_get_msi(struct pci_epc *epc, u8 func_no, u8 vfunc_no)
>  {
>  	int interrupt;
>  
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> -		return 0;
> -
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return 0;
>  
>  	if (!epc->ops->get_msi)
> @@ -327,11 +327,10 @@ int pci_epc_set_msi(struct pci_epc *epc, u8 func_no, u8 vfunc_no, u8 interrupts)
>  	int ret;
>  	u8 encode_int;
>  
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions ||
> -	    interrupts < 1 || interrupts > 32)
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return -EINVAL;
>  
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (interrupts < 1 || interrupts > 32)
>  		return -EINVAL;
>  
>  	if (!epc->ops->set_msi)
> @@ -359,10 +358,7 @@ int pci_epc_get_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no)
>  {
>  	int interrupt;
>  
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> -		return 0;
> -
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return 0;
>  
>  	if (!epc->ops->get_msix)
> @@ -395,11 +391,10 @@ int pci_epc_set_msix(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  {
>  	int ret;
>  
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions ||
> -	    interrupts < 1 || interrupts > 2048)
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return -EINVAL;
>  
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (interrupts < 1 || interrupts > 2048)
>  		return -EINVAL;
>  
>  	if (!epc->ops->set_msix)
> @@ -426,10 +421,7 @@ EXPORT_SYMBOL_GPL(pci_epc_set_msix);
>  void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  			phys_addr_t phys_addr)
>  {
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> -		return;
> -
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return;
>  
>  	if (!epc->ops->unmap_addr)
> @@ -457,10 +449,7 @@ int pci_epc_map_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  {
>  	int ret;
>  
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> -		return -EINVAL;
> -
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return -EINVAL;
>  
>  	if (!epc->ops->map_addr)
> @@ -487,12 +476,11 @@ EXPORT_SYMBOL_GPL(pci_epc_map_addr);
>  void pci_epc_clear_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  		       struct pci_epf_bar *epf_bar)
>  {
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions ||
> -	    (epf_bar->barno == BAR_5 &&
> -	     epf_bar->flags & PCI_BASE_ADDRESS_MEM_TYPE_64))
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return;
>  
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (epf_bar->barno == BAR_5 &&
> +	    epf_bar->flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
>  		return;
>  
>  	if (!epc->ops->clear_bar)
> @@ -519,18 +507,16 @@ int pci_epc_set_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  	int ret;
>  	int flags = epf_bar->flags;
>  
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions ||
> -	    (epf_bar->barno == BAR_5 &&
> -	     flags & PCI_BASE_ADDRESS_MEM_TYPE_64) ||
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
> +		return -EINVAL;
> +
> +	if ((epf_bar->barno == BAR_5 && flags & PCI_BASE_ADDRESS_MEM_TYPE_64) ||
>  	    (flags & PCI_BASE_ADDRESS_SPACE_IO &&
>  	     flags & PCI_BASE_ADDRESS_IO_MASK) ||
>  	    (upper_32_bits(epf_bar->size) &&
>  	     !(flags & PCI_BASE_ADDRESS_MEM_TYPE_64)))
>  		return -EINVAL;
>  
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> -		return -EINVAL;
> -
>  	if (!epc->ops->set_bar)
>  		return 0;
>  
> @@ -559,10 +545,7 @@ int pci_epc_write_header(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  {
>  	int ret;
>  
> -	if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> -		return -EINVAL;
> -
> -	if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>  		return -EINVAL;
>  
>  	/* Only Virtual Function #1 has deviceID */
> -- 
> 2.44.0
>
Manivannan Sadhasivam April 3, 2024, 7:45 a.m. UTC | #3
On Sat, Mar 30, 2024 at 01:19:12PM +0900, Damien Le Moal wrote:
> Some endpoint controllers have requirements on the alignment of the
> controller physical memory address that must be used to map a RC PCI
> address region. For instance, the rockchip endpoint controller uses
> at most the lower 20 bits of a physical memory address region as the
> lower bits of an RC PCI address. For mapping a PCI address region of
> size bytes starting from pci_addr, the exact number of address bits
> used is the number of address bits changing in the address range
> [pci_addr..pci_addr + size - 1].
> 
> For this example, this creates the following constraints:
> 1) The offset into the controller physical memory allocated for a
>    mapping depends on the mapping size *and* the starting PCI address
>    for the mapping.
> 2) A mapping size cannot exceed the controller windows size (1MB) minus
>    the offset needed into the allocated physical memory, which can end
>    up being a smaller size than the desired mapping size.
> 
> Handling these constraints independently of the controller being used in
> a PCI EP function driver is not possible with the current EPC API as
> it only provides the ->align field in struct pci_epc_features.
> Furthermore, this alignment is static and does not depend on a mapping
> pci address and size.
> 
> Solve this by introducing the function pci_epc_map_align() and the
> endpoint controller operation ->map_align to allow endpoint function
> drivers to obtain the size and the offset into a controller address
> region that must be used to map an RC PCI address region. The size
> of the physical address region provided by pci_epc_map_align() can then
> be used as the size argument for the function pci_epc_mem_alloc_addr().
> The offset into the allocated controller memory can be used to
> correctly handle data transfers. Of note is that pci_epc_map_align() may
> indicate upon return a mapping size that is smaller (but not 0) than the
> requested PCI address region size. For such case, an endpoint function
> driver must handle data transfers in fragments.
> 

Is there any incentive in exposing pci_epc_map_align()? I mean, why can't it be
hidden inside the new alloc() API itself?

Furthermore, is it possible to avoid the map_align() callback and handle the
alignment within the EPC driver?

- Mani

> The controller operation ->map_align is optional: controllers that do
> not have any address alignment constraints for mapping a RC PCI address
> region do not need to implement this operation. For such controllers,
> pci_epc_map_align() always returns the mapping size as equal
> to the requested size and an offset equal to 0.
> 
> The structure pci_epc_map is introduced to represent a mapping start PCI
> address, size and the size and offset into the controller memory needed
> for mapping the PCI address region.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>  drivers/pci/endpoint/pci-epc-core.c | 66 +++++++++++++++++++++++++++++
>  include/linux/pci-epc.h             | 33 +++++++++++++++
>  2 files changed, 99 insertions(+)
> 
> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
> index 754afd115bbd..37758ca91d7f 100644
> --- a/drivers/pci/endpoint/pci-epc-core.c
> +++ b/drivers/pci/endpoint/pci-epc-core.c
> @@ -433,6 +433,72 @@ void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  }
>  EXPORT_SYMBOL_GPL(pci_epc_unmap_addr);
>  
> +/**
> + * pci_epc_map_align() - Get the offset into and the size of a controller memory
> + *			 address region needed to map a RC PCI address region
> + * @epc: the EPC device on which address is allocated
> + * @func_no: the physical endpoint function number in the EPC device
> + * @vfunc_no: the virtual endpoint function number in the physical function
> + * @pci_addr: PCI address to which the physical address should be mapped
> + * @size: the size of the mapping starting from @pci_addr
> + * @map: populate here the actual size and offset into the controller memory
> + *       that must be allocated for the mapping
> + *
> + * Invoke the controller map_align operation to obtain the size and the offset
> + * into a controller address region that must be allocated to map @size
> + * bytes of the RC PCI address space starting from @pci_addr.
> + *
> + * The size of the mapping that can be handled by the controller is indicated
> + * using the pci_size field of @map. This size may be smaller than the requested
> + * @size. In such case, the function driver must handle the mapping using
> + * several fragments. The offset into the controller memory for the effective
> + * mapping of the @pci_addr..@pci_addr+@map->pci_size address range is indicated
> + * using the map_ofst field of @map.
> + */
> +int pci_epc_map_align(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +		      u64 pci_addr, size_t size, struct pci_epc_map *map)
> +{
> +	const struct pci_epc_features *features;
> +	size_t mask;
> +	int ret;
> +
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
> +		return -EINVAL;
> +
> +	if (!size || !map)
> +		return -EINVAL;
> +
> +	memset(map, 0, sizeof(*map));
> +	map->pci_addr = pci_addr;
> +	map->pci_size = size;
> +
> +	if (epc->ops->map_align) {
> +		mutex_lock(&epc->lock);
> +		ret = epc->ops->map_align(epc, func_no, vfunc_no, map);
> +		mutex_unlock(&epc->lock);
> +		return ret;
> +	}
> +
> +	/*
> +	 * Assume a fixed alignment constraint as specified by the controller
> +	 * features.
> +	 */
> +	features = pci_epc_get_features(epc, func_no, vfunc_no);
> +	if (!features || !features->align) {
> +		map->map_pci_addr = pci_addr;
> +		map->map_size = size;
> +		map->map_ofst = 0;

These values are overwritten anyway below.

> +	}
> +
> +	mask = features->align - 1;
> +	map->map_pci_addr = map->pci_addr & ~mask;
> +	map->map_ofst = map->pci_addr & mask;
> +	map->map_size = ALIGN(map->map_ofst + map->pci_size, features->align);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_epc_map_align);
> +
>  /**
>   * pci_epc_map_addr() - map CPU address to PCI address
>   * @epc: the EPC device on which address is allocated
> diff --git a/include/linux/pci-epc.h b/include/linux/pci-epc.h
> index cc2f70d061c8..8cfb4aaf2628 100644
> --- a/include/linux/pci-epc.h
> +++ b/include/linux/pci-epc.h
> @@ -32,11 +32,40 @@ pci_epc_interface_string(enum pci_epc_interface_type type)
>  	}
>  }
>  
> +/**
> + * struct pci_epc_map - information about EPC memory for mapping a RC PCI
> + *                      address range
> + * @pci_addr: start address of the RC PCI address range to map
> + * @pci_size: size of the RC PCI address range to map
> + * @map_pci_addr: RC PCI address used as the first address mapped
> + * @map_size: size of the controller memory needed for the mapping
> + * @map_ofst: offset into the controller memory needed for the mapping
> + * @phys_base: base physical address of the allocated EPC memory
> + * @phys_addr: physical address at which @pci_addr is mapped
> + * @virt_base: base virtual address of the allocated EPC memory
> + * @virt_addr: virtual address at which @pci_addr is mapped
> + */
> +struct pci_epc_map {
> +	phys_addr_t	pci_addr;
> +	size_t		pci_size;
> +
> +	phys_addr_t	map_pci_addr;
> +	size_t		map_size;
> +	phys_addr_t	map_ofst;
> +
> +	phys_addr_t	phys_base;
> +	phys_addr_t	phys_addr;
> +	void __iomem	*virt_base;
> +	void __iomem	*virt_addr;
> +};
> +
>  /**
>   * struct pci_epc_ops - set of function pointers for performing EPC operations
>   * @write_header: ops to populate configuration space header
>   * @set_bar: ops to configure the BAR
>   * @clear_bar: ops to reset the BAR
> + * @map_align: operation to get the size and offset into a controller memory
> + *             window needed to map an RC PCI address region
>   * @map_addr: ops to map CPU address to PCI address
>   * @unmap_addr: ops to unmap CPU address and PCI address
>   * @set_msi: ops to set the requested number of MSI interrupts in the MSI
> @@ -61,6 +90,8 @@ struct pci_epc_ops {
>  			   struct pci_epf_bar *epf_bar);
>  	void	(*clear_bar)(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  			     struct pci_epf_bar *epf_bar);
> +	int	(*map_align)(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +			    struct pci_epc_map *map);
>  	int	(*map_addr)(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  			    phys_addr_t addr, u64 pci_addr, size_t size);
>  	void	(*unmap_addr)(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> @@ -234,6 +265,8 @@ int pci_epc_set_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  		    struct pci_epf_bar *epf_bar);
>  void pci_epc_clear_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  		       struct pci_epf_bar *epf_bar);
> +int pci_epc_map_align(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +		      u64 pci_addr, size_t size, struct pci_epc_map *map);
>  int pci_epc_map_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  		     phys_addr_t phys_addr,
>  		     u64 pci_addr, size_t size);
> -- 
> 2.44.0
>
Manivannan Sadhasivam April 3, 2024, 7:47 a.m. UTC | #4
On Sat, Mar 30, 2024 at 01:19:15PM +0900, Damien Le Moal wrote:
> Replace the call to cancel_delayed_work() with a call to
> cancel_delayed_work_sync() in pci_epf_test_unbind(). This ensures that
> the command handler is really stopped when proceeding with dma and bar
> cleanup.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

- Mani

> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> ---
>  drivers/pci/endpoint/functions/pci-epf-test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
> index 0e285e539538..ab40c3182677 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-test.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-test.c
> @@ -709,7 +709,7 @@ static void pci_epf_test_unbind(struct pci_epf *epf)
>  	struct pci_epf_bar *epf_bar;
>  	int bar;
>  
> -	cancel_delayed_work(&epf_test->cmd_handler);
> +	cancel_delayed_work_sync(&epf_test->cmd_handler);
>  	pci_epf_test_clean_dma_chan(epf_test);
>  	for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
>  		epf_bar = &epf->bar[bar];
> -- 
> 2.44.0
>
Manivannan Sadhasivam April 3, 2024, 7:48 a.m. UTC | #5
On Sat, Mar 30, 2024 at 01:19:16PM +0900, Damien Le Moal wrote:
> Implement the link_down event operation to stop the command execution
> delayed work when the endpoint controller notifies a link down event.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>

This patch is already part of another series I posted [1] and under review. So
this can be dropped.

- Mani

[1] https://lore.kernel.org/linux-pci/20240401-pci-epf-rework-v2-9-970dbe90b99d@linaro.org/

> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> ---
>  drivers/pci/endpoint/functions/pci-epf-test.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
> index ab40c3182677..e6d4e1747c9f 100644
> --- a/drivers/pci/endpoint/functions/pci-epf-test.c
> +++ b/drivers/pci/endpoint/functions/pci-epf-test.c
> @@ -824,9 +824,19 @@ static int pci_epf_test_link_up(struct pci_epf *epf)
>  	return 0;
>  }
>  
> +static int pci_epf_test_link_down(struct pci_epf *epf)
> +{
> +	struct pci_epf_test *epf_test = epf_get_drvdata(epf);
> +
> +	cancel_delayed_work_sync(&epf_test->cmd_handler);
> +
> +	return 0;
> +}
> +
>  static const struct pci_epc_event_ops pci_epf_test_event_ops = {
>  	.core_init = pci_epf_test_core_init,
>  	.link_up = pci_epf_test_link_up,
> +	.link_down = pci_epf_test_link_down,
>  };
>  
>  static int pci_epf_test_alloc_space(struct pci_epf *epf)
> -- 
> 2.44.0
>
Manivannan Sadhasivam April 3, 2024, 7:50 a.m. UTC | #6
On Sat, Mar 30, 2024 at 01:19:10PM +0900, Damien Le Moal wrote:
> This series introduces the new functions pci_epc_map_align(),
> pci_epc_mem_map() and pci_epc_mem_unmap() to improve handling of the
> PCI address mapping alignment constraints of endpoint controllers in a
> controller independent manner.
> 
> The issue fixed is that the fixed alignment defined by the "align" field
> of struct pci_epc_features assumes that the alignment of the endpoint
> memory used to map a RC PCI address range is independent of the PCI
> address being mapped. But that is not the case for the rk3399 SoC
> controller: in endpoint mode, this controller uses the lower bits of the
> local endpoint memory address as the lower bits for the PCI addresses
> for data transfers. That is, when mapping local memory, one must take
> into account the number of bits of the RC PCI address that change from
> the start address of the mapping.
> 
> To fix this, the new endpoint controller method .map_align is introduced
> and called from pci_epc_map_align(). This method is optional and for
> controllers that do not define it, the mapping information returned
> is based of the fixed alignment constraint as defined by the align
> feature.
> 
> The functions pci_epc_mem_map() is a helper function which obtains
> mapping information, allocates endpoint controller memory according to
> the mapping size obtained and maps the memory. pci_epc_mem_map() unmaps
> and frees the endpoint memory.
> 
> This series is organized as follows:
>  - Patch 1 tidy up the epc core code
>  - Patch 2 and 3 introduce the new map_align endpoint controller method
>    and related epc functions.
>  - Patch 4 to 6 modify the test endpoint driver to use these new
>    functions and improve the code of this driver.

While posting the next version, please split the endpoint patches into a
separate series. It helps in code review and can be applied separately.

- Mani

>  - Finally, Patch 7 to 18 fix the rk3399 endpoint driver, defining a
>    .map_align method for it and improving its overall code readability
>    and features.
> 
> Changes from v1:
>  - Changed pci_epc_check_func() to pci_epc_function_is_valid() in patch
>    1.
>  - Removed patch "PCI: endpoint: Improve pci_epc_mem_alloc_addr()"
>    (former patch 2 of v1)
>  - Various typos cleanups all over. Also fixed some blank space
>    indentation.
>  - Added review tags
> 
> Damien Le Moal (17):
>   PCI: endpoint: Introduce pci_epc_function_is_valid()
>   PCI: endpoint: Introduce pci_epc_map_align()
>   PCI: endpoint: Introduce pci_epc_mem_map()/unmap()
>   PCI: endpoint: test: Use pci_epc_mem_map/unmap()
>   PCI: endpoint: test: Synchronously cancel command handler work
>   PCI: endpoint: test: Implement link_down event operation
>   PCI: rockchip-ep: Fix address translation unit programming
>   PCI: rockchip-ep: Use a macro to define EP controller .align feature
>   PCI: rockchip-ep: Improve rockchip_pcie_ep_unmap_addr()
>   PCI: rockchip-ep: Improve rockchip_pcie_ep_map_addr()
>   PCI: rockchip-ep: Implement the map_align endpoint controller operation
>   PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() memory allocations
>   PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() MSI-X hiding
>   PCI: rockchip-ep: Refactor endpoint link training enable
>   PCI: rockship-ep: Introduce rockchip_pcie_ep_stop()
>   PCI: rockchip-ep: Improve link training
>   PCI: rockchip-ep: Handle PERST# signal in endpoint mode
> 
> Wilfred Mallawa (1):
>   dt-bindings: pci: rockchip,rk3399-pcie-ep: Add ep-gpios property
> 
>  .../bindings/pci/rockchip,rk3399-pcie-ep.yaml |   3 +
>  drivers/pci/controller/pcie-rockchip-ep.c     | 393 ++++++++++++++----
>  drivers/pci/controller/pcie-rockchip.c        |  17 +-
>  drivers/pci/controller/pcie-rockchip.h        |  22 +
>  drivers/pci/endpoint/functions/pci-epf-test.c | 390 +++++++++--------
>  drivers/pci/endpoint/pci-epc-core.c           | 213 +++++++---
>  include/linux/pci-epc.h                       |  39 ++
>  7 files changed, 768 insertions(+), 309 deletions(-)
> 
> -- 
> 2.44.0
>
Damien Le Moal April 3, 2024, 7:54 a.m. UTC | #7
On 4/3/24 16:45, Manivannan Sadhasivam wrote:
> On Sat, Mar 30, 2024 at 01:19:12PM +0900, Damien Le Moal wrote:
>> Some endpoint controllers have requirements on the alignment of the
>> controller physical memory address that must be used to map a RC PCI
>> address region. For instance, the rockchip endpoint controller uses
>> at most the lower 20 bits of a physical memory address region as the
>> lower bits of an RC PCI address. For mapping a PCI address region of
>> size bytes starting from pci_addr, the exact number of address bits
>> used is the number of address bits changing in the address range
>> [pci_addr..pci_addr + size - 1].
>>
>> For this example, this creates the following constraints:
>> 1) The offset into the controller physical memory allocated for a
>>    mapping depends on the mapping size *and* the starting PCI address
>>    for the mapping.
>> 2) A mapping size cannot exceed the controller windows size (1MB) minus
>>    the offset needed into the allocated physical memory, which can end
>>    up being a smaller size than the desired mapping size.
>>
>> Handling these constraints independently of the controller being used in
>> a PCI EP function driver is not possible with the current EPC API as
>> it only provides the ->align field in struct pci_epc_features.
>> Furthermore, this alignment is static and does not depend on a mapping
>> pci address and size.
>>
>> Solve this by introducing the function pci_epc_map_align() and the
>> endpoint controller operation ->map_align to allow endpoint function
>> drivers to obtain the size and the offset into a controller address
>> region that must be used to map an RC PCI address region. The size
>> of the physical address region provided by pci_epc_map_align() can then
>> be used as the size argument for the function pci_epc_mem_alloc_addr().
>> The offset into the allocated controller memory can be used to
>> correctly handle data transfers. Of note is that pci_epc_map_align() may
>> indicate upon return a mapping size that is smaller (but not 0) than the
>> requested PCI address region size. For such case, an endpoint function
>> driver must handle data transfers in fragments.
>>
> 
> Is there any incentive in exposing pci_epc_map_align()? I mean, why can't it be
> hidden inside the new alloc() API itself?

I could drop pci_epc_map_align(), but the idea here was to have an API that is
not restrictive. E.g., a function driver could allocate memory, keep it and
repetedly use map_align and map() function to remap it to different PCI
addresses. With your suggestion, that would not be possible.

> 
> Furthermore, is it possible to avoid the map_align() callback and handle the
> alignment within the EPC driver?

I am not so sure that this is possible because handling the alignment can
potentially result in changing the amount of memory to allocate, based on the
PCI address also. So the allocation API would need to change, a lot.

>> +	/*
>> +	 * Assume a fixed alignment constraint as specified by the controller
>> +	 * features.
>> +	 */
>> +	features = pci_epc_get_features(epc, func_no, vfunc_no);
>> +	if (!features || !features->align) {
>> +		map->map_pci_addr = pci_addr;
>> +		map->map_size = size;
>> +		map->map_ofst = 0;
> 
> These values are overwritten anyway below.

Looks like "return" got dropped. Bug. Will re-add it.
Damien Le Moal April 3, 2024, 7:58 a.m. UTC | #8
On 4/3/24 16:50, Manivannan Sadhasivam wrote:
> On Sat, Mar 30, 2024 at 01:19:10PM +0900, Damien Le Moal wrote:
>> This series introduces the new functions pci_epc_map_align(),
>> pci_epc_mem_map() and pci_epc_mem_unmap() to improve handling of the
>> PCI address mapping alignment constraints of endpoint controllers in a
>> controller independent manner.
>>
>> The issue fixed is that the fixed alignment defined by the "align" field
>> of struct pci_epc_features assumes that the alignment of the endpoint
>> memory used to map a RC PCI address range is independent of the PCI
>> address being mapped. But that is not the case for the rk3399 SoC
>> controller: in endpoint mode, this controller uses the lower bits of the
>> local endpoint memory address as the lower bits for the PCI addresses
>> for data transfers. That is, when mapping local memory, one must take
>> into account the number of bits of the RC PCI address that change from
>> the start address of the mapping.
>>
>> To fix this, the new endpoint controller method .map_align is introduced
>> and called from pci_epc_map_align(). This method is optional and for
>> controllers that do not define it, the mapping information returned
>> is based of the fixed alignment constraint as defined by the align
>> feature.
>>
>> The functions pci_epc_mem_map() is a helper function which obtains
>> mapping information, allocates endpoint controller memory according to
>> the mapping size obtained and maps the memory. pci_epc_mem_map() unmaps
>> and frees the endpoint memory.
>>
>> This series is organized as follows:
>>  - Patch 1 tidy up the epc core code
>>  - Patch 2 and 3 introduce the new map_align endpoint controller method
>>    and related epc functions.
>>  - Patch 4 to 6 modify the test endpoint driver to use these new
>>    functions and improve the code of this driver.
> 
> While posting the next version, please split the endpoint patches into a
> separate series. It helps in code review and can be applied separately.

Which patches ? They are all endpoint related:
 (1) Core code
 (2) test function driver
 (3) rockchip rk3399 controller

(2) and (3) depend on the patches in (1), so splitting the series is a big
possible only if (1) is applied first, so that is a source of delays and breaks
the context of the patches...
Manivannan Sadhasivam April 3, 2024, 9:21 a.m. UTC | #9
On Wed, Apr 03, 2024 at 04:54:32PM +0900, Damien Le Moal wrote:
> On 4/3/24 16:45, Manivannan Sadhasivam wrote:
> > On Sat, Mar 30, 2024 at 01:19:12PM +0900, Damien Le Moal wrote:
> >> Some endpoint controllers have requirements on the alignment of the
> >> controller physical memory address that must be used to map a RC PCI
> >> address region. For instance, the rockchip endpoint controller uses
> >> at most the lower 20 bits of a physical memory address region as the
> >> lower bits of an RC PCI address. For mapping a PCI address region of
> >> size bytes starting from pci_addr, the exact number of address bits
> >> used is the number of address bits changing in the address range
> >> [pci_addr..pci_addr + size - 1].
> >>
> >> For this example, this creates the following constraints:
> >> 1) The offset into the controller physical memory allocated for a
> >>    mapping depends on the mapping size *and* the starting PCI address
> >>    for the mapping.
> >> 2) A mapping size cannot exceed the controller windows size (1MB) minus
> >>    the offset needed into the allocated physical memory, which can end
> >>    up being a smaller size than the desired mapping size.
> >>
> >> Handling these constraints independently of the controller being used in
> >> a PCI EP function driver is not possible with the current EPC API as
> >> it only provides the ->align field in struct pci_epc_features.
> >> Furthermore, this alignment is static and does not depend on a mapping
> >> pci address and size.
> >>
> >> Solve this by introducing the function pci_epc_map_align() and the
> >> endpoint controller operation ->map_align to allow endpoint function
> >> drivers to obtain the size and the offset into a controller address
> >> region that must be used to map an RC PCI address region. The size
> >> of the physical address region provided by pci_epc_map_align() can then
> >> be used as the size argument for the function pci_epc_mem_alloc_addr().
> >> The offset into the allocated controller memory can be used to
> >> correctly handle data transfers. Of note is that pci_epc_map_align() may
> >> indicate upon return a mapping size that is smaller (but not 0) than the
> >> requested PCI address region size. For such case, an endpoint function
> >> driver must handle data transfers in fragments.
> >>
> > 
> > Is there any incentive in exposing pci_epc_map_align()? I mean, why can't it be
> > hidden inside the new alloc() API itself?
> 
> I could drop pci_epc_map_align(), but the idea here was to have an API that is
> not restrictive. E.g., a function driver could allocate memory, keep it and
> repetedly use map_align and map() function to remap it to different PCI
> addresses. With your suggestion, that would not be possible.
> 

Is there any requirement currently? If not, let's try to introduce it when the
actual requirement comes.

> > 
> > Furthermore, is it possible to avoid the map_align() callback and handle the
> > alignment within the EPC driver?
> 
> I am not so sure that this is possible because handling the alignment can
> potentially result in changing the amount of memory to allocate, based on the
> PCI address also. So the allocation API would need to change, a lot.
> 

Hmm, looking at patch 11/18, I think it might become complicated.

- Mani

> >> +	/*
> >> +	 * Assume a fixed alignment constraint as specified by the controller
> >> +	 * features.
> >> +	 */
> >> +	features = pci_epc_get_features(epc, func_no, vfunc_no);
> >> +	if (!features || !features->align) {
> >> +		map->map_pci_addr = pci_addr;
> >> +		map->map_size = size;
> >> +		map->map_ofst = 0;
> > 
> > These values are overwritten anyway below.
> 
> Looks like "return" got dropped. Bug. Will re-add it.
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research
>
Manivannan Sadhasivam April 3, 2024, 9:25 a.m. UTC | #10
On Wed, Apr 03, 2024 at 04:58:42PM +0900, Damien Le Moal wrote:
> On 4/3/24 16:50, Manivannan Sadhasivam wrote:
> > On Sat, Mar 30, 2024 at 01:19:10PM +0900, Damien Le Moal wrote:
> >> This series introduces the new functions pci_epc_map_align(),
> >> pci_epc_mem_map() and pci_epc_mem_unmap() to improve handling of the
> >> PCI address mapping alignment constraints of endpoint controllers in a
> >> controller independent manner.
> >>
> >> The issue fixed is that the fixed alignment defined by the "align" field
> >> of struct pci_epc_features assumes that the alignment of the endpoint
> >> memory used to map a RC PCI address range is independent of the PCI
> >> address being mapped. But that is not the case for the rk3399 SoC
> >> controller: in endpoint mode, this controller uses the lower bits of the
> >> local endpoint memory address as the lower bits for the PCI addresses
> >> for data transfers. That is, when mapping local memory, one must take
> >> into account the number of bits of the RC PCI address that change from
> >> the start address of the mapping.
> >>
> >> To fix this, the new endpoint controller method .map_align is introduced
> >> and called from pci_epc_map_align(). This method is optional and for
> >> controllers that do not define it, the mapping information returned
> >> is based of the fixed alignment constraint as defined by the align
> >> feature.
> >>
> >> The functions pci_epc_mem_map() is a helper function which obtains
> >> mapping information, allocates endpoint controller memory according to
> >> the mapping size obtained and maps the memory. pci_epc_mem_map() unmaps
> >> and frees the endpoint memory.
> >>
> >> This series is organized as follows:
> >>  - Patch 1 tidy up the epc core code
> >>  - Patch 2 and 3 introduce the new map_align endpoint controller method
> >>    and related epc functions.
> >>  - Patch 4 to 6 modify the test endpoint driver to use these new
> >>    functions and improve the code of this driver.
> > 
> > While posting the next version, please split the endpoint patches into a
> > separate series. It helps in code review and can be applied separately.
> 
> Which patches ? They are all endpoint related:
>  (1) Core code
>  (2) test function driver

Till patch 6, that's why I inlined my reply at the 3rd point.

>  (3) rockchip rk3399 controller
> 
> (2) and (3) depend on the patches in (1), so splitting the series is a big
> possible only if (1) is applied first, so that is a source of delays and breaks
> the context of the patches...
> 

If you split patches 1-6 and post the rest of the Rockchip patches as a follow
up, it perfectly makes sense.

- Mani
Manivannan Sadhasivam April 3, 2024, 9:48 a.m. UTC | #11
On Sat, Mar 30, 2024 at 01:19:13PM +0900, Damien Le Moal wrote:
> Introduce the function pci_epc_mem_map() to facilitate controller memory
> address allocation and mapping to a RC PCI address region in endpoint
> function drivers.
> 
> This function first uses pci_epc_map_align() to determine the controller
> memory address alignment (offset and size) constraints. The result of
> this function is used to allocate a controller physical memory region
> using pci_epc_mem_alloc_addr() and map it to the RC PCI address
> space with pci_epc_map_addr(). Since pci_epc_map_align() may indicate
> that a mapping can be smaller than the requested size, pci_epc_mem_map()
> may only partially map the RC PCI address region specified and return
> a smaller size for the effective mapping.
> 
> The counterpart of pci_epc_mem_map() to unmap and free the controller
> memory address region is pci_epc_mem_unmap().
> 
> Both functions operate using struct pci_epc_map data structure which is
> extended to contain the physical and virtual addresses of the allocated
> controller memory. Endpoint function drivers can use struct pci_epc_map
> to implement read/write accesses within the mapped RC PCI address region
> using the ->virt_addr and ->size fields.
> 
> This commit contains contributions from Rick Wertenbroek
> <rick.wertenbroek@gmail.com>.
> 

Adding 'Co-developed-by && Signed-off-by' tags would give the due credit.

> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>  drivers/pci/endpoint/pci-epc-core.c | 68 +++++++++++++++++++++++++++++
>  include/linux/pci-epc.h             |  6 +++
>  2 files changed, 74 insertions(+)
> 
> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
> index 37758ca91d7f..0095b54bdf9e 100644
> --- a/drivers/pci/endpoint/pci-epc-core.c
> +++ b/drivers/pci/endpoint/pci-epc-core.c
> @@ -530,6 +530,74 @@ int pci_epc_map_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  }
>  EXPORT_SYMBOL_GPL(pci_epc_map_addr);
>  
> +/**
> + * pci_epc_mem_map() - allocate and map CPU address to PCI address

How about, 'pci_epc_alloc_map()'? I think the 'mem' prefix was added to the
existing APIs since the function definitions are in pci-epc-mem driver, but
not needed here.

> + * @epc: the EPC device on which the CPU address is to be allocated and mapped
> + * @func_no: the physical endpoint function number in the EPC device
> + * @vfunc_no: the virtual endpoint function number in the physical function
> + * @pci_addr: PCI address to which the CPU address should be mapped
> + * @size: the number of bytes to map starting from @pci_addr
> + * @map: where to return the mapping information
> + *
> + * Allocate a controller physical address region and map it to a RC PCI address

"Allocate an EPC address space region..."

> + * region, taking into account the controller physical address mapping
> + * constraints (if any). Returns the effective size of the mapping, which may

Return value should be specified separately for Kdoc.

> + * be less than @size, or a negative error code in case of error.
> + */
> +ssize_t pci_epc_mem_map(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +			u64 pci_addr, size_t size, struct pci_epc_map *map)
> +{
> +	int ret;
> +
> +	ret = pci_epc_map_align(epc, func_no, vfunc_no, pci_addr, size, map);
> +	if (ret)
> +		return ret;
> +
> +	map->virt_base = pci_epc_mem_alloc_addr(epc, &map->phys_base,
> +						map->map_size);

It'd be nice to move pci_epc_map_align() inside the existing
pci_epc_mem_alloc_addr() API to make sure that the allocated memory follows the
constraints of the EPC.

Would that make sense?

- Mani

> +	if (!map->virt_base)
> +		return -ENOMEM;
> +
> +	map->phys_addr = map->phys_base + map->map_ofst;
> +	map->virt_addr = map->virt_base + map->map_ofst;
> +
> +	ret = pci_epc_map_addr(epc, func_no, vfunc_no, map->phys_base,
> +			       map->map_pci_addr, map->map_size);
> +	if (ret) {
> +		pci_epc_mem_free_addr(epc, map->phys_base, map->virt_base,
> +				      map->map_size);
> +		return ret;
> +	}
> +
> +	return map->pci_size;
> +}
> +EXPORT_SYMBOL_GPL(pci_epc_mem_map);
> +
> +/**
> + * pci_epc_mem_unmap() - unmap from PCI address and free a CPU address region
> + * @epc: the EPC device on which the CPU address is allocated and mapped
> + * @func_no: the physical endpoint function number in the EPC device
> + * @vfunc_no: the virtual endpoint function number in the physical function
> + * @map: the mapping information
> + *
> + * Allocate and map local CPU address to a PCI address, accounting for the
> + * controller local CPU address alignment constraints.
> + */
> +void pci_epc_mem_unmap(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +		       struct pci_epc_map *map)
> +{
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
> +		return;
> +
> +	if (!map || !map->pci_size)
> +		return;
> +
> +	pci_epc_unmap_addr(epc, func_no, vfunc_no, map->phys_base);
> +	pci_epc_mem_free_addr(epc, map->phys_base, map->virt_base,
> +			      map->map_size);
> +}
> +EXPORT_SYMBOL_GPL(pci_epc_mem_unmap);
> +
>  /**
>   * pci_epc_clear_bar() - reset the BAR
>   * @epc: the EPC device for which the BAR has to be cleared
> diff --git a/include/linux/pci-epc.h b/include/linux/pci-epc.h
> index 8cfb4aaf2628..86397a500b54 100644
> --- a/include/linux/pci-epc.h
> +++ b/include/linux/pci-epc.h
> @@ -304,4 +304,10 @@ void __iomem *pci_epc_mem_alloc_addr(struct pci_epc *epc,
>  				     phys_addr_t *phys_addr, size_t size);
>  void pci_epc_mem_free_addr(struct pci_epc *epc, phys_addr_t phys_addr,
>  			   void __iomem *virt_addr, size_t size);
> +
> +ssize_t pci_epc_mem_map(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +			u64 pci_addr, size_t size, struct pci_epc_map *map);
> +void pci_epc_mem_unmap(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +		       struct pci_epc_map *map);
> +
>  #endif /* __LINUX_PCI_EPC_H */
> -- 
> 2.44.0
>
Rick Wertenbroek April 3, 2024, 11:54 a.m. UTC | #12
On Sat, Mar 30, 2024 at 5:20 AM Damien Le Moal <dlemoal@kernel.org> wrote:
>
> The Rockchip rk339 technical reference manual describe the endpoint mode
> link training process clearly and states that:
>   Insure link training completion and success by observing link_st field
>   in PCIe Client BASIC_STATUS1 register change to 2'b11. If both side
>   support PCIe Gen2 speed, re-train can be Initiated by asserting the
>   Retrain Link field in Link Control and Status Register. The software
>   should insure the BASIC_STATUS0[negotiated_speed] changes to "1", that
>   indicates re-train to Gen2 successfully.
> This procedure is very similar to what is done for the root-port mode in
> rockchip_pcie_host_init_port().
>
> Implement this link training procedure for the endpoint mode as well.
> Given that the rk3399 SoC does not have an interrupt signaling link
> status changes, training is implemented as a delayed work which is
> rescheduled until the link training completes or the endpoint controller
> is stopped. The link training work is first scheduled in
> rockchip_pcie_ep_start() when the endpoint function is started. Link
> training completion is signaled to the function using pci_epc_linkup().
> Accordingly, the linkup_notifier field of the rockchip pci_epc_features
> structure is changed to true.
>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>  drivers/pci/controller/pcie-rockchip-ep.c | 79 ++++++++++++++++++++++-
>  drivers/pci/controller/pcie-rockchip.h    | 11 ++++
>  2 files changed, 89 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c
> index 2767e8f1771d..4006e7dee71a 100644
> --- a/drivers/pci/controller/pcie-rockchip-ep.c
> +++ b/drivers/pci/controller/pcie-rockchip-ep.c
> @@ -16,6 +16,8 @@
>  #include <linux/platform_device.h>
>  #include <linux/pci-epf.h>
>  #include <linux/sizes.h>
> +#include <linux/workqueue.h>
> +#include <linux/iopoll.h>
>
>  #include "pcie-rockchip.h"
>
> @@ -48,6 +50,7 @@ struct rockchip_pcie_ep {
>         u64                     irq_pci_addr;
>         u8                      irq_pci_fn;
>         u8                      irq_pending;
> +       struct delayed_work     link_training;
>  };
>
>  static void rockchip_pcie_clear_ep_ob_atu(struct rockchip_pcie *rockchip,
> @@ -467,6 +470,8 @@ static int rockchip_pcie_ep_start(struct pci_epc *epc)
>                             PCIE_CLIENT_CONF_ENABLE,
>                             PCIE_CLIENT_CONFIG);
>
> +       schedule_delayed_work(&ep->link_training, 0);
> +
>         return 0;
>  }
>
> @@ -475,6 +480,8 @@ static void rockchip_pcie_ep_stop(struct pci_epc *epc)
>         struct rockchip_pcie_ep *ep = epc_get_drvdata(epc);
>         struct rockchip_pcie *rockchip = &ep->rockchip;
>
> +       cancel_delayed_work_sync(&ep->link_training);
> +
>         /* Stop link training and disable configuration */
>         rockchip_pcie_write(rockchip,
>                             PCIE_CLIENT_CONF_DISABLE |
> @@ -482,8 +489,77 @@ static void rockchip_pcie_ep_stop(struct pci_epc *epc)
>                             PCIE_CLIENT_CONFIG);
>  }
>
> +static void rockchip_pcie_ep_retrain_link(struct rockchip_pcie *rockchip)
> +{
> +       u32 status;
> +
> +       status = rockchip_pcie_read(rockchip, PCIE_EP_CONFIG_LCS);
> +       status |= PCI_EXP_LNKCTL_RL;
> +       rockchip_pcie_write(rockchip, status, PCIE_EP_CONFIG_LCS);
> +}
> +
> +static bool rockchip_pcie_ep_link_up(struct rockchip_pcie *rockchip)
> +{
> +       u32 val = rockchip_pcie_read(rockchip, PCIE_CLIENT_BASIC_STATUS1);
> +
> +       return PCIE_LINK_UP(val);
> +}
> +
> +static void rockchip_pcie_ep_link_training(struct work_struct *work)
> +{
> +       struct rockchip_pcie_ep *ep =
> +               container_of(work, struct rockchip_pcie_ep, link_training.work);
> +       struct rockchip_pcie *rockchip = &ep->rockchip;
> +       struct device *dev = rockchip->dev;
> +       u32 val;
> +       int ret;
> +
> +       /* Enable Gen1 training and wait for its completion */
> +       ret = readl_poll_timeout(rockchip->apb_base + PCIE_CORE_CTRL,
> +                                val, PCIE_LINK_TRAINING_DONE(val), 50,
> +                                LINK_TRAIN_TIMEOUT);
> +       if (ret)
> +               goto again;
> +
> +       /* Make sure that the link is up */
> +       ret = readl_poll_timeout(rockchip->apb_base + PCIE_CLIENT_BASIC_STATUS1,
> +                                val, PCIE_LINK_UP(val), 50,
> +                                LINK_TRAIN_TIMEOUT);
> +       if (ret)
> +               goto again;
> +
> +       /* Check the current speed */
> +       val = rockchip_pcie_read(rockchip, PCIE_CORE_CTRL);
> +       if (!PCIE_LINK_IS_GEN2(val) && rockchip->link_gen == 2) {
> +               /* Enable retrain for gen2 */
> +               rockchip_pcie_ep_retrain_link(rockchip);
> +               readl_poll_timeout(rockchip->apb_base + PCIE_CORE_CTRL,
> +                                  val, PCIE_LINK_IS_GEN2(val), 50,
> +                                  LINK_TRAIN_TIMEOUT);
> +       }
> +
> +       /* Check again that the link is up */
> +       if (!rockchip_pcie_ep_link_up(rockchip))
> +               goto again;
> +
> +       val = rockchip_pcie_read(rockchip, PCIE_CLIENT_BASIC_STATUS0);
> +       dev_info(dev,
> +                "Link UP (Negociated speed: %sGT/s, width: x%lu)\n",
> +                (val & PCIE_CLIENT_NEG_LINK_SPEED) ? "5" : "2.5",
> +                ((val & PCIE_CLIENT_NEG_LINK_WIDTH_MASK) >>
> +                 PCIE_CLIENT_NEG_LINK_WIDTH_SHIFT) << 1);
> +

This does not print the correct link width for x1 :

# [   60.518339] rockchip-pcie-ep fd000000.pcie-ep: Link UP
(Negociated speed: 5GT/s, width: x0)

This is because :

((val & PCIE_CLIENT_NEG_LINK_WIDTH_MASK) >>
 PCIE_CLIENT_NEG_LINK_WIDTH_SHIFT) << 1
will print 0 if the link width is 1, because bits 7:6 are 0b00, and
0b00 << 1 is still 0. (0b00 => x0, 0b01 => x2, 0b10 => x4)

Therefore the formula should be :
1 << ((val & PCIE_CLIENT_NEG_LINK_WIDTH_MASK) >>
 PCIE_CLIENT_NEG_LINK_WIDTH_SHIFT)
This shows the correct link width for all cases (0b00 => x1, 0b01 =>
x2, 0b10 => x4).

Reference : RK3399 TRM V1.3 pages 768-769 PCIE_CLIENT_BASIC_STATUS0
register description


> +       /* Notify the function */
> +       pci_epc_linkup(ep->epc);
> +
> +       return;
> +
> +again:
> +       schedule_delayed_work(&ep->link_training, msecs_to_jiffies(5));
> +}
> +
>  static const struct pci_epc_features rockchip_pcie_epc_features = {
> -       .linkup_notifier = false,
> +       .linkup_notifier = true,
>         .msi_capable = true,
>         .msix_capable = false,
>         .align = ROCKCHIP_PCIE_AT_SIZE_ALIGN,
> @@ -644,6 +720,7 @@ static int rockchip_pcie_ep_probe(struct platform_device *pdev)
>         rockchip = &ep->rockchip;
>         rockchip->is_rc = false;
>         rockchip->dev = dev;
> +       INIT_DELAYED_WORK(&ep->link_training, rockchip_pcie_ep_link_training);
>
>         epc = devm_pci_epc_create(dev, &rockchip_pcie_epc_ops);
>         if (IS_ERR(epc)) {
> diff --git a/drivers/pci/controller/pcie-rockchip.h b/drivers/pci/controller/pcie-rockchip.h
> index 0263f158ee8d..3963b7097a91 100644
> --- a/drivers/pci/controller/pcie-rockchip.h
> +++ b/drivers/pci/controller/pcie-rockchip.h
> @@ -26,6 +26,7 @@
>  #define MAX_LANE_NUM                   4
>  #define MAX_REGION_LIMIT               32
>  #define MIN_EP_APERTURE                        28
> +#define LINK_TRAIN_TIMEOUT             (5000 * USEC_PER_MSEC)
>
>  #define PCIE_CLIENT_BASE               0x0
>  #define PCIE_CLIENT_CONFIG             (PCIE_CLIENT_BASE + 0x00)
> @@ -50,6 +51,10 @@
>  #define   PCIE_CLIENT_DEBUG_LTSSM_MASK         GENMASK(5, 0)
>  #define   PCIE_CLIENT_DEBUG_LTSSM_L1           0x18
>  #define   PCIE_CLIENT_DEBUG_LTSSM_L2           0x19
> +#define PCIE_CLIENT_BASIC_STATUS0      (PCIE_CLIENT_BASE + 0x44)
> +#define   PCIE_CLIENT_NEG_LINK_WIDTH_MASK      GENMASK(7, 6)
> +#define   PCIE_CLIENT_NEG_LINK_WIDTH_SHIFT     6
> +#define   PCIE_CLIENT_NEG_LINK_SPEED           BIT(5)
>  #define PCIE_CLIENT_BASIC_STATUS1      (PCIE_CLIENT_BASE + 0x48)
>  #define   PCIE_CLIENT_LINK_STATUS_UP           0x00300000
>  #define   PCIE_CLIENT_LINK_STATUS_MASK         0x00300000
> @@ -87,6 +92,8 @@
>
>  #define PCIE_CORE_CTRL_MGMT_BASE       0x900000
>  #define PCIE_CORE_CTRL                 (PCIE_CORE_CTRL_MGMT_BASE + 0x000)
> +#define   PCIE_CORE_PL_CONF_LS_MASK            0x00000001
> +#define   PCIE_CORE_PL_CONF_LS_READY           0x00000001
>  #define   PCIE_CORE_PL_CONF_SPEED_5G           0x00000008
>  #define   PCIE_CORE_PL_CONF_SPEED_MASK         0x00000018
>  #define   PCIE_CORE_PL_CONF_LANE_MASK          0x00000006
> @@ -144,6 +151,7 @@
>  #define PCIE_RC_CONFIG_BASE            0xa00000
>  #define PCIE_EP_CONFIG_BASE            0xa00000
>  #define PCIE_EP_CONFIG_DID_VID         (PCIE_EP_CONFIG_BASE + 0x00)
> +#define PCIE_EP_CONFIG_LCS             (PCIE_EP_CONFIG_BASE + 0xd0)
>  #define PCIE_RC_CONFIG_RID_CCR         (PCIE_RC_CONFIG_BASE + 0x08)
>  #define PCIE_RC_CONFIG_DCR             (PCIE_RC_CONFIG_BASE + 0xc4)
>  #define   PCIE_RC_CONFIG_DCR_CSPL_SHIFT                18
> @@ -155,6 +163,7 @@
>  #define PCIE_RC_CONFIG_LINK_CAP                (PCIE_RC_CONFIG_BASE + 0xcc)
>  #define   PCIE_RC_CONFIG_LINK_CAP_L0S          BIT(10)
>  #define PCIE_RC_CONFIG_LCS             (PCIE_RC_CONFIG_BASE + 0xd0)
> +#define PCIE_EP_CONFIG_LCS             (PCIE_EP_CONFIG_BASE + 0xd0)
>  #define PCIE_RC_CONFIG_L1_SUBSTATE_CTRL2 (PCIE_RC_CONFIG_BASE + 0x90c)
>  #define PCIE_RC_CONFIG_THP_CAP         (PCIE_RC_CONFIG_BASE + 0x274)
>  #define   PCIE_RC_CONFIG_THP_CAP_NEXT_MASK     GENMASK(31, 20)
> @@ -192,6 +201,8 @@
>  #define ROCKCHIP_VENDOR_ID                     0x1d87
>  #define PCIE_LINK_IS_L2(x) \
>         (((x) & PCIE_CLIENT_DEBUG_LTSSM_MASK) == PCIE_CLIENT_DEBUG_LTSSM_L2)
> +#define PCIE_LINK_TRAINING_DONE(x) \
> +       (((x) & PCIE_CORE_PL_CONF_LS_MASK) == PCIE_CORE_PL_CONF_LS_READY)
>  #define PCIE_LINK_UP(x) \
>         (((x) & PCIE_CLIENT_LINK_STATUS_MASK) == PCIE_CLIENT_LINK_STATUS_UP)
>  #define PCIE_LINK_IS_GEN2(x) \
> --
> 2.44.0
>

Tested-by: Rick Wertenbroek <rick.wertenbroek@gmail.com>

Best regards,
Rick
Kishon Vijay Abraham I April 3, 2024, 12:33 p.m. UTC | #13
Hi Damien,

On 3/30/2024 9:49 AM, Damien Le Moal wrote:
> Some endpoint controllers have requirements on the alignment of the
> controller physical memory address that must be used to map a RC PCI
> address region. For instance, the rockchip endpoint controller uses
> at most the lower 20 bits of a physical memory address region as the
> lower bits of an RC PCI address. For mapping a PCI address region of
> size bytes starting from pci_addr, the exact number of address bits
> used is the number of address bits changing in the address range
> [pci_addr..pci_addr + size - 1].
> 
> For this example, this creates the following constraints:
> 1) The offset into the controller physical memory allocated for a
>     mapping depends on the mapping size *and* the starting PCI address
>     for the mapping.
> 2) A mapping size cannot exceed the controller windows size (1MB) minus
>     the offset needed into the allocated physical memory, which can end
>     up being a smaller size than the desired mapping size.
> 
> Handling these constraints independently of the controller being used in
> a PCI EP function driver is not possible with the current EPC API as
> it only provides the ->align field in struct pci_epc_features.
> Furthermore, this alignment is static and does not depend on a mapping
> pci address and size.
> 
> Solve this by introducing the function pci_epc_map_align() and the
> endpoint controller operation ->map_align to allow endpoint function
> drivers to obtain the size and the offset into a controller address
> region that must be used to map an RC PCI address region. The size
> of the physical address region provided by pci_epc_map_align() can then
> be used as the size argument for the function pci_epc_mem_alloc_addr().
> The offset into the allocated controller memory can be used to
> correctly handle data transfers. Of note is that pci_epc_map_align() may
> indicate upon return a mapping size that is smaller (but not 0) than the
> requested PCI address region size. For such case, an endpoint function
> driver must handle data transfers in fragments.
> 
> The controller operation ->map_align is optional: controllers that do
> not have any address alignment constraints for mapping a RC PCI address
> region do not need to implement this operation. For such controllers,
> pci_epc_map_align() always returns the mapping size as equal
> to the requested size and an offset equal to 0.
> 
> The structure pci_epc_map is introduced to represent a mapping start PCI
> address, size and the size and offset into the controller memory needed
> for mapping the PCI address region.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>   drivers/pci/endpoint/pci-epc-core.c | 66 +++++++++++++++++++++++++++++
>   include/linux/pci-epc.h             | 33 +++++++++++++++
>   2 files changed, 99 insertions(+)
> 
> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
> index 754afd115bbd..37758ca91d7f 100644
> --- a/drivers/pci/endpoint/pci-epc-core.c
> +++ b/drivers/pci/endpoint/pci-epc-core.c
> @@ -433,6 +433,72 @@ void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>   }
>   EXPORT_SYMBOL_GPL(pci_epc_unmap_addr);
>   
> +/**
> + * pci_epc_map_align() - Get the offset into and the size of a controller memory
> + *			 address region needed to map a RC PCI address region
> + * @epc: the EPC device on which address is allocated
> + * @func_no: the physical endpoint function number in the EPC device
> + * @vfunc_no: the virtual endpoint function number in the physical function
> + * @pci_addr: PCI address to which the physical address should be mapped
> + * @size: the size of the mapping starting from @pci_addr
> + * @map: populate here the actual size and offset into the controller memory
> + *       that must be allocated for the mapping
> + *
> + * Invoke the controller map_align operation to obtain the size and the offset
> + * into a controller address region that must be allocated to map @size
> + * bytes of the RC PCI address space starting from @pci_addr.
> + *
> + * The size of the mapping that can be handled by the controller is indicated
> + * using the pci_size field of @map. This size may be smaller than the requested
> + * @size. In such case, the function driver must handle the mapping using
> + * several fragments. The offset into the controller memory for the effective
> + * mapping of the @pci_addr..@pci_addr+@map->pci_size address range is indicated
> + * using the map_ofst field of @map.
> + */
> +int pci_epc_map_align(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +		      u64 pci_addr, size_t size, struct pci_epc_map *map)
> +{
> +	const struct pci_epc_features *features;
> +	size_t mask;
> +	int ret;
> +
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
> +		return -EINVAL;
> +
> +	if (!size || !map)
> +		return -EINVAL;
> +
> +	memset(map, 0, sizeof(*map));
> +	map->pci_addr = pci_addr;
> +	map->pci_size = size;
> +
> +	if (epc->ops->map_align) {
> +		mutex_lock(&epc->lock);
> +		ret = epc->ops->map_align(epc, func_no, vfunc_no, map);
> +		mutex_unlock(&epc->lock);
> +		return ret;
> +	}
> +
> +	/*
> +	 * Assume a fixed alignment constraint as specified by the controller
> +	 * features.
> +	 */
> +	features = pci_epc_get_features(epc, func_no, vfunc_no);
> +	if (!features || !features->align) {
> +		map->map_pci_addr = pci_addr;
> +		map->map_size = size;
> +		map->map_ofst = 0;
> +	}

The 'align' of pci_epc_features was initially added only to address the 
inbound ATU constraints. This is also added as comment in [1]. The PCI 
address restrictions (only fixed alignment constraint) were handled by 
the host side driver and depends on the connected endpoint device 
(atleast it was like that for pci_endpoint_test.c [2]).
So pci-epf-test.c used the 'align' in pci_epc_features only as part of 
pci_epf_alloc_space().

Though I have abused 'align' of pci_epc_features in pci-epf-ntb.c using 
it out of pci_epf_alloc_space(), I think we should keep the 'align' of 
pci_epc_features only within pci_epf_alloc_space() and controllers with 
any PCI address restrictions to implement ->map_align(). This could as 
well be done in a phased manner to let controllers implement 
->map_align() and then remove using  pci_epc_features in 
pci_epc_map_align(). Let me know what you think?

Thanks,
Kishon

[1] -> 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/pci-epc.h?h=v6.9-rc2#n187

[2] -> 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/misc/pci_endpoint_test.c?h=v6.9-rc2#n127
> +
> +	mask = features->align - 1;
> +	map->map_pci_addr = map->pci_addr & ~mask;
> +	map->map_ofst = map->pci_addr & mask;
> +	map->map_size = ALIGN(map->map_ofst + map->pci_size, features->align);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_epc_map_align);
> +
>   /**
>    * pci_epc_map_addr() - map CPU address to PCI address
>    * @epc: the EPC device on which address is allocated
> diff --git a/include/linux/pci-epc.h b/include/linux/pci-epc.h
> index cc2f70d061c8..8cfb4aaf2628 100644
> --- a/include/linux/pci-epc.h
> +++ b/include/linux/pci-epc.h
> @@ -32,11 +32,40 @@ pci_epc_interface_string(enum pci_epc_interface_type type)
>   	}
>   }
>   
> +/**
> + * struct pci_epc_map - information about EPC memory for mapping a RC PCI
> + *                      address range
> + * @pci_addr: start address of the RC PCI address range to map
> + * @pci_size: size of the RC PCI address range to map
> + * @map_pci_addr: RC PCI address used as the first address mapped
> + * @map_size: size of the controller memory needed for the mapping
> + * @map_ofst: offset into the controller memory needed for the mapping
> + * @phys_base: base physical address of the allocated EPC memory
> + * @phys_addr: physical address at which @pci_addr is mapped
> + * @virt_base: base virtual address of the allocated EPC memory
> + * @virt_addr: virtual address at which @pci_addr is mapped
> + */
> +struct pci_epc_map {
> +	phys_addr_t	pci_addr;
> +	size_t		pci_size;
> +
> +	phys_addr_t	map_pci_addr;
> +	size_t		map_size;
> +	phys_addr_t	map_ofst;
> +
> +	phys_addr_t	phys_base;
> +	phys_addr_t	phys_addr;
> +	void __iomem	*virt_base;
> +	void __iomem	*virt_addr;
> +};
> +
>   /**
>    * struct pci_epc_ops - set of function pointers for performing EPC operations
>    * @write_header: ops to populate configuration space header
>    * @set_bar: ops to configure the BAR
>    * @clear_bar: ops to reset the BAR
> + * @map_align: operation to get the size and offset into a controller memory
> + *             window needed to map an RC PCI address region
>    * @map_addr: ops to map CPU address to PCI address
>    * @unmap_addr: ops to unmap CPU address and PCI address
>    * @set_msi: ops to set the requested number of MSI interrupts in the MSI
> @@ -61,6 +90,8 @@ struct pci_epc_ops {
>   			   struct pci_epf_bar *epf_bar);
>   	void	(*clear_bar)(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>   			     struct pci_epf_bar *epf_bar);
> +	int	(*map_align)(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +			    struct pci_epc_map *map);
>   	int	(*map_addr)(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>   			    phys_addr_t addr, u64 pci_addr, size_t size);
>   	void	(*unmap_addr)(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> @@ -234,6 +265,8 @@ int pci_epc_set_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>   		    struct pci_epf_bar *epf_bar);
>   void pci_epc_clear_bar(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>   		       struct pci_epf_bar *epf_bar);
> +int pci_epc_map_align(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +		      u64 pci_addr, size_t size, struct pci_epc_map *map);
>   int pci_epc_map_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>   		     phys_addr_t phys_addr,
>   		     u64 pci_addr, size_t size);
Damien Le Moal April 4, 2024, 2:43 a.m. UTC | #14
On 4/3/24 21:33, Kishon Vijay Abraham I wrote:
> Hi Damien,
> 
> On 3/30/2024 9:49 AM, Damien Le Moal wrote:
>> Some endpoint controllers have requirements on the alignment of the
>> controller physical memory address that must be used to map a RC PCI
>> address region. For instance, the rockchip endpoint controller uses
>> at most the lower 20 bits of a physical memory address region as the
>> lower bits of an RC PCI address. For mapping a PCI address region of
>> size bytes starting from pci_addr, the exact number of address bits
>> used is the number of address bits changing in the address range
>> [pci_addr..pci_addr + size - 1].
>>
>> For this example, this creates the following constraints:
>> 1) The offset into the controller physical memory allocated for a
>>     mapping depends on the mapping size *and* the starting PCI address
>>     for the mapping.
>> 2) A mapping size cannot exceed the controller windows size (1MB) minus
>>     the offset needed into the allocated physical memory, which can end
>>     up being a smaller size than the desired mapping size.
>>
>> Handling these constraints independently of the controller being used in
>> a PCI EP function driver is not possible with the current EPC API as
>> it only provides the ->align field in struct pci_epc_features.
>> Furthermore, this alignment is static and does not depend on a mapping
>> pci address and size.
>>
>> Solve this by introducing the function pci_epc_map_align() and the
>> endpoint controller operation ->map_align to allow endpoint function
>> drivers to obtain the size and the offset into a controller address
>> region that must be used to map an RC PCI address region. The size
>> of the physical address region provided by pci_epc_map_align() can then
>> be used as the size argument for the function pci_epc_mem_alloc_addr().
>> The offset into the allocated controller memory can be used to
>> correctly handle data transfers. Of note is that pci_epc_map_align() may
>> indicate upon return a mapping size that is smaller (but not 0) than the
>> requested PCI address region size. For such case, an endpoint function
>> driver must handle data transfers in fragments.
>>
>> The controller operation ->map_align is optional: controllers that do
>> not have any address alignment constraints for mapping a RC PCI address
>> region do not need to implement this operation. For such controllers,
>> pci_epc_map_align() always returns the mapping size as equal
>> to the requested size and an offset equal to 0.
>>
>> The structure pci_epc_map is introduced to represent a mapping start PCI
>> address, size and the size and offset into the controller memory needed
>> for mapping the PCI address region.
>>
>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>> ---
>>   drivers/pci/endpoint/pci-epc-core.c | 66 +++++++++++++++++++++++++++++
>>   include/linux/pci-epc.h             | 33 +++++++++++++++
>>   2 files changed, 99 insertions(+)
>>
>> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
>> index 754afd115bbd..37758ca91d7f 100644
>> --- a/drivers/pci/endpoint/pci-epc-core.c
>> +++ b/drivers/pci/endpoint/pci-epc-core.c
>> @@ -433,6 +433,72 @@ void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>   }
>>   EXPORT_SYMBOL_GPL(pci_epc_unmap_addr);
>>   
>> +/**
>> + * pci_epc_map_align() - Get the offset into and the size of a controller memory
>> + *			 address region needed to map a RC PCI address region
>> + * @epc: the EPC device on which address is allocated
>> + * @func_no: the physical endpoint function number in the EPC device
>> + * @vfunc_no: the virtual endpoint function number in the physical function
>> + * @pci_addr: PCI address to which the physical address should be mapped
>> + * @size: the size of the mapping starting from @pci_addr
>> + * @map: populate here the actual size and offset into the controller memory
>> + *       that must be allocated for the mapping
>> + *
>> + * Invoke the controller map_align operation to obtain the size and the offset
>> + * into a controller address region that must be allocated to map @size
>> + * bytes of the RC PCI address space starting from @pci_addr.
>> + *
>> + * The size of the mapping that can be handled by the controller is indicated
>> + * using the pci_size field of @map. This size may be smaller than the requested
>> + * @size. In such case, the function driver must handle the mapping using
>> + * several fragments. The offset into the controller memory for the effective
>> + * mapping of the @pci_addr..@pci_addr+@map->pci_size address range is indicated
>> + * using the map_ofst field of @map.
>> + */
>> +int pci_epc_map_align(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>> +		      u64 pci_addr, size_t size, struct pci_epc_map *map)
>> +{
>> +	const struct pci_epc_features *features;
>> +	size_t mask;
>> +	int ret;
>> +
>> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>> +		return -EINVAL;
>> +
>> +	if (!size || !map)
>> +		return -EINVAL;
>> +
>> +	memset(map, 0, sizeof(*map));
>> +	map->pci_addr = pci_addr;
>> +	map->pci_size = size;
>> +
>> +	if (epc->ops->map_align) {
>> +		mutex_lock(&epc->lock);
>> +		ret = epc->ops->map_align(epc, func_no, vfunc_no, map);
>> +		mutex_unlock(&epc->lock);
>> +		return ret;
>> +	}
>> +
>> +	/*
>> +	 * Assume a fixed alignment constraint as specified by the controller
>> +	 * features.
>> +	 */
>> +	features = pci_epc_get_features(epc, func_no, vfunc_no);
>> +	if (!features || !features->align) {
>> +		map->map_pci_addr = pci_addr;
>> +		map->map_size = size;
>> +		map->map_ofst = 0;
>> +	}
> 
> The 'align' of pci_epc_features was initially added only to address the 
> inbound ATU constraints. This is also added as comment in [1]. The PCI 
> address restrictions (only fixed alignment constraint) were handled by 
> the host side driver and depends on the connected endpoint device 
> (atleast it was like that for pci_endpoint_test.c [2]).
> So pci-epf-test.c used the 'align' in pci_epc_features only as part of 
> pci_epf_alloc_space().
> 
> Though I have abused 'align' of pci_epc_features in pci-epf-ntb.c using 
> it out of pci_epf_alloc_space(), I think we should keep the 'align' of 
> pci_epc_features only within pci_epf_alloc_space() and controllers with 
> any PCI address restrictions to implement ->map_align(). This could as 
> well be done in a phased manner to let controllers implement 
> ->map_align() and then remove using  pci_epc_features in 
> pci_epc_map_align(). Let me know what you think?

Yep, good idea. I will remove the use of "align" as a default alignment
constraint. For controllers that have a fixed alignment constraint (not
necessarilly epc->features->align), it is trivial to provide a generic helper
function that implements the ->map_align method.
Niklas Cassel April 5, 2024, 12:20 p.m. UTC | #15
On Thu, Apr 04, 2024 at 11:43:47AM +0900, Damien Le Moal wrote:
> On 4/3/24 21:33, Kishon Vijay Abraham I wrote:
> > Hi Damien,
> > 
> > On 3/30/2024 9:49 AM, Damien Le Moal wrote:
> >> Some endpoint controllers have requirements on the alignment of the
> >> controller physical memory address that must be used to map a RC PCI
> >> address region. For instance, the rockchip endpoint controller uses
> >> at most the lower 20 bits of a physical memory address region as the
> >> lower bits of an RC PCI address. For mapping a PCI address region of
> >> size bytes starting from pci_addr, the exact number of address bits
> >> used is the number of address bits changing in the address range
> >> [pci_addr..pci_addr + size - 1].
> >>
> >> For this example, this creates the following constraints:
> >> 1) The offset into the controller physical memory allocated for a
> >>     mapping depends on the mapping size *and* the starting PCI address
> >>     for the mapping.
> >> 2) A mapping size cannot exceed the controller windows size (1MB) minus
> >>     the offset needed into the allocated physical memory, which can end
> >>     up being a smaller size than the desired mapping size.
> >>
> >> Handling these constraints independently of the controller being used in
> >> a PCI EP function driver is not possible with the current EPC API as
> >> it only provides the ->align field in struct pci_epc_features.
> >> Furthermore, this alignment is static and does not depend on a mapping
> >> pci address and size.
> >>
> >> Solve this by introducing the function pci_epc_map_align() and the
> >> endpoint controller operation ->map_align to allow endpoint function
> >> drivers to obtain the size and the offset into a controller address
> >> region that must be used to map an RC PCI address region. The size
> >> of the physical address region provided by pci_epc_map_align() can then
> >> be used as the size argument for the function pci_epc_mem_alloc_addr().
> >> The offset into the allocated controller memory can be used to
> >> correctly handle data transfers. Of note is that pci_epc_map_align() may
> >> indicate upon return a mapping size that is smaller (but not 0) than the
> >> requested PCI address region size. For such case, an endpoint function
> >> driver must handle data transfers in fragments.
> >>
> >> The controller operation ->map_align is optional: controllers that do
> >> not have any address alignment constraints for mapping a RC PCI address
> >> region do not need to implement this operation. For such controllers,
> >> pci_epc_map_align() always returns the mapping size as equal
> >> to the requested size and an offset equal to 0.
> >>
> >> The structure pci_epc_map is introduced to represent a mapping start PCI
> >> address, size and the size and offset into the controller memory needed
> >> for mapping the PCI address region.
> >>
> >> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> >> ---
> >>   drivers/pci/endpoint/pci-epc-core.c | 66 +++++++++++++++++++++++++++++
> >>   include/linux/pci-epc.h             | 33 +++++++++++++++
> >>   2 files changed, 99 insertions(+)
> >>
> >> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
> >> index 754afd115bbd..37758ca91d7f 100644
> >> --- a/drivers/pci/endpoint/pci-epc-core.c
> >> +++ b/drivers/pci/endpoint/pci-epc-core.c
> >> @@ -433,6 +433,72 @@ void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> >>   }
> >>   EXPORT_SYMBOL_GPL(pci_epc_unmap_addr);
> >>   
> >> +/**
> >> + * pci_epc_map_align() - Get the offset into and the size of a controller memory
> >> + *			 address region needed to map a RC PCI address region
> >> + * @epc: the EPC device on which address is allocated
> >> + * @func_no: the physical endpoint function number in the EPC device
> >> + * @vfunc_no: the virtual endpoint function number in the physical function
> >> + * @pci_addr: PCI address to which the physical address should be mapped
> >> + * @size: the size of the mapping starting from @pci_addr
> >> + * @map: populate here the actual size and offset into the controller memory
> >> + *       that must be allocated for the mapping
> >> + *
> >> + * Invoke the controller map_align operation to obtain the size and the offset
> >> + * into a controller address region that must be allocated to map @size
> >> + * bytes of the RC PCI address space starting from @pci_addr.
> >> + *
> >> + * The size of the mapping that can be handled by the controller is indicated
> >> + * using the pci_size field of @map. This size may be smaller than the requested
> >> + * @size. In such case, the function driver must handle the mapping using
> >> + * several fragments. The offset into the controller memory for the effective
> >> + * mapping of the @pci_addr..@pci_addr+@map->pci_size address range is indicated
> >> + * using the map_ofst field of @map.
> >> + */
> >> +int pci_epc_map_align(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> >> +		      u64 pci_addr, size_t size, struct pci_epc_map *map)
> >> +{
> >> +	const struct pci_epc_features *features;
> >> +	size_t mask;
> >> +	int ret;
> >> +
> >> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
> >> +		return -EINVAL;
> >> +
> >> +	if (!size || !map)
> >> +		return -EINVAL;
> >> +
> >> +	memset(map, 0, sizeof(*map));
> >> +	map->pci_addr = pci_addr;
> >> +	map->pci_size = size;
> >> +
> >> +	if (epc->ops->map_align) {
> >> +		mutex_lock(&epc->lock);
> >> +		ret = epc->ops->map_align(epc, func_no, vfunc_no, map);
> >> +		mutex_unlock(&epc->lock);
> >> +		return ret;
> >> +	}
> >> +
> >> +	/*
> >> +	 * Assume a fixed alignment constraint as specified by the controller
> >> +	 * features.
> >> +	 */
> >> +	features = pci_epc_get_features(epc, func_no, vfunc_no);
> >> +	if (!features || !features->align) {
> >> +		map->map_pci_addr = pci_addr;
> >> +		map->map_size = size;
> >> +		map->map_ofst = 0;
> >> +	}
> > 
> > The 'align' of pci_epc_features was initially added only to address the 
> > inbound ATU constraints. This is also added as comment in [1]. The PCI 
> > address restrictions (only fixed alignment constraint) were handled by 
> > the host side driver and depends on the connected endpoint device 
> > (atleast it was like that for pci_endpoint_test.c [2]).
> > So pci-epf-test.c used the 'align' in pci_epc_features only as part of 
> > pci_epf_alloc_space().
> > 
> > Though I have abused 'align' of pci_epc_features in pci-epf-ntb.c using 
> > it out of pci_epf_alloc_space(), I think we should keep the 'align' of 
> > pci_epc_features only within pci_epf_alloc_space() and controllers with 
> > any PCI address restrictions to implement ->map_align(). This could as 
> > well be done in a phased manner to let controllers implement 
> > ->map_align() and then remove using  pci_epc_features in 
> > pci_epc_map_align(). Let me know what you think?

First you say that you want to avoid using epc_features->align inside
pci_epc_map_align(), and then you say that we could do it in phases,
and eventually stop using epc_features->align in pci_epc_map_align().

I'm confused... :)

Do you really want pci_epc_map_align() to make use of epc_features->align ?

Don't you mean ep->page_size ?
(Please read the whole email to see my reasoning.)


> 
> Yep, good idea. I will remove the use of "align" as a default alignment
> constraint. For controllers that have a fixed alignment constraint (not
> necessarilly epc->features->align), it is trivial to provide a generic helper
> function that implements the ->map_align method.

We can see that commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a9a801620efac92885fc9cd53594c0b9aba87a4

Introduced epc_features->align and modified pci_epf_alloc_space() to use it.

From reading the commit, it appears that epc_features->align was intended to
represent inbound iATU alignment requirement.

For DWC based controllers, the inbound iATU address must be aligned to:
CX_ATU_MIN_REGION_SIZE.

AFAICT, epc_features->align currently has nothing to do with traffic outbound
from the EP.


For aligning the reads/writes to buffers allocated on the host side,
we currently have .alignment in the host side driver:
https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L966-L1021

Which should be set to the outbound iATU alignment requirement.

For DWC based controllers, the outbound iATU address must be aligned to:
CX_ATU_MIN_REGION_SIZE.


Additionally, we have ep->page_size, which defines the smallest outbound unit
that can be mapped.
(On DWC based controllers, tis is CX_ATU_MIN_REGION_SIZE.)

ep->page_size is used to specify the outbound alignment for e.g.
dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq():
https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/pci/controller/dwc/pcie-designware-ep.c#L488
https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/pci/controller/dwc/pcie-designware-ep.c#L555

which makes sure that we can write to the RC side MSI/MSI-X address
while satisfying the outbound iATU alignment requirement.

See also:
https://lore.kernel.org/linux-pci/20240402-pci2_upstream-v3-2-803414bdb430@nxp.com/



Now I understand that rockchip is the first one that does not have a fixed
alignment.
So for that platform, map_align() will be different from ep->page_size.
(For all DWC based drivers the outbound iATU alignment requirement is
the same as the page size.)

However, it would be nice if:
1) We could have a default implementation of map_align() that by default uses
ep->page_size. Platforms that have non-fixed alignment requirements could
define their own map_align().

2) We fix dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq() to use
the new pci_epc_map_align().

3) It is getting too complicated with all these...
epc_features->align, ep->page_size, map_align(), and .alignment in host driver.
I think that we need to document each of these in Documentation/PCI/endpoint/

4) It would be nice if we could set page_size correctly for all the PCI device
and vendor IDs that have defined an .alignment in drivers/misc/pci_endpoint_test.c
in the correct EPC driver. That way, we should be able to completely remove all
.alignment specified in drivers/misc/pci_endpoint_test.c.

5) Unfortunately drivers/misc/pci_endpoint_test.c defines a default alignment
of 4K:
https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L968
https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L820

It would be nice if we could get rid of this as well. Or perhaps add an option
to pci_test so that it does not use this 4k alignment, such that we can verify
that pci_epc_map_align() is actually working.



In my opinion 4) is the biggest win with this series, because it means that
we define the alignment in the EPC driver, instead of needing to define it in
each and every host side driver. But right now, this great improvement is not
really visible for someone looking quickly at the current series.


Kind regards,
Niklas
Damien Le Moal April 5, 2024, 12:43 p.m. UTC | #16
On 4/5/24 21:20, Niklas Cassel wrote:
> On Thu, Apr 04, 2024 at 11:43:47AM +0900, Damien Le Moal wrote:
>> On 4/3/24 21:33, Kishon Vijay Abraham I wrote:
>>> Hi Damien,
>>>
>>> On 3/30/2024 9:49 AM, Damien Le Moal wrote:
>>>> Some endpoint controllers have requirements on the alignment of the
>>>> controller physical memory address that must be used to map a RC PCI
>>>> address region. For instance, the rockchip endpoint controller uses
>>>> at most the lower 20 bits of a physical memory address region as the
>>>> lower bits of an RC PCI address. For mapping a PCI address region of
>>>> size bytes starting from pci_addr, the exact number of address bits
>>>> used is the number of address bits changing in the address range
>>>> [pci_addr..pci_addr + size - 1].
>>>>
>>>> For this example, this creates the following constraints:
>>>> 1) The offset into the controller physical memory allocated for a
>>>>     mapping depends on the mapping size *and* the starting PCI address
>>>>     for the mapping.
>>>> 2) A mapping size cannot exceed the controller windows size (1MB) minus
>>>>     the offset needed into the allocated physical memory, which can end
>>>>     up being a smaller size than the desired mapping size.
>>>>
>>>> Handling these constraints independently of the controller being used in
>>>> a PCI EP function driver is not possible with the current EPC API as
>>>> it only provides the ->align field in struct pci_epc_features.
>>>> Furthermore, this alignment is static and does not depend on a mapping
>>>> pci address and size.
>>>>
>>>> Solve this by introducing the function pci_epc_map_align() and the
>>>> endpoint controller operation ->map_align to allow endpoint function
>>>> drivers to obtain the size and the offset into a controller address
>>>> region that must be used to map an RC PCI address region. The size
>>>> of the physical address region provided by pci_epc_map_align() can then
>>>> be used as the size argument for the function pci_epc_mem_alloc_addr().
>>>> The offset into the allocated controller memory can be used to
>>>> correctly handle data transfers. Of note is that pci_epc_map_align() may
>>>> indicate upon return a mapping size that is smaller (but not 0) than the
>>>> requested PCI address region size. For such case, an endpoint function
>>>> driver must handle data transfers in fragments.
>>>>
>>>> The controller operation ->map_align is optional: controllers that do
>>>> not have any address alignment constraints for mapping a RC PCI address
>>>> region do not need to implement this operation. For such controllers,
>>>> pci_epc_map_align() always returns the mapping size as equal
>>>> to the requested size and an offset equal to 0.
>>>>
>>>> The structure pci_epc_map is introduced to represent a mapping start PCI
>>>> address, size and the size and offset into the controller memory needed
>>>> for mapping the PCI address region.
>>>>
>>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>>> ---
>>>>   drivers/pci/endpoint/pci-epc-core.c | 66 +++++++++++++++++++++++++++++
>>>>   include/linux/pci-epc.h             | 33 +++++++++++++++
>>>>   2 files changed, 99 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
>>>> index 754afd115bbd..37758ca91d7f 100644
>>>> --- a/drivers/pci/endpoint/pci-epc-core.c
>>>> +++ b/drivers/pci/endpoint/pci-epc-core.c
>>>> @@ -433,6 +433,72 @@ void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(pci_epc_unmap_addr);
>>>>   
>>>> +/**
>>>> + * pci_epc_map_align() - Get the offset into and the size of a controller memory
>>>> + *			 address region needed to map a RC PCI address region
>>>> + * @epc: the EPC device on which address is allocated
>>>> + * @func_no: the physical endpoint function number in the EPC device
>>>> + * @vfunc_no: the virtual endpoint function number in the physical function
>>>> + * @pci_addr: PCI address to which the physical address should be mapped
>>>> + * @size: the size of the mapping starting from @pci_addr
>>>> + * @map: populate here the actual size and offset into the controller memory
>>>> + *       that must be allocated for the mapping
>>>> + *
>>>> + * Invoke the controller map_align operation to obtain the size and the offset
>>>> + * into a controller address region that must be allocated to map @size
>>>> + * bytes of the RC PCI address space starting from @pci_addr.
>>>> + *
>>>> + * The size of the mapping that can be handled by the controller is indicated
>>>> + * using the pci_size field of @map. This size may be smaller than the requested
>>>> + * @size. In such case, the function driver must handle the mapping using
>>>> + * several fragments. The offset into the controller memory for the effective
>>>> + * mapping of the @pci_addr..@pci_addr+@map->pci_size address range is indicated
>>>> + * using the map_ofst field of @map.
>>>> + */
>>>> +int pci_epc_map_align(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>>> +		      u64 pci_addr, size_t size, struct pci_epc_map *map)
>>>> +{
>>>> +	const struct pci_epc_features *features;
>>>> +	size_t mask;
>>>> +	int ret;
>>>> +
>>>> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (!size || !map)
>>>> +		return -EINVAL;
>>>> +
>>>> +	memset(map, 0, sizeof(*map));
>>>> +	map->pci_addr = pci_addr;
>>>> +	map->pci_size = size;
>>>> +
>>>> +	if (epc->ops->map_align) {
>>>> +		mutex_lock(&epc->lock);
>>>> +		ret = epc->ops->map_align(epc, func_no, vfunc_no, map);
>>>> +		mutex_unlock(&epc->lock);
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * Assume a fixed alignment constraint as specified by the controller
>>>> +	 * features.
>>>> +	 */
>>>> +	features = pci_epc_get_features(epc, func_no, vfunc_no);
>>>> +	if (!features || !features->align) {
>>>> +		map->map_pci_addr = pci_addr;
>>>> +		map->map_size = size;
>>>> +		map->map_ofst = 0;
>>>> +	}
>>>
>>> The 'align' of pci_epc_features was initially added only to address the 
>>> inbound ATU constraints. This is also added as comment in [1]. The PCI 
>>> address restrictions (only fixed alignment constraint) were handled by 
>>> the host side driver and depends on the connected endpoint device 
>>> (atleast it was like that for pci_endpoint_test.c [2]).
>>> So pci-epf-test.c used the 'align' in pci_epc_features only as part of 
>>> pci_epf_alloc_space().
>>>
>>> Though I have abused 'align' of pci_epc_features in pci-epf-ntb.c using 
>>> it out of pci_epf_alloc_space(), I think we should keep the 'align' of 
>>> pci_epc_features only within pci_epf_alloc_space() and controllers with 
>>> any PCI address restrictions to implement ->map_align(). This could as 
>>> well be done in a phased manner to let controllers implement 
>>> ->map_align() and then remove using  pci_epc_features in 
>>> pci_epc_map_align(). Let me know what you think?
> 
> First you say that you want to avoid using epc_features->align inside
> pci_epc_map_align(), and then you say that we could do it in phases,
> and eventually stop using epc_features->align in pci_epc_map_align().
> 
> I'm confused... :)
> 
> Do you really want pci_epc_map_align() to make use of epc_features->align ?
> 
> Don't you mean ep->page_size ?
> (Please read the whole email to see my reasoning.)
> 
> 
>>
>> Yep, good idea. I will remove the use of "align" as a default alignment
>> constraint. For controllers that have a fixed alignment constraint (not
>> necessarilly epc->features->align), it is trivial to provide a generic helper
>> function that implements the ->map_align method.
> 
> We can see that commit:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a9a801620efac92885fc9cd53594c0b9aba87a4
> 
> Introduced epc_features->align and modified pci_epf_alloc_space() to use it.
> 
> From reading the commit, it appears that epc_features->align was intended to
> represent inbound iATU alignment requirement.
> 
> For DWC based controllers, the inbound iATU address must be aligned to:
> CX_ATU_MIN_REGION_SIZE.
> 
> AFAICT, epc_features->align currently has nothing to do with traffic outbound
> from the EP.

Yes, correct. It is for BARs, not for PCI address alignment constraint.

> For aligning the reads/writes to buffers allocated on the host side,
> we currently have .alignment in the host side driver:
> https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L966-L1021
> 
> Which should be set to the outbound iATU alignment requirement.
> 
> For DWC based controllers, the outbound iATU address must be aligned to:
> CX_ATU_MIN_REGION_SIZE.
> Additionally, we have ep->page_size, which defines the smallest outbound unit
> that can be mapped.
> (On DWC based controllers, tis is CX_ATU_MIN_REGION_SIZE.)
> 
> ep->page_size is used to specify the outbound alignment for e.g.
> dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq():
> https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/pci/controller/dwc/pcie-designware-ep.c#L488
> https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/pci/controller/dwc/pcie-designware-ep.c#L555> which makes sure that we can write to the RC side MSI/MSI-X address
> while satisfying the outbound iATU alignment requirement.
> 
> See also:
> https://lore.kernel.org/linux-pci/20240402-pci2_upstream-v3-2-803414bdb430@nxp.com/
> 
> 
> 
> Now I understand that rockchip is the first one that does not have a fixed
> alignment.
> So for that platform, map_align() will be different from ep->page_size.
> (For all DWC based drivers the outbound iATU alignment requirement is
> the same as the page size.)

Yes. So we can have a generic map_align() implementation that all these drivers
can use as there .map_align method. No need to expose page size to the epc/epf
core code.

> However, it would be nice if:
> 1) We could have a default implementation of map_align() that by default uses
> ep->page_size. Platforms that have non-fixed alignment requirements could
> define their own map_align().

See above. The default implementation can be a helper function defined in epc
core that the drivers can use for their .map_align() method.

> 
> 2) We fix dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq() to use
> the new pci_epc_map_align().

Why ? That is completely internal to the controller driver.

> 
> 3) It is getting too complicated with all these...
> epc_features->align, ep->page_size, map_align(), and .alignment in host driver.
> I think that we need to document each of these in Documentation/PCI/endpoint/

test host driver .alignment needs to be nuked. That one is nonsense.
ep->page_size needs to stay internal to the driver. .map_align method is enough
to handle any PCI address mapping constraint and will indicate memory size to
allocate, offset into it etc. And for the BARs alignment, .align feature is not
exactly great as it is not clear, but it is enough I think. So we could just
rename it to be clear. And even maybe remove it from features. I do not see why
an EPF needs to care about it given that epc core funstions are used to setup
the bars.

> 4) It would be nice if we could set page_size correctly for all the PCI device
> and vendor IDs that have defined an .alignment in drivers/misc/pci_endpoint_test.c
> in the correct EPC driver. That way, we should be able to completely remove all
> .alignment specified in drivers/misc/pci_endpoint_test.c.

The host side should be allowed to use any PCI address alignment it wants. So no
alignment communicated at all. It is the EP side that needs to deal with alignment.

> 5) Unfortunately drivers/misc/pci_endpoint_test.c defines a default alignment
> of 4K:
> https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L968
> https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L820
> 
> It would be nice if we could get rid of this as well. Or perhaps add an option
> to pci_test so that it does not use this 4k alignment, such that we can verify
> that pci_epc_map_align() is actually working.

Exactly. Get rid of any default alignment, add a test parameter to define one so
that we can test different alignment+size combinations.

> In my opinion 4) is the biggest win with this series, because it means that
> we define the alignment in the EPC driver, instead of needing to define it in
> each and every host side driver. But right now, this great improvement is not
> really visible for someone looking quickly at the current series.

Yes. Once in place, we can rework the test driver alignment stuff to make it
optional instead of mandatory because of bad handling on the EP side :)
Niklas Cassel April 5, 2024, 1:33 p.m. UTC | #17
On Sat, Mar 30, 2024 at 01:19:11PM +0900, Damien Le Moal wrote:
> Introduce the epc core helper function pci_epc_function_is_valid() to
> verify that an epc pointer, a physical function number and a virtual
> function number are all valid. This avoids repeating the code pattern:
> 
> if (IS_ERR_OR_NULL(epc) || func_no >= epc->max_functions)
> 	return err;
> 
> if (vfunc_no > 0 && (!epc->max_vfs || vfunc_no > epc->max_vfs[func_no]))
> 	return err;
> 
> in many functions of the endpoint controller core code.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---

Reviewed-by: Niklas Cassel <cassel@kernel.org>
Niklas Cassel April 5, 2024, 1:37 p.m. UTC | #18
On Sat, Mar 30, 2024 at 01:19:14PM +0900, Damien Le Moal wrote:
> Modify the endpoint test driver to use the functions pci_epc_mem_map()
> and pci_epc_mem_unmap() for the read, write and copy tests. For each
> test case, the transfer (dma or mmio) are executed in a loop to ensure
> that potentially partial mappings returned by pci_epc_mem_map() are
> correctly handled.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---

You probably want to rebase this series on top of:
[1] https://lore.kernel.org/linux-pci/20240314-pci-dbi-rework-v10-0-14a45c5a938e@linaro.org/
[2] https://lore.kernel.org/linux-pci/20240320113157.322695-1-cassel@kernel.org/

As these both modify pci-epf-test.c.

AFAICT both these series [1] (DBI rework v12, not v10) and [2] are fully
reviewed and seem to be ready to go.

They just seem to take a really long time to be picked up.


Kind regards,
Niklas
Niklas Cassel April 5, 2024, 1:39 p.m. UTC | #19
On Wed, Apr 03, 2024 at 01:18:23PM +0530, Manivannan Sadhasivam wrote:
> On Sat, Mar 30, 2024 at 01:19:16PM +0900, Damien Le Moal wrote:
> > Implement the link_down event operation to stop the command execution
> > delayed work when the endpoint controller notifies a link down event.
> > 
> > Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> 
> This patch is already part of another series I posted [1] and under review. So
> this can be dropped.
> 
> - Mani
> 
> [1] https://lore.kernel.org/linux-pci/20240401-pci-epf-rework-v2-9-970dbe90b99d@linaro.org/

Mani, your patch does not use _sync(),
so I don't think that we can simply drop this patch.


Kind regards,
Niklas

> 
> > Reviewed-by: Frank Li <Frank.Li@nxp.com>
> > ---
> >  drivers/pci/endpoint/functions/pci-epf-test.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
> > index ab40c3182677..e6d4e1747c9f 100644
> > --- a/drivers/pci/endpoint/functions/pci-epf-test.c
> > +++ b/drivers/pci/endpoint/functions/pci-epf-test.c
> > @@ -824,9 +824,19 @@ static int pci_epf_test_link_up(struct pci_epf *epf)
> >  	return 0;
> >  }
> >  
> > +static int pci_epf_test_link_down(struct pci_epf *epf)
> > +{
> > +	struct pci_epf_test *epf_test = epf_get_drvdata(epf);
> > +
> > +	cancel_delayed_work_sync(&epf_test->cmd_handler);
> > +
> > +	return 0;
> > +}
> > +
> >  static const struct pci_epc_event_ops pci_epf_test_event_ops = {
> >  	.core_init = pci_epf_test_core_init,
> >  	.link_up = pci_epf_test_link_up,
> > +	.link_down = pci_epf_test_link_down,
> >  };
> >  
> >  static int pci_epf_test_alloc_space(struct pci_epf *epf)
> > -- 
> > 2.44.0
> > 
> 
> -- 
> மணிவண்ணன் சதாசிவம்
Niklas Cassel April 5, 2024, 1:41 p.m. UTC | #20
On Wed, Apr 03, 2024 at 01:17:02PM +0530, Manivannan Sadhasivam wrote:
> On Sat, Mar 30, 2024 at 01:19:15PM +0900, Damien Le Moal wrote:
> > Replace the call to cancel_delayed_work() with a call to
> > cancel_delayed_work_sync() in pci_epf_test_unbind(). This ensures that
> > the command handler is really stopped when proceeding with dma and bar
> > cleanup.
> > 
> > Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> 
> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> 
> - Mani
> 
> > Reviewed-by: Frank Li <Frank.Li@nxp.com>
> > ---

Reviewed-by: Niklas Cassel <cassel@kernel.org>
Niklas Cassel April 5, 2024, 2:10 p.m. UTC | #21
On Sat, Mar 30, 2024 at 01:19:13PM +0900, Damien Le Moal wrote:
> Introduce the function pci_epc_mem_map() to facilitate controller memory
> address allocation and mapping to a RC PCI address region in endpoint
> function drivers.
> 
> This function first uses pci_epc_map_align() to determine the controller
> memory address alignment (offset and size) constraints. The result of
> this function is used to allocate a controller physical memory region
> using pci_epc_mem_alloc_addr() and map it to the RC PCI address
> space with pci_epc_map_addr(). Since pci_epc_map_align() may indicate
> that a mapping can be smaller than the requested size, pci_epc_mem_map()
> may only partially map the RC PCI address region specified and return
> a smaller size for the effective mapping.
> 
> The counterpart of pci_epc_mem_map() to unmap and free the controller
> memory address region is pci_epc_mem_unmap().
> 
> Both functions operate using struct pci_epc_map data structure which is
> extended to contain the physical and virtual addresses of the allocated
> controller memory. Endpoint function drivers can use struct pci_epc_map
> to implement read/write accesses within the mapped RC PCI address region
> using the ->virt_addr and ->size fields.
> 
> This commit contains contributions from Rick Wertenbroek
> <rick.wertenbroek@gmail.com>.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>  drivers/pci/endpoint/pci-epc-core.c | 68 +++++++++++++++++++++++++++++
>  include/linux/pci-epc.h             |  6 +++
>  2 files changed, 74 insertions(+)
> 
> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
> index 37758ca91d7f..0095b54bdf9e 100644
> --- a/drivers/pci/endpoint/pci-epc-core.c
> +++ b/drivers/pci/endpoint/pci-epc-core.c
> @@ -530,6 +530,74 @@ int pci_epc_map_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>  }
>  EXPORT_SYMBOL_GPL(pci_epc_map_addr);
>  
> +/**
> + * pci_epc_mem_map() - allocate and map CPU address to PCI address
> + * @epc: the EPC device on which the CPU address is to be allocated and mapped
> + * @func_no: the physical endpoint function number in the EPC device
> + * @vfunc_no: the virtual endpoint function number in the physical function
> + * @pci_addr: PCI address to which the CPU address should be mapped
> + * @size: the number of bytes to map starting from @pci_addr
> + * @map: where to return the mapping information
> + *
> + * Allocate a controller physical address region and map it to a RC PCI address
> + * region, taking into account the controller physical address mapping
> + * constraints (if any). Returns the effective size of the mapping, which may
> + * be less than @size, or a negative error code in case of error.
> + */
> +ssize_t pci_epc_mem_map(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +			u64 pci_addr, size_t size, struct pci_epc_map *map)
> +{
> +	int ret;
> +
> +	ret = pci_epc_map_align(epc, func_no, vfunc_no, pci_addr, size, map);
> +	if (ret)
> +		return ret;
> +
> +	map->virt_base = pci_epc_mem_alloc_addr(epc, &map->phys_base,
> +						map->map_size);
> +	if (!map->virt_base)
> +		return -ENOMEM;
> +
> +	map->phys_addr = map->phys_base + map->map_ofst;
> +	map->virt_addr = map->virt_base + map->map_ofst;
> +
> +	ret = pci_epc_map_addr(epc, func_no, vfunc_no, map->phys_base,
> +			       map->map_pci_addr, map->map_size);
> +	if (ret) {
> +		pci_epc_mem_free_addr(epc, map->phys_base, map->virt_base,
> +				      map->map_size);
> +		return ret;
> +	}
> +
> +	return map->pci_size;

map->pci_size is of type size_t.
pci_epc_mem_map returns a type of ssize_t.

This means that on ILP32 you will truncate the result, and will only be
able to map a region of max size 2GB.

Could we perhaps change this function to return an int instead?
(0 on success). The mapped size can still be accessed in map->pci_size.


Kind regards,
Niklas

> +}
> +EXPORT_SYMBOL_GPL(pci_epc_mem_map);
> +
> +/**
> + * pci_epc_mem_unmap() - unmap from PCI address and free a CPU address region
> + * @epc: the EPC device on which the CPU address is allocated and mapped
> + * @func_no: the physical endpoint function number in the EPC device
> + * @vfunc_no: the virtual endpoint function number in the physical function
> + * @map: the mapping information
> + *
> + * Allocate and map local CPU address to a PCI address, accounting for the
> + * controller local CPU address alignment constraints.
> + */
> +void pci_epc_mem_unmap(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +		       struct pci_epc_map *map)
> +{
> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
> +		return;
> +
> +	if (!map || !map->pci_size)
> +		return;
> +
> +	pci_epc_unmap_addr(epc, func_no, vfunc_no, map->phys_base);
> +	pci_epc_mem_free_addr(epc, map->phys_base, map->virt_base,
> +			      map->map_size);
> +}
> +EXPORT_SYMBOL_GPL(pci_epc_mem_unmap);
> +
>  /**
>   * pci_epc_clear_bar() - reset the BAR
>   * @epc: the EPC device for which the BAR has to be cleared
> diff --git a/include/linux/pci-epc.h b/include/linux/pci-epc.h
> index 8cfb4aaf2628..86397a500b54 100644
> --- a/include/linux/pci-epc.h
> +++ b/include/linux/pci-epc.h
> @@ -304,4 +304,10 @@ void __iomem *pci_epc_mem_alloc_addr(struct pci_epc *epc,
>  				     phys_addr_t *phys_addr, size_t size);
>  void pci_epc_mem_free_addr(struct pci_epc *epc, phys_addr_t phys_addr,
>  			   void __iomem *virt_addr, size_t size);
> +
> +ssize_t pci_epc_mem_map(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +			u64 pci_addr, size_t size, struct pci_epc_map *map);
> +void pci_epc_mem_unmap(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
> +		       struct pci_epc_map *map);
> +
>  #endif /* __LINUX_PCI_EPC_H */
> -- 
> 2.44.0
>
Niklas Cassel April 5, 2024, 3:18 p.m. UTC | #22
On Fri, Apr 05, 2024 at 09:43:42PM +0900, Damien Le Moal wrote:
> On 4/5/24 21:20, Niklas Cassel wrote:
> > 
> > Now I understand that rockchip is the first one that does not have a fixed
> > alignment.
> > So for that platform, map_align() will be different from ep->page_size.
> > (For all DWC based drivers the outbound iATU alignment requirement is
> > the same as the page size.)
> 
> Yes. So we can have a generic map_align() implementation that all these drivers
> can use as there .map_align method. No need to expose page size to the epc/epf
> core code.

I don't follow.

ep->page_size is used by pci_epc_multi_mem_init(), pci_epc_mem_alloc_addr()
and pci_epc_mem_free_addr(), so it is used by EPC core.

pci_epc_mem_alloc_addr() currently uses it to align up an allocation to the
page size, so that an allocation from the PCI window/memory space is a multiple
of page_size. How can we avoid exposing the page size to EPC core?


For a DWC-based driver, the mapping part requires that the start address of
the mapping should be aligned to the page size.

But e.g.
drivers/pci/controller/pcie-rockchip-ep.c, sets page size (smallest allocation):
to 1 MB:
windows[i].page_size = SZ_1M;

But the mapping part for rockchip-ep appears to have different requirements.


> 
> > However, it would be nice if:
> > 1) We could have a default implementation of map_align() that by default uses
> > ep->page_size. Platforms that have non-fixed alignment requirements could
> > define their own map_align().
> 
> See above. The default implementation can be a helper function defined in epc
> core that the drivers can use for their .map_align() method.

Sounds good.


> > 2) We fix dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq() to use
> > the new pci_epc_map_align().
> 
> Why ? That is completely internal to the controller driver.

Well, dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq() currently does
some clearing of lower bits of the address before using .map_addr() to map the
an aligned outbound address. It will then write to the mapped address + some
offset.

Since it is already using .map_addr(), it seems like the perfect use case for
pci_epc_map_align(), as the function then would not need to do any clearing
of lower bits before mapping, nor writing to an offset within the mapping.

It would just use pci_epc_map_align() and then write to map->pci_addr.


> > 3) It is getting too complicated with all these...
> > epc_features->align, ep->page_size, map_align(), and .alignment in host driver.
> > I think that we need to document each of these in Documentation/PCI/endpoint/
> 
> test host driver .alignment needs to be nuked. That one is nonsense.
> ep->page_size needs to stay internal to the driver. .map_align method is enough
> to handle any PCI address mapping constraint and will indicate memory size to
> allocate, offset into it etc. And for the BARs alignment, .align feature is not
> exactly great as it is not clear, but it is enough I think. So we could just
> rename it to be clear. And even maybe remove it from features. I do not see why
> an EPF needs to care about it given that epc core funstions are used to setup
> the bars.

+1 on renaming it to bar_alignment or inbound_alignment or similar.

I don't think that we can remove it from epc_features. It is used by
pci_epf_alloc_space() which uses epc_features->align to ensure that
when an EPF driver allocates the backing memory for a BAR, the backing
memory is aligned to bar_alignment. (An allocation of size X is guaranteed
to be aligned to X.)


> > 4) It would be nice if we could set page_size correctly for all the PCI device
> > and vendor IDs that have defined an .alignment in drivers/misc/pci_endpoint_test.c
> > in the correct EPC driver. That way, we should be able to completely remove all
> > .alignment specified in drivers/misc/pci_endpoint_test.c.
> 
> The host side should be allowed to use any PCI address alignment it wants. So no
> alignment communicated at all. It is the EP side that needs to deal with alignment.

I think we are saying the same thing.
But in order to remove all .alignment uses in drivers/misc/pci_endpoint_test.c,
we will need to add modify the corresponding EPC driver to either:
- Define the ep->page_size, so that the generic map_align() implementation will
work.
(If you grep for page_size in drivers/pci/controller, you will see that
very few EPC drivers currently set ep->page_size, even though the PCI device
and vendor IDs for those same controllers have specified an alignment in
drivers/misc/pci_endpoint_test.c)
- Define a custom map_align() implementation.


> > 5) Unfortunately drivers/misc/pci_endpoint_test.c defines a default alignment
> > of 4K:
> > https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L968
> > https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L820
> > 
> > It would be nice if we could get rid of this as well. Or perhaps add an option
> > to pci_test so that it does not use this 4k alignment, such that we can verify
> > that pci_epc_map_align() is actually working.
> 
> Exactly. Get rid of any default alignment, add a test parameter to define one so
> that we can test different alignment+size combinations.
> 
> > In my opinion 4) is the biggest win with this series, because it means that
> > we define the alignment in the EPC driver, instead of needing to define it in
> > each and every host side driver. But right now, this great improvement is not
> > really visible for someone looking quickly at the current series.
> 
> Yes. Once in place, we can rework the test driver alignment stuff to make it
> optional instead of mandatory because of bad handling on the EP side :)

Perhaps it would be nice to have 5) implemented for this initial series,
so that it is possible to test that this new API is behaving as intended?


Kind regards,
Niklas
Manivannan Sadhasivam April 6, 2024, 2:24 a.m. UTC | #23
On Fri, Apr 05, 2024 at 03:39:58PM +0200, Niklas Cassel wrote:
> On Wed, Apr 03, 2024 at 01:18:23PM +0530, Manivannan Sadhasivam wrote:
> > On Sat, Mar 30, 2024 at 01:19:16PM +0900, Damien Le Moal wrote:
> > > Implement the link_down event operation to stop the command execution
> > > delayed work when the endpoint controller notifies a link down event.
> > > 
> > > Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> > 
> > This patch is already part of another series I posted [1] and under review. So
> > this can be dropped.
> > 
> > - Mani
> > 
> > [1] https://lore.kernel.org/linux-pci/20240401-pci-epf-rework-v2-9-970dbe90b99d@linaro.org/
> 
> Mani, your patch does not use _sync(),
> so I don't think that we can simply drop this patch.
> 

Agree, I was planning to update it in my next version anyway.

- Mani

> 
> Kind regards,
> Niklas
> 
> > 
> > > Reviewed-by: Frank Li <Frank.Li@nxp.com>
> > > ---
> > >  drivers/pci/endpoint/functions/pci-epf-test.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/drivers/pci/endpoint/functions/pci-epf-test.c b/drivers/pci/endpoint/functions/pci-epf-test.c
> > > index ab40c3182677..e6d4e1747c9f 100644
> > > --- a/drivers/pci/endpoint/functions/pci-epf-test.c
> > > +++ b/drivers/pci/endpoint/functions/pci-epf-test.c
> > > @@ -824,9 +824,19 @@ static int pci_epf_test_link_up(struct pci_epf *epf)
> > >  	return 0;
> > >  }
> > >  
> > > +static int pci_epf_test_link_down(struct pci_epf *epf)
> > > +{
> > > +	struct pci_epf_test *epf_test = epf_get_drvdata(epf);
> > > +
> > > +	cancel_delayed_work_sync(&epf_test->cmd_handler);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  static const struct pci_epc_event_ops pci_epf_test_event_ops = {
> > >  	.core_init = pci_epf_test_core_init,
> > >  	.link_up = pci_epf_test_link_up,
> > > +	.link_down = pci_epf_test_link_down,
> > >  };
> > >  
> > >  static int pci_epf_test_alloc_space(struct pci_epf *epf)
> > > -- 
> > > 2.44.0
> > > 
> > 
> > -- 
> > மணிவண்ணன் சதாசிவம்
Kishon Vijay Abraham I April 10, 2024, 11:57 a.m. UTC | #24
On 4/5/2024 5:50 PM, Niklas Cassel wrote:
> On Thu, Apr 04, 2024 at 11:43:47AM +0900, Damien Le Moal wrote:
>> On 4/3/24 21:33, Kishon Vijay Abraham I wrote:
>>> Hi Damien,
>>>
>>> On 3/30/2024 9:49 AM, Damien Le Moal wrote:
>>>> Some endpoint controllers have requirements on the alignment of the
>>>> controller physical memory address that must be used to map a RC PCI
>>>> address region. For instance, the rockchip endpoint controller uses
>>>> at most the lower 20 bits of a physical memory address region as the
>>>> lower bits of an RC PCI address. For mapping a PCI address region of
>>>> size bytes starting from pci_addr, the exact number of address bits
>>>> used is the number of address bits changing in the address range
>>>> [pci_addr..pci_addr + size - 1].
>>>>
>>>> For this example, this creates the following constraints:
>>>> 1) The offset into the controller physical memory allocated for a
>>>>      mapping depends on the mapping size *and* the starting PCI address
>>>>      for the mapping.
>>>> 2) A mapping size cannot exceed the controller windows size (1MB) minus
>>>>      the offset needed into the allocated physical memory, which can end
>>>>      up being a smaller size than the desired mapping size.
>>>>
>>>> Handling these constraints independently of the controller being used in
>>>> a PCI EP function driver is not possible with the current EPC API as
>>>> it only provides the ->align field in struct pci_epc_features.
>>>> Furthermore, this alignment is static and does not depend on a mapping
>>>> pci address and size.
>>>>
>>>> Solve this by introducing the function pci_epc_map_align() and the
>>>> endpoint controller operation ->map_align to allow endpoint function
>>>> drivers to obtain the size and the offset into a controller address
>>>> region that must be used to map an RC PCI address region. The size
>>>> of the physical address region provided by pci_epc_map_align() can then
>>>> be used as the size argument for the function pci_epc_mem_alloc_addr().
>>>> The offset into the allocated controller memory can be used to
>>>> correctly handle data transfers. Of note is that pci_epc_map_align() may
>>>> indicate upon return a mapping size that is smaller (but not 0) than the
>>>> requested PCI address region size. For such case, an endpoint function
>>>> driver must handle data transfers in fragments.
>>>>
>>>> The controller operation ->map_align is optional: controllers that do
>>>> not have any address alignment constraints for mapping a RC PCI address
>>>> region do not need to implement this operation. For such controllers,
>>>> pci_epc_map_align() always returns the mapping size as equal
>>>> to the requested size and an offset equal to 0.
>>>>
>>>> The structure pci_epc_map is introduced to represent a mapping start PCI
>>>> address, size and the size and offset into the controller memory needed
>>>> for mapping the PCI address region.
>>>>
>>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>>> ---
>>>>    drivers/pci/endpoint/pci-epc-core.c | 66 +++++++++++++++++++++++++++++
>>>>    include/linux/pci-epc.h             | 33 +++++++++++++++
>>>>    2 files changed, 99 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/endpoint/pci-epc-core.c b/drivers/pci/endpoint/pci-epc-core.c
>>>> index 754afd115bbd..37758ca91d7f 100644
>>>> --- a/drivers/pci/endpoint/pci-epc-core.c
>>>> +++ b/drivers/pci/endpoint/pci-epc-core.c
>>>> @@ -433,6 +433,72 @@ void pci_epc_unmap_addr(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(pci_epc_unmap_addr);
>>>>    
>>>> +/**
>>>> + * pci_epc_map_align() - Get the offset into and the size of a controller memory
>>>> + *			 address region needed to map a RC PCI address region
>>>> + * @epc: the EPC device on which address is allocated
>>>> + * @func_no: the physical endpoint function number in the EPC device
>>>> + * @vfunc_no: the virtual endpoint function number in the physical function
>>>> + * @pci_addr: PCI address to which the physical address should be mapped
>>>> + * @size: the size of the mapping starting from @pci_addr
>>>> + * @map: populate here the actual size and offset into the controller memory
>>>> + *       that must be allocated for the mapping
>>>> + *
>>>> + * Invoke the controller map_align operation to obtain the size and the offset
>>>> + * into a controller address region that must be allocated to map @size
>>>> + * bytes of the RC PCI address space starting from @pci_addr.
>>>> + *
>>>> + * The size of the mapping that can be handled by the controller is indicated
>>>> + * using the pci_size field of @map. This size may be smaller than the requested
>>>> + * @size. In such case, the function driver must handle the mapping using
>>>> + * several fragments. The offset into the controller memory for the effective
>>>> + * mapping of the @pci_addr..@pci_addr+@map->pci_size address range is indicated
>>>> + * using the map_ofst field of @map.
>>>> + */
>>>> +int pci_epc_map_align(struct pci_epc *epc, u8 func_no, u8 vfunc_no,
>>>> +		      u64 pci_addr, size_t size, struct pci_epc_map *map)
>>>> +{
>>>> +	const struct pci_epc_features *features;
>>>> +	size_t mask;
>>>> +	int ret;
>>>> +
>>>> +	if (!pci_epc_function_is_valid(epc, func_no, vfunc_no))
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (!size || !map)
>>>> +		return -EINVAL;
>>>> +
>>>> +	memset(map, 0, sizeof(*map));
>>>> +	map->pci_addr = pci_addr;
>>>> +	map->pci_size = size;
>>>> +
>>>> +	if (epc->ops->map_align) {
>>>> +		mutex_lock(&epc->lock);
>>>> +		ret = epc->ops->map_align(epc, func_no, vfunc_no, map);
>>>> +		mutex_unlock(&epc->lock);
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * Assume a fixed alignment constraint as specified by the controller
>>>> +	 * features.
>>>> +	 */
>>>> +	features = pci_epc_get_features(epc, func_no, vfunc_no);
>>>> +	if (!features || !features->align) {
>>>> +		map->map_pci_addr = pci_addr;
>>>> +		map->map_size = size;
>>>> +		map->map_ofst = 0;
>>>> +	}
>>>
>>> The 'align' of pci_epc_features was initially added only to address the
>>> inbound ATU constraints. This is also added as comment in [1]. The PCI
>>> address restrictions (only fixed alignment constraint) were handled by
>>> the host side driver and depends on the connected endpoint device
>>> (atleast it was like that for pci_endpoint_test.c [2]).
>>> So pci-epf-test.c used the 'align' in pci_epc_features only as part of
>>> pci_epf_alloc_space().
>>>
>>> Though I have abused 'align' of pci_epc_features in pci-epf-ntb.c using
>>> it out of pci_epf_alloc_space(), I think we should keep the 'align' of
>>> pci_epc_features only within pci_epf_alloc_space() and controllers with
>>> any PCI address restrictions to implement ->map_align(). This could as
>>> well be done in a phased manner to let controllers implement
>>> ->map_align() and then remove using  pci_epc_features in
>>> pci_epc_map_align(). Let me know what you think?
> 
> First you say that you want to avoid using epc_features->align inside
> pci_epc_map_align(), and then you say that we could do it in phases,
> and eventually stop using epc_features->align in pci_epc_map_align().
> 
> I'm confused... :)
> 
> Do you really want pci_epc_map_align() to make use of epc_features->align ?

Would like pci_epc_map_align() to not use epc_features->align as 
pci_epc_map_align() is for PCIe address programmed in outbound ATU 
(destination) and epc_features->align is for physical address programmed 
in inbound ATU (target for BAR accesses).

I mentioned "in phases" for if some platforms require 
pci_epc_map_align() to use epc_features->align, we could keep 
epc_features->align in pci_epc_map_align() till all of them implement 
the map_align() callback (and need not be part of this series itself). 
But eventually stop using epc_features->align in pci_epc_map_align() 
once all the platforms that require alignment implement ->map_align().

 > > Don't you mean ep->page_size ?
> (Please read the whole email to see my reasoning.)

page_size is for two purposes:
1) Dividing into fixed size blocks the outbound address space for 
simpler memory management
2) Alignment restrictions for source address (outbound address space) of 
the outbound ATU (so page_size is for source alignment restriction in 
outbound ATU and pci_epc_map_align() is for destination alignment 
restriction in outbound ATU. Source address would be an address in the 
outbound address space and destination address would be PCI address 
usually provided by the host).

Even if some platform doesn't require source alignment restriction, it 
still has to divide into fixed size blocks.
> 
> 
>>
>> Yep, good idea. I will remove the use of "align" as a default alignment
>> constraint. For controllers that have a fixed alignment constraint (not
>> necessarilly epc->features->align), it is trivial to provide a generic helper
>> function that implements the ->map_align method.
> 
> We can see that commit:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a9a801620efac92885fc9cd53594c0b9aba87a4
> 
> Introduced epc_features->align and modified pci_epf_alloc_space() to use it.
> 
>  From reading the commit, it appears that epc_features->align was intended to
> represent inbound iATU alignment requirement.
> 
> For DWC based controllers, the inbound iATU address must be aligned to:
> CX_ATU_MIN_REGION_SIZE.
> 
> AFAICT, epc_features->align currently has nothing to do with traffic outbound
> from the EP.

It was initially added that way but I abused that in pci-epf-ntb.c
> 
> 
> For aligning the reads/writes to buffers allocated on the host side,
> we currently have .alignment in the host side driver:
> https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L966-L1021
> 
> Which should be set to the outbound iATU alignment requirement.
> 
> For DWC based controllers, the outbound iATU address must be aligned to:
> CX_ATU_MIN_REGION_SIZE.
> 
> 
> Additionally, we have ep->page_size, which defines the smallest outbound unit
> that can be mapped.
> (On DWC based controllers, tis is CX_ATU_MIN_REGION_SIZE.)
> 
> ep->page_size is used to specify the outbound alignment for e.g.
> dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq():
> https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/pci/controller/dwc/pcie-designware-ep.c#L488
> https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/pci/controller/dwc/pcie-designware-ep.c#L555
> 
> which makes sure that we can write to the RC side MSI/MSI-X address
> while satisfying the outbound iATU alignment requirement.
> 
> See also:
> https://lore.kernel.org/linux-pci/20240402-pci2_upstream-v3-2-803414bdb430@nxp.com/
> 
> 
> 
> Now I understand that rockchip is the first one that does not have a fixed
> alignment.
> So for that platform, map_align() will be different from ep->page_size.
> (For all DWC based drivers the outbound iATU alignment requirement is
> the same as the page size.)
> 
> However, it would be nice if:
> 1) We could have a default implementation of map_align() that by default uses
> ep->page_size. Platforms that have non-fixed alignment requirements could
> define their own map_align().

IMHO generic/core functions should not overload ep->page_size or 
epc_features->align as each of them has their own specific purpose.

> 
> 2) We fix dw_pcie_ep_raise_msi_irq() and dw_pcie_ep_raise_msix_irq() to use
> the new pci_epc_map_align().
> 
> 3) It is getting too complicated with all these...
> epc_features->align, ep->page_size, map_align(), and .alignment in host driver.
> I think that we need to document each of these in Documentation/PCI/endpoint/

right, .alignment should be deprecated and each of the others should be 
documented to indicate the specific purpose it's added for.
> 
> 4) It would be nice if we could set page_size correctly for all the PCI device
> and vendor IDs that have defined an .alignment in drivers/misc/pci_endpoint_test.c
> in the correct EPC driver. That way, we should be able to completely remove all
> .alignment specified in drivers/misc/pci_endpoint_test.c.

We should remove .alignment specified in 
drivers/misc/pci_endpoint_test.c and each EPC driver should populate 
->map_align() callback to provide the correct alignment. Don't think 
this should change the page_size.
> 
> 5) Unfortunately drivers/misc/pci_endpoint_test.c defines a default alignment
> of 4K:
> https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L968
> https://github.com/torvalds/linux/blob/v6.9-rc2/drivers/misc/pci_endpoint_test.c#L820
> 
> It would be nice if we could get rid of this as well. Or perhaps add an option
> to pci_test so that it does not use this 4k alignment, such that we can verify
> that pci_epc_map_align() is actually working.

+1

Thanks,
Kishon

> 
> 
> 
> In my opinion 4) is the biggest win with this series, because it means that
> we define the alignment in the EPC driver, instead of needing to define it in
> each and every host side driver. But right now, this great improvement is not
> really visible for someone looking quickly at the current series.
> 
> 
> Kind regards,
> Niklas