[v3,0/7] Add-DMA-MDMA-chaining-support

Message ID 1538139715-24406-1-git-send-email-pierre-yves.mordret@st.com

Message

Pierre Yves MORDRET Sept. 28, 2018, 1:01 p.m. UTC
This series adds support for M2M transfers triggered by the STM32 DMA in order
to move data between SRAM and DDR.

Normally this mode should not be needed, as transferring data from/to DDR
is supported by the STM32 DMA.
However, the STM32 DMA cannot generate burst transfers on the DDR since it
embeds only a 4-word FIFO, while the minimal burst length on the DDR is
8 words.
Due to this constraint, the STM32 DMA transfers data from/to DDR using single
accesses, which can pollute the DDR bandwidth.
To avoid this, we have to use SRAM for all transfers where the STM32 DMA is
involved.

A hardware design has been specially put in place to allow this chaining: the
DMA interrupt is connected both to the GIC and to an MDMA request line. This
makes it possible to trigger an MDMA transfer on completion of the DMA, and
the MDMA can in turn acknowledge the DMA. The aim is a self-refreshing
mechanism that moves data between device and DDR with minimal SW support.
For instance, the DMA is set up in cyclic double-buffering mode to feed the
SRAM while the MDMA transfers to DDR thanks to its LLI support.
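
From the client driver's point of view nothing changes: a consumer keeps using
the regular dmaengine slave API and the DMA/MDMA chain stays internal to
stm32-dma. Below is a minimal, purely illustrative D2M sketch (generic
dmaengine calls only; the function, parameters and the "rx" channel name are
hypothetical and not taken from this series):

    #include <linux/dmaengine.h>
    #include <linux/err.h>
    #include <linux/scatterlist.h>

    /*
     * Hypothetical client: request the stm32-dma "rx" channel described in
     * DT and queue a device-to-memory transfer. Whether DMA/MDMA chaining
     * is enabled for this channel or not is invisible here.
     */
    static int client_start_rx(struct device *dev, struct scatterlist *sgl,
			       unsigned int nents, dma_addr_t fifo_addr)
    {
	struct dma_slave_config cfg = {
		.direction	= DMA_DEV_TO_MEM,
		.src_addr	= fifo_addr,
		.src_addr_width	= DMA_SLAVE_BUSWIDTH_4_BYTES,
		.src_maxburst	= 1,
	};
	struct dma_async_tx_descriptor *desc;
	struct dma_chan *chan;
	int ret;

	chan = dma_request_chan(dev, "rx");
	if (IS_ERR(chan))
		return PTR_ERR(chan);

	ret = dmaengine_slave_config(chan, &cfg);
	if (ret)
		goto release;

	/* Still a plain prep_slave_sg from the client's point of view */
	desc = dmaengine_prep_slave_sg(chan, sgl, nents, DMA_DEV_TO_MEM,
				       DMA_PREP_INTERRUPT);
	if (!desc) {
		ret = -EINVAL;
		goto release;
	}

	dmaengine_submit(desc);
	dma_async_issue_pending(chan);
	return 0;

    release:
	dma_release_channel(chan);
	return ret;
    }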

---
  Version history:
    v3:
       * Solve KBuild warnings
    v2:
       * Rework binding content
    v1:
       * Initial
---

Pierre-Yves MORDRET (7):
  dt-bindings: stm32-dma: Add DMA/MDMA chaining support bindings
  dt-bindings: stm32-dmamux: Add one cell to support DMA/MDMA chain
  dt-bindings: stm32-mdma: Add DMA/MDMA chaining support bindings
  dmaengine: stm32-dma: Add DMA/MDMA chaining support
  dmaengine: stm32-mdma: Add DMA/MDMA chaining support
  dmaengine: stm32-dma: enable descriptor_reuse
  dmaengine: stm32-mdma: enable descriptor_reuse

 .../devicetree/bindings/dma/stm32-dma.txt          |  27 +-
 .../devicetree/bindings/dma/stm32-dmamux.txt       |   6 +-
 .../devicetree/bindings/dma/stm32-mdma.txt         |  12 +-
 drivers/dma/stm32-dma.c                            | 903 ++++++++++++++++++---
 drivers/dma/stm32-mdma.c                           | 133 ++-
 5 files changed, 949 insertions(+), 132 deletions(-)

Comments

Vinod Koul Oct. 7, 2018, 4 p.m. UTC | #1
On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> This patch adds support of DMA/MDMA chaining support.
> It introduces an intermediate transfer between peripherals and STM32 DMA.
> This intermediate transfer is triggered by SW for single M2D transfer and
> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
> 
> A generic SRAM allocator is used for this intermediate buffer
> Each DMA channel will be able to define its SRAM needs to achieve chaining
> feature : (2 ^ order) * PAGE_SIZE.
> For cyclic, SRAM buffer is derived from period length (rounded on
> PAGE_SIZE).

So IIUC, you chain two dma txns together and transfer data via an SRAM?

> 
> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
> ---
>   Version history:
>     v3:
>        * Solve KBuild warning
>     v2:
>     v1:
>        * Initial
> ---
> ---
>  drivers/dma/stm32-dma.c | 879 ++++++++++++++++++++++++++++++++++++++++++------

that is a lot of change for a driver, consider splitting it up
logically in smaller changes...

>  1 file changed, 772 insertions(+), 107 deletions(-)
> 
> diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
> index 379e8d5..85e81c4 100644
> --- a/drivers/dma/stm32-dma.c
> +++ b/drivers/dma/stm32-dma.c
> @@ -15,11 +15,14 @@
>  #include <linux/dmaengine.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/err.h>
> +#include <linux/genalloc.h>
>  #include <linux/init.h>
> +#include <linux/iopoll.h>
>  #include <linux/jiffies.h>
>  #include <linux/list.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
> +#include <linux/of_address.h>
>  #include <linux/of_device.h>
>  #include <linux/of_dma.h>
>  #include <linux/platform_device.h>
> @@ -118,6 +121,7 @@
>  #define STM32_DMA_FIFO_THRESHOLD_FULL			0x03
>  
>  #define STM32_DMA_MAX_DATA_ITEMS	0xffff
> +#define STM32_DMA_SRAM_GRANULARITY	PAGE_SIZE
>  /*
>   * Valid transfer starts from @0 to @0xFFFE leading to unaligned scatter
>   * gather at boundary. Thus it's safer to round down this value on FIFO
> @@ -135,6 +139,12 @@
>  /* DMA Features */
>  #define STM32_DMA_THRESHOLD_FTR_MASK	GENMASK(1, 0)
>  #define STM32_DMA_THRESHOLD_FTR_GET(n)	((n) & STM32_DMA_THRESHOLD_FTR_MASK)
> +#define STM32_DMA_MDMA_CHAIN_FTR_MASK	BIT(2)
> +#define STM32_DMA_MDMA_CHAIN_FTR_GET(n)	(((n) & STM32_DMA_MDMA_CHAIN_FTR_MASK) \
> +					 >> 2)
> +#define STM32_DMA_MDMA_SRAM_SIZE_MASK	GENMASK(4, 3)
> +#define STM32_DMA_MDMA_SRAM_SIZE_GET(n)	(((n) & STM32_DMA_MDMA_SRAM_SIZE_MASK) \
> +					 >> 3)
>  
>  enum stm32_dma_width {
>  	STM32_DMA_BYTE,
> @@ -176,15 +186,31 @@ struct stm32_dma_chan_reg {
>  	u32 dma_sfcr;
>  };
>  
> +struct stm32_dma_mdma_desc {
> +	struct sg_table sgt;
> +	struct dma_async_tx_descriptor *desc;
> +};
> +
> +struct stm32_dma_mdma {
> +	struct dma_chan *chan;
> +	enum dma_transfer_direction dir;
> +	dma_addr_t sram_buf;
> +	u32 sram_period;
> +	u32 num_sgs;
> +};
> +
>  struct stm32_dma_sg_req {
> -	u32 len;
> +	struct scatterlist stm32_sgl_req;
>  	struct stm32_dma_chan_reg chan_reg;
> +	struct stm32_dma_mdma_desc m_desc;
>  };
>  
>  struct stm32_dma_desc {
>  	struct virt_dma_desc vdesc;
>  	bool cyclic;
>  	u32 num_sgs;
> +	dma_addr_t dma_buf;
> +	void *dma_buf_cpu;
>  	struct stm32_dma_sg_req sg_req[];
>  };
>  
> @@ -201,6 +227,10 @@ struct stm32_dma_chan {
>  	u32 threshold;
>  	u32 mem_burst;
>  	u32 mem_width;
> +	struct stm32_dma_mdma mchan;
> +	u32 use_mdma;
> +	u32 sram_size;
> +	u32 residue_after_drain;
>  };
>  
>  struct stm32_dma_device {
> @@ -210,6 +240,7 @@ struct stm32_dma_device {
>  	struct reset_control *rst;
>  	bool mem2mem;
>  	struct stm32_dma_chan chan[STM32_DMA_MAX_CHANNELS];
> +	struct gen_pool *sram_pool;
>  };
>  
>  static struct stm32_dma_device *stm32_dma_get_dev(struct stm32_dma_chan *chan)
> @@ -497,11 +528,15 @@ static void stm32_dma_stop(struct stm32_dma_chan *chan)
>  static int stm32_dma_terminate_all(struct dma_chan *c)
>  {
>  	struct stm32_dma_chan *chan = to_stm32_dma_chan(c);
> +	struct stm32_dma_mdma *mchan = &chan->mchan;
>  	unsigned long flags;
>  	LIST_HEAD(head);
>  
>  	spin_lock_irqsave(&chan->vchan.lock, flags);
>  
> +	if (chan->use_mdma)
> +		dmaengine_terminate_async(mchan->chan);
> +
>  	if (chan->busy) {
>  		stm32_dma_stop(chan);
>  		chan->desc = NULL;
> @@ -514,9 +549,96 @@ static int stm32_dma_terminate_all(struct dma_chan *c)
>  	return 0;
>  }
>  
> +static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
> +{
> +	u32 dma_scr, width, ndtr;
> +	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
> +
> +	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(chan->id));
> +	width = STM32_DMA_SCR_PSIZE_GET(dma_scr);
> +	ndtr = stm32_dma_read(dmadev, STM32_DMA_SNDTR(chan->id));
> +
> +	return ndtr << width;
> +}
> +
> +static int stm32_dma_mdma_drain(struct stm32_dma_chan *chan)
> +{
> +	struct stm32_dma_mdma *mchan = &chan->mchan;
> +	struct stm32_dma_sg_req *sg_req;
> +	struct dma_device *ddev = mchan->chan->device;
> +	struct dma_async_tx_descriptor *desc = NULL;
> +	enum dma_status status;
> +	dma_addr_t src_buf, dst_buf;
> +	u32 mdma_residue, mdma_wrote, dma_to_write, len;
> +	struct dma_tx_state state;
> +	int ret;
> +
> +	/* DMA/MDMA chain: drain remaining data in SRAM */
> +
> +	/* Get the residue on MDMA side */
> +	status = dmaengine_tx_status(mchan->chan, mchan->chan->cookie, &state);
> +	if (status == DMA_COMPLETE)
> +		return status;
> +
> +	mdma_residue = state.residue;
> +	sg_req = &chan->desc->sg_req[chan->next_sg - 1];
> +	len = sg_dma_len(&sg_req->stm32_sgl_req);
> +
> +	/*
> +	 * Total = mdma blocks * sram_period + rest (< sram_period)
> +	 * so mdma blocks * sram_period = len - mdma residue - rest
> +	 */
> +	mdma_wrote = len - mdma_residue - (len % mchan->sram_period);
> +
> +	/* Remaining data stuck in SRAM */
> +	dma_to_write = mchan->sram_period - stm32_dma_get_remaining_bytes(chan);
> +	if (dma_to_write > 0) {
> +		/* Stop DMA current operation */
> +		stm32_dma_disable_chan(chan);
> +
> +		/* Terminate current MDMA to initiate a new one */
> +		dmaengine_terminate_all(mchan->chan);
> +
> +		/* Double buffer management */
> +		src_buf = mchan->sram_buf +
> +			  ((mdma_wrote / mchan->sram_period) & 0x1) *
> +			  mchan->sram_period;
> +		dst_buf = sg_dma_address(&sg_req->stm32_sgl_req) + mdma_wrote;
> +
> +		desc = ddev->device_prep_dma_memcpy(mchan->chan,
> +						    dst_buf, src_buf,
> +						    dma_to_write,
> +						    DMA_PREP_INTERRUPT);

why would you do that?

If at all you need to create another txn, I think it would be good to
prepare a new descriptor and chain it, not call the dmaengine APIs...
Pierre Yves MORDRET Oct. 9, 2018, 8:40 a.m. UTC | #2
On 10/07/2018 06:00 PM, Vinod wrote:
> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>> This patch adds support of DMA/MDMA chaining support.
>> It introduces an intermediate transfer between peripherals and STM32 DMA.
>> This intermediate transfer is triggered by SW for single M2D transfer and
>> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
>>
>> A generic SRAM allocator is used for this intermediate buffer
>> Each DMA channel will be able to define its SRAM needs to achieve chaining
>> feature : (2 ^ order) * PAGE_SIZE.
>> For cyclic, SRAM buffer is derived from period length (rounded on
>> PAGE_SIZE).
> 
> So IIUC, you chain two dma txns together and transfer data via an SRAM?

Correct. One DMA is DMAv2 (stm32-dma) and the other is MDMA (stm32-mdma).
The intermediate transfer is between the device and memory.
This intermediate transfer goes through SRAM.

> 
>>
>> Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
>> ---
>>   Version history:
>>     v3:
>>        * Solve KBuild warning
>>     v2:
>>     v1:
>>        * Initial
>> ---
>> ---
>>  drivers/dma/stm32-dma.c | 879 ++++++++++++++++++++++++++++++++++++++++++------
> 
> that is a lot of change for a driver, consider splitting it up
> logically in smaller changes...
> 

This feature is rather monolithic. Difficult to split up.
All the code is required at once.

>>  1 file changed, 772 insertions(+), 107 deletions(-)
>>
>> diff --git a/drivers/dma/stm32-dma.c b/drivers/dma/stm32-dma.c
>> index 379e8d5..85e81c4 100644
>> --- a/drivers/dma/stm32-dma.c
>> +++ b/drivers/dma/stm32-dma.c
>> @@ -15,11 +15,14 @@
>>  #include <linux/dmaengine.h>
>>  #include <linux/dma-mapping.h>
>>  #include <linux/err.h>
>> +#include <linux/genalloc.h>
>>  #include <linux/init.h>
>> +#include <linux/iopoll.h>
>>  #include <linux/jiffies.h>
>>  #include <linux/list.h>
>>  #include <linux/module.h>
>>  #include <linux/of.h>
>> +#include <linux/of_address.h>
>>  #include <linux/of_device.h>
>>  #include <linux/of_dma.h>
>>  #include <linux/platform_device.h>
>> @@ -118,6 +121,7 @@
>>  #define STM32_DMA_FIFO_THRESHOLD_FULL			0x03
>>  
>>  #define STM32_DMA_MAX_DATA_ITEMS	0xffff
>> +#define STM32_DMA_SRAM_GRANULARITY	PAGE_SIZE
>>  /*
>>   * Valid transfer starts from @0 to @0xFFFE leading to unaligned scatter
>>   * gather at boundary. Thus it's safer to round down this value on FIFO
>> @@ -135,6 +139,12 @@
>>  /* DMA Features */
>>  #define STM32_DMA_THRESHOLD_FTR_MASK	GENMASK(1, 0)
>>  #define STM32_DMA_THRESHOLD_FTR_GET(n)	((n) & STM32_DMA_THRESHOLD_FTR_MASK)
>> +#define STM32_DMA_MDMA_CHAIN_FTR_MASK	BIT(2)
>> +#define STM32_DMA_MDMA_CHAIN_FTR_GET(n)	(((n) & STM32_DMA_MDMA_CHAIN_FTR_MASK) \
>> +					 >> 2)
>> +#define STM32_DMA_MDMA_SRAM_SIZE_MASK	GENMASK(4, 3)
>> +#define STM32_DMA_MDMA_SRAM_SIZE_GET(n)	(((n) & STM32_DMA_MDMA_SRAM_SIZE_MASK) \
>> +					 >> 3)
>>  
>>  enum stm32_dma_width {
>>  	STM32_DMA_BYTE,
>> @@ -176,15 +186,31 @@ struct stm32_dma_chan_reg {
>>  	u32 dma_sfcr;
>>  };
>>  
>> +struct stm32_dma_mdma_desc {
>> +	struct sg_table sgt;
>> +	struct dma_async_tx_descriptor *desc;
>> +};
>> +
>> +struct stm32_dma_mdma {
>> +	struct dma_chan *chan;
>> +	enum dma_transfer_direction dir;
>> +	dma_addr_t sram_buf;
>> +	u32 sram_period;
>> +	u32 num_sgs;
>> +};
>> +
>>  struct stm32_dma_sg_req {
>> -	u32 len;
>> +	struct scatterlist stm32_sgl_req;
>>  	struct stm32_dma_chan_reg chan_reg;
>> +	struct stm32_dma_mdma_desc m_desc;
>>  };
>>  
>>  struct stm32_dma_desc {
>>  	struct virt_dma_desc vdesc;
>>  	bool cyclic;
>>  	u32 num_sgs;
>> +	dma_addr_t dma_buf;
>> +	void *dma_buf_cpu;
>>  	struct stm32_dma_sg_req sg_req[];
>>  };
>>  
>> @@ -201,6 +227,10 @@ struct stm32_dma_chan {
>>  	u32 threshold;
>>  	u32 mem_burst;
>>  	u32 mem_width;
>> +	struct stm32_dma_mdma mchan;
>> +	u32 use_mdma;
>> +	u32 sram_size;
>> +	u32 residue_after_drain;
>>  };
>>  
>>  struct stm32_dma_device {
>> @@ -210,6 +240,7 @@ struct stm32_dma_device {
>>  	struct reset_control *rst;
>>  	bool mem2mem;
>>  	struct stm32_dma_chan chan[STM32_DMA_MAX_CHANNELS];
>> +	struct gen_pool *sram_pool;
>>  };
>>  
>>  static struct stm32_dma_device *stm32_dma_get_dev(struct stm32_dma_chan *chan)
>> @@ -497,11 +528,15 @@ static void stm32_dma_stop(struct stm32_dma_chan *chan)
>>  static int stm32_dma_terminate_all(struct dma_chan *c)
>>  {
>>  	struct stm32_dma_chan *chan = to_stm32_dma_chan(c);
>> +	struct stm32_dma_mdma *mchan = &chan->mchan;
>>  	unsigned long flags;
>>  	LIST_HEAD(head);
>>  
>>  	spin_lock_irqsave(&chan->vchan.lock, flags);
>>  
>> +	if (chan->use_mdma)
>> +		dmaengine_terminate_async(mchan->chan);
>> +
>>  	if (chan->busy) {
>>  		stm32_dma_stop(chan);
>>  		chan->desc = NULL;
>> @@ -514,9 +549,96 @@ static int stm32_dma_terminate_all(struct dma_chan *c)
>>  	return 0;
>>  }
>>  
>> +static u32 stm32_dma_get_remaining_bytes(struct stm32_dma_chan *chan)
>> +{
>> +	u32 dma_scr, width, ndtr;
>> +	struct stm32_dma_device *dmadev = stm32_dma_get_dev(chan);
>> +
>> +	dma_scr = stm32_dma_read(dmadev, STM32_DMA_SCR(chan->id));
>> +	width = STM32_DMA_SCR_PSIZE_GET(dma_scr);
>> +	ndtr = stm32_dma_read(dmadev, STM32_DMA_SNDTR(chan->id));
>> +
>> +	return ndtr << width;
>> +}
>> +
>> +static int stm32_dma_mdma_drain(struct stm32_dma_chan *chan)
>> +{
>> +	struct stm32_dma_mdma *mchan = &chan->mchan;
>> +	struct stm32_dma_sg_req *sg_req;
>> +	struct dma_device *ddev = mchan->chan->device;
>> +	struct dma_async_tx_descriptor *desc = NULL;
>> +	enum dma_status status;
>> +	dma_addr_t src_buf, dst_buf;
>> +	u32 mdma_residue, mdma_wrote, dma_to_write, len;
>> +	struct dma_tx_state state;
>> +	int ret;
>> +
>> +	/* DMA/MDMA chain: drain remaining data in SRAM */
>> +
>> +	/* Get the residue on MDMA side */
>> +	status = dmaengine_tx_status(mchan->chan, mchan->chan->cookie, &state);
>> +	if (status == DMA_COMPLETE)
>> +		return status;
>> +
>> +	mdma_residue = state.residue;
>> +	sg_req = &chan->desc->sg_req[chan->next_sg - 1];
>> +	len = sg_dma_len(&sg_req->stm32_sgl_req);
>> +
>> +	/*
>> +	 * Total = mdma blocks * sram_period + rest (< sram_period)
>> +	 * so mdma blocks * sram_period = len - mdma residue - rest
>> +	 */
>> +	mdma_wrote = len - mdma_residue - (len % mchan->sram_period);
>> +
>> +	/* Remaining data stuck in SRAM */
>> +	dma_to_write = mchan->sram_period - stm32_dma_get_remaining_bytes(chan);
>> +	if (dma_to_write > 0) {
>> +		/* Stop DMA current operation */
>> +		stm32_dma_disable_chan(chan);
>> +
>> +		/* Terminate current MDMA to initiate a new one */
>> +		dmaengine_terminate_all(mchan->chan);
>> +
>> +		/* Double buffer management */
>> +		src_buf = mchan->sram_buf +
>> +			  ((mdma_wrote / mchan->sram_period) & 0x1) *
>> +			  mchan->sram_period;
>> +		dst_buf = sg_dma_address(&sg_req->stm32_sgl_req) + mdma_wrote;
>> +
>> +		desc = ddev->device_prep_dma_memcpy(mchan->chan,
>> +						    dst_buf, src_buf,
>> +						    dma_to_write,
>> +						    DMA_PREP_INTERRUPT);
> 
> why would you do that?
> 
> If at all you need to create anothe txn, I think it would be good to
> prepare a new descriptor and chain it, not call the dmaengine APIs..
> 

In this use case, DMAv2 is configured in cyclic mode because this DMA has no
HW LLI support, only SW. We use this cyclic mode purely for performance reasons.
This very last txn flushes the remaining bytes stuck in SRAM.
I don't believe I can chain a cyclic transfer and this last txn.
Vinod Koul Oct. 10, 2018, 4:03 a.m. UTC | #3
On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
> 
> 
> On 10/07/2018 06:00 PM, Vinod wrote:
> > On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> >> This patch adds support of DMA/MDMA chaining support.
> >> It introduces an intermediate transfer between peripherals and STM32 DMA.
> >> This intermediate transfer is triggered by SW for single M2D transfer and
> >> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
> >>
> >> A generic SRAM allocator is used for this intermediate buffer
> >> Each DMA channel will be able to define its SRAM needs to achieve chaining
> >> feature : (2 ^ order) * PAGE_SIZE.
> >> For cyclic, SRAM buffer is derived from period length (rounded on
> >> PAGE_SIZE).
> > 
> > So IIUC, you chain two dma txns together and transfer data via an SRAM?
> 
> Correct. one DMA is DMAv2 (stm32-dma) and the other is MDMA(stm32-mdma).
> Intermediate transfer is between device and memory.
> This intermediate transfer is using SDRAM.

Ah, so you use DMA calls to set up MDMA transfers? I don't think that is a
good idea. How do you know you should use MDMA for the subsequent transfer?


> >>  drivers/dma/stm32-dma.c | 879 ++++++++++++++++++++++++++++++++++++++++++------
> > 
> > that is a lot of change for a driver, consider splitting it up
> > logically in smaller changes...
> > 
> 
> This feature is rather monolithic. Difficult to split up.
> All the code is required at once.

It can be enabled last but split up logically. Intrusive changes to a
driver make it hard to review...
Pierre Yves MORDRET Oct. 10, 2018, 7:02 a.m. UTC | #4
On 10/10/2018 06:03 AM, Vinod wrote:
> On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
>>
>>
>> On 10/07/2018 06:00 PM, Vinod wrote:
>>> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>>>> This patch adds support of DMA/MDMA chaining support.
>>>> It introduces an intermediate transfer between peripherals and STM32 DMA.
>>>> This intermediate transfer is triggered by SW for single M2D transfer and
>>>> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
>>>>
>>>> A generic SRAM allocator is used for this intermediate buffer
>>>> Each DMA channel will be able to define its SRAM needs to achieve chaining
>>>> feature : (2 ^ order) * PAGE_SIZE.
>>>> For cyclic, SRAM buffer is derived from period length (rounded on
>>>> PAGE_SIZE).
>>>
>>> So IIUC, you chain two dma txns together and transfer data via an SRAM?
>>
>> Correct. one DMA is DMAv2 (stm32-dma) and the other is MDMA(stm32-mdma).
>> Intermediate transfer is between device and memory.
>> This intermediate transfer is using SDRAM.
> 
> Ah so you use dma calls to setup mdma xtfers? I dont think that is a
> good idea. How do you know you should use mdma for subsequent transfer?
> 

When the user bindings tell the driver to set up chaining, intermediate MDMA
transfers are always triggered.
For instance, if a user requests a Dev2Mem transfer with chaining, from the
client's POV this is still a prep_slave_sg. Internally, DMAv2 is set up in
cyclic mode (double-buffer mode, i.e. 2 buffers of PAGE_SIZE/2 each) with SRAM
as destination. DMAv2 will flip/flop between those 2 buffers.
At the same time, the DMAv2 driver prepares an MDMA SG that fetches data from
those 2 SRAM buffers and fills the final destination memory.
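
Roughly, the half-buffer bookkeeping boils down to something like the sketch
below (illustrative only, the helper name is made up; sram_buf and sram_period
are the fields used by the drain routine quoted earlier in this thread):

    #include <linux/types.h>

    /*
     * Illustration only (not the actual driver code): given how many bytes
     * MDMA has already written to the final destination buffer, return the
     * SRAM half-buffer it has to read next.  sram_buf is the DMA address of
     * the SRAM area and sram_period the size of one half-buffer.
     */
    static dma_addr_t stm32_mdma_next_sram_half(dma_addr_t sram_buf,
						u32 sram_period, u32 mdma_wrote)
    {
	u32 half = (mdma_wrote / sram_period) & 0x1;	/* flip/flop: 0 or 1 */

	return sram_buf + half * sram_period;
    }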

> 
>>>>  drivers/dma/stm32-dma.c | 879 ++++++++++++++++++++++++++++++++++++++++++------
>>>
>>> that is a lot of change for a driver, consider splitting it up
>>> logically in smaller changes...
>>>
>>
>> This feature is rather monolithic. Difficult to split up.
>> All the code is required at once.
> 
> It can be enabled at last but split up logically. Intrusive changes to a
> driver make it hard to review..
> 
OK. I will think about how to proceed.
Vinod Koul Oct. 15, 2018, 5:14 p.m. UTC | #5
On 10-10-18, 09:02, Pierre Yves MORDRET wrote:
> 
> 
> On 10/10/2018 06:03 AM, Vinod wrote:
> > On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
> >>
> >>
> >> On 10/07/2018 06:00 PM, Vinod wrote:
> >>> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> >>>> This patch adds support of DMA/MDMA chaining support.
> >>>> It introduces an intermediate transfer between peripherals and STM32 DMA.
> >>>> This intermediate transfer is triggered by SW for single M2D transfer and
> >>>> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
> >>>>
> >>>> A generic SRAM allocator is used for this intermediate buffer
> >>>> Each DMA channel will be able to define its SRAM needs to achieve chaining
> >>>> feature : (2 ^ order) * PAGE_SIZE.
> >>>> For cyclic, SRAM buffer is derived from period length (rounded on
> >>>> PAGE_SIZE).
> >>>
> >>> So IIUC, you chain two dma txns together and transfer data via an SRAM?
> >>
> >> Correct. one DMA is DMAv2 (stm32-dma) and the other is MDMA(stm32-mdma).
> >> Intermediate transfer is between device and memory.
> >> This intermediate transfer is using SDRAM.
> > 
> > Ah so you use dma calls to setup mdma xtfers? I dont think that is a
> > good idea. How do you know you should use mdma for subsequent transfer?
> > 
> 
> When user bindings told to setup chaining intermediate MDMA transfers are always
> triggers.
> For instance if a user requests a Dev2Mem transfer with chaining. From client
> pov this is still a prep_slave_sg. Internally DMAv2 is setup in cyclic mode (in
> double buffer mode indeed => 2 buffer of PAGE_SIZE/2) and destination is SDRAM.
> DMAv2 will flip/flop on those 2 buffers.
> At the same time DMAv2 driver prepares a MDMA SG that will fetch data from those
> 2 buffers in SDRAM and fills final destination memory.

What I am not able to follow is why this needs to be internal; why should
the client not set up the two transfers and trigger them itself?
Pierre Yves MORDRET Oct. 16, 2018, 9:19 a.m. UTC | #6
On 10/15/18 7:14 PM, Vinod wrote:
> On 10-10-18, 09:02, Pierre Yves MORDRET wrote:
>>
>>
>> On 10/10/2018 06:03 AM, Vinod wrote:
>>> On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
>>>>
>>>>
>>>> On 10/07/2018 06:00 PM, Vinod wrote:
>>>>> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>>>>>> This patch adds support of DMA/MDMA chaining support.
>>>>>> It introduces an intermediate transfer between peripherals and STM32 DMA.
>>>>>> This intermediate transfer is triggered by SW for single M2D transfer and
>>>>>> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
>>>>>>
>>>>>> A generic SRAM allocator is used for this intermediate buffer
>>>>>> Each DMA channel will be able to define its SRAM needs to achieve chaining
>>>>>> feature : (2 ^ order) * PAGE_SIZE.
>>>>>> For cyclic, SRAM buffer is derived from period length (rounded on
>>>>>> PAGE_SIZE).
>>>>>
>>>>> So IIUC, you chain two dma txns together and transfer data via an SRAM?
>>>>
>>>> Correct. one DMA is DMAv2 (stm32-dma) and the other is MDMA(stm32-mdma).
>>>> Intermediate transfer is between device and memory.
>>>> This intermediate transfer is using SDRAM.
>>>
>>> Ah so you use dma calls to setup mdma xtfers? I dont think that is a
>>> good idea. How do you know you should use mdma for subsequent transfer?
>>>
>>
>> When user bindings told to setup chaining intermediate MDMA transfers are always
>> triggers.
>> For instance if a user requests a Dev2Mem transfer with chaining. From client
>> pov this is still a prep_slave_sg. Internally DMAv2 is setup in cyclic mode (in
>> double buffer mode indeed => 2 buffer of PAGE_SIZE/2) and destination is SDRAM.
>> DMAv2 will flip/flop on those 2 buffers.
>> At the same time DMAv2 driver prepares a MDMA SG that will fetch data from those
>> 2 buffers in SDRAM and fills final destination memory.
> 
> I am not able to follow is why does it need to be internal, why should
> the client not set the two transfers and trigger them?
> 

Whether the client uses chaining or not is defined within DT. The API and the
dynamic behaviour are the same at the client driver level. Moreover, the
driver exposes only DMAv2, not both DMAv2 and MDMA: this is totally hidden
from the client. If the client had to set up both, this would imply changing
all drivers that may want to use chaining, and even having them deal with
DMAv2 and MDMA at their level.
Since DMAv2 deals with MDMA, all client drivers stay the same as before: no
changes required.
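
For reference, the chaining information carried by the DT feature word is what
the new masks in this patch decode; a hypothetical helper just to show the
layout (the function itself is illustrative, macro and field names follow the
patch, and the (2 ^ order) * PAGE_SIZE rule comes from the commit message):

    #include <linux/mm.h>	/* PAGE_SIZE */

    static void stm32_dma_decode_features(struct stm32_dma_chan *chan,
					  u32 features)
    {
	u32 order;

	chan->threshold = STM32_DMA_THRESHOLD_FTR_GET(features);  /* bits [1:0] */
	chan->use_mdma  = STM32_DMA_MDMA_CHAIN_FTR_GET(features); /* bit 2 */
	order           = STM32_DMA_MDMA_SRAM_SIZE_GET(features); /* bits [4:3] */

	chan->sram_size = (1 << order) * PAGE_SIZE;
    }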

Regards
Vinod Koul Oct. 16, 2018, 2:44 p.m. UTC | #7
On 16-10-18, 11:19, Pierre Yves MORDRET wrote:
> 
> 
> On 10/15/18 7:14 PM, Vinod wrote:
> > On 10-10-18, 09:02, Pierre Yves MORDRET wrote:
> >>
> >>
> >> On 10/10/2018 06:03 AM, Vinod wrote:
> >>> On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
> >>>>
> >>>>
> >>>> On 10/07/2018 06:00 PM, Vinod wrote:
> >>>>> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
> >>>>>> This patch adds support of DMA/MDMA chaining support.
> >>>>>> It introduces an intermediate transfer between peripherals and STM32 DMA.
> >>>>>> This intermediate transfer is triggered by SW for single M2D transfer and
> >>>>>> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
> >>>>>>
> >>>>>> A generic SRAM allocator is used for this intermediate buffer
> >>>>>> Each DMA channel will be able to define its SRAM needs to achieve chaining
> >>>>>> feature : (2 ^ order) * PAGE_SIZE.
> >>>>>> For cyclic, SRAM buffer is derived from period length (rounded on
> >>>>>> PAGE_SIZE).
> >>>>>
> >>>>> So IIUC, you chain two dma txns together and transfer data via an SRAM?
> >>>>
> >>>> Correct. one DMA is DMAv2 (stm32-dma) and the other is MDMA(stm32-mdma).
> >>>> Intermediate transfer is between device and memory.
> >>>> This intermediate transfer is using SDRAM.
> >>>
> >>> Ah so you use dma calls to setup mdma xtfers? I dont think that is a
> >>> good idea. How do you know you should use mdma for subsequent transfer?
> >>>
> >>
> >> When user bindings told to setup chaining intermediate MDMA transfers are always
> >> triggers.
> >> For instance if a user requests a Dev2Mem transfer with chaining. From client
> >> pov this is still a prep_slave_sg. Internally DMAv2 is setup in cyclic mode (in
> >> double buffer mode indeed => 2 buffer of PAGE_SIZE/2) and destination is SDRAM.
> >> DMAv2 will flip/flop on those 2 buffers.
> >> At the same time DMAv2 driver prepares a MDMA SG that will fetch data from those
> >> 2 buffers in SDRAM and fills final destination memory.
> > 
> > I am not able to follow is why does it need to be internal, why should
> > the client not set the two transfers and trigger them?
> > 
> 
> Client may use or not chaining: defined within DT. API and dynamic are same at

That should be up to the client... As a dmaengine driver you should enable
data transfer from source to destination.

> driver client level. Moreover driver exposes only DMAv2 and not both DMAv2 and
> MDMA. This is totally hidden for client. If client sets both this would imply

Why should a controller be hidden from the user? I don't see why that would
be a good thing.

> changing all drivers that may want use chaining. Even more to deal with DMAv2
> and MDMA at its level.
> Since DMAv2 deals with MDMA, all drivers are same as before. no changes required.

It is not about changes, it is about the SW model you want to have.

The intermediate SRAM transfers should not be made within the dmaengine
driver; the client can choose to have two transfers and couple them or not,
it is up to them. Sorry, I do not like this abstraction and would like
to see a cleaner approach.
Pierre Yves MORDRET Oct. 19, 2018, 9:21 a.m. UTC | #8
On 10/16/18 4:44 PM, Vinod wrote:
> On 16-10-18, 11:19, Pierre Yves MORDRET wrote:
>>
>>
>> On 10/15/18 7:14 PM, Vinod wrote:
>>> On 10-10-18, 09:02, Pierre Yves MORDRET wrote:
>>>>
>>>>
>>>> On 10/10/2018 06:03 AM, Vinod wrote:
>>>>> On 09-10-18, 10:40, Pierre Yves MORDRET wrote:
>>>>>>
>>>>>>
>>>>>> On 10/07/2018 06:00 PM, Vinod wrote:
>>>>>>> On 28-09-18, 15:01, Pierre-Yves MORDRET wrote:
>>>>>>>> This patch adds support of DMA/MDMA chaining support.
>>>>>>>> It introduces an intermediate transfer between peripherals and STM32 DMA.
>>>>>>>> This intermediate transfer is triggered by SW for single M2D transfer and
>>>>>>>> by STM32 DMA IP for all other modes (sg, cyclic) and direction (D2M).
>>>>>>>>
>>>>>>>> A generic SRAM allocator is used for this intermediate buffer
>>>>>>>> Each DMA channel will be able to define its SRAM needs to achieve chaining
>>>>>>>> feature : (2 ^ order) * PAGE_SIZE.
>>>>>>>> For cyclic, SRAM buffer is derived from period length (rounded on
>>>>>>>> PAGE_SIZE).
>>>>>>>
>>>>>>> So IIUC, you chain two dma txns together and transfer data via an SRAM?
>>>>>>
>>>>>> Correct. one DMA is DMAv2 (stm32-dma) and the other is MDMA(stm32-mdma).
>>>>>> Intermediate transfer is between device and memory.
>>>>>> This intermediate transfer is using SDRAM.
>>>>>
>>>>> Ah so you use dma calls to setup mdma xtfers? I dont think that is a
>>>>> good idea. How do you know you should use mdma for subsequent transfer?
>>>>>
>>>>
>>>> When user bindings told to setup chaining intermediate MDMA transfers are always
>>>> triggers.
>>>> For instance if a user requests a Dev2Mem transfer with chaining. From client
>>>> pov this is still a prep_slave_sg. Internally DMAv2 is setup in cyclic mode (in
>>>> double buffer mode indeed => 2 buffer of PAGE_SIZE/2) and destination is SDRAM.
>>>> DMAv2 will flip/flop on those 2 buffers.
>>>> At the same time DMAv2 driver prepares a MDMA SG that will fetch data from those
>>>> 2 buffers in SDRAM and fills final destination memory.
>>>
>>> I am not able to follow is why does it need to be internal, why should
>>> the client not set the two transfers and trigger them?
>>>
>>
>> Client may use or not chaining: defined within DT. API and dynamic are same at
> 
> That should be upto client... As a dmaengine driver you should enable
> data transfer from src to dstn.
> 
>> driver client level. Moreover driver exposes only DMAv2 and not both DMAv2 and
>> MDMA. This is totally hidden for client. If client sets both this would imply
> 
> Why should a controller be hidden from user, I dont see why that would
> be a good thing
> 
>> changing all drivers that may want use chaining. Even more to deal with DMAv2
>> and MDMA at its level.
>> Since DMAv2 deals with MDMA, all drivers are same as before. no changes required.
> 
> It is not about changes, it is about the SW model you want to have.
> 
> The intermediate SRAM transfers should not be made within DMAengine
> driver, client can chose to have two transfers and couple or not, it is
> upto them to choose. Sorry I do not like this abstraction and would like
> to see a cleaner approach
> 

What we have done is to hide all the complexity related to the DMA engine:
synchronization, residue and many other topics are solved by this approach. If
it is up to the client to perform the intermediate transfer, each client
driver using chaining will need to duplicate the required SW.
From the driver's point of view, this approach is presented as a feature.

Regards