mbox series

[0/2] Apple SIO driver

Message ID 20230712133806.4450-1-povik+lin@cutebit.org
Headers show
Series Apple SIO driver | expand

Message

Martin Povišer July 12, 2023, 1:38 p.m. UTC
Hi all,

see attached a driver for the SIO coprocessor found on recent Apple
SoCs. This coprocessor provides general DMA services, it can feed
many peripherals but so far it seems it will only be useful for
audio output over HDMI/DisplayPort. So the driver here only supports
the DMA_CYCLIC mode of transactions with the focus being on audio.
There's a downstream prototype ALSA driver the DMA driver is being
tested against.

Some of the boilerplate code in implementing the dmaengine interface
was lifted from apple-admac.c. Among other things these two drivers
have in common that they implement the DMA_CYCLIC regime on top of
hardware/coprocessor layer supporting linear transactions only.

The binding schema saw two RFC rounds before and has a reviewed-by
from Rob.
https://lore.kernel.org/asahi/167693643966.613996.10372170526471864080.robh@kernel.org

Best regards,
Martin

Martin Povišer (2):
  dt-bindings: dma: apple,sio: Add schema
  dmaengine: apple-sio: Add Apple SIO driver

 .../devicetree/bindings/dma/apple,sio.yaml    | 111 ++
 MAINTAINERS                                   |   2 +
 drivers/dma/Kconfig                           |  10 +
 drivers/dma/Makefile                          |   1 +
 drivers/dma/apple-sio.c                       | 956 ++++++++++++++++++
 5 files changed, 1080 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/apple,sio.yaml
 create mode 100644 drivers/dma/apple-sio.c

Comments

Krzysztof Kozlowski July 12, 2023, 7:42 p.m. UTC | #1
On 12/07/2023 15:38, Martin Povišer wrote:
> Hi all,
> 
> see attached a driver for the SIO coprocessor found on recent Apple
> SoCs. This coprocessor provides general DMA services, it can feed
> many peripherals but so far it seems it will only be useful for
> audio output over HDMI/DisplayPort. So the driver here only supports
> the DMA_CYCLIC mode of transactions with the focus being on audio.
> There's a downstream prototype ALSA driver the DMA driver is being
> tested against.
> 
> Some of the boilerplate code in implementing the dmaengine interface
> was lifted from apple-admac.c. Among other things these two drivers
> have in common that they implement the DMA_CYCLIC regime on top of
> hardware/coprocessor layer supporting linear transactions only.
> 
> The binding schema saw two RFC rounds before and has a reviewed-by
> from Rob.
> https://lore.kernel.org/asahi/167693643966.613996.10372170526471864080.robh@kernel.org

Thank you for explanation. Then this is v3, not v1.

No need for resending, but if it happens, consider naming it v4. :)

Best regards,
Krzysztof
Vinod Koul Aug. 1, 2023, 6:14 p.m. UTC | #2
On 12-07-23, 15:38, Martin Povišer wrote:

> +struct sio_chan {
> +	unsigned int no;
> +	struct sio_data *host;
> +	struct dma_chan chan;
> +	struct tasklet_struct tasklet;
> +	struct work_struct terminate_wq;
> +
> +	spinlock_t lock;
> +	struct sio_tx *current_tx;
> +	/*
> +	 * 'tx_cookie' is used for distinguishing between transactions from
> +	 * within tag ack/nack callbacks. Without it, we would have no way
> +	 * of knowing if the current transaction is the one the callback handler
> +	 * was installed for.

not sure what you mean by here.. I dont see why you would need to store
cookie here, care to explain?

> +	 */
> +	unsigned long tx_cookie;
> +	int nperiod_acks;
> +
> +	/*
> +	 * We maintain a 'submitted' and 'issued' list mainly for interface
> +	 * correctness. Typical use of the driver (per channel) will be
> +	 * prepping, submitting and issuing a single cyclic transaction which
> +	 * will stay current until terminate_all is called.
> +	 */
> +	struct list_head submitted;
> +	struct list_head issued;
> +
> +	struct list_head to_free;

can you use virt_dma_chan, that should simplify list handling etc

> +};
> +
> +#define SIO_NTAGS		16
> +
> +typedef void (*sio_ack_callback)(struct sio_chan *, void *, bool);

any reason not to use dmaengine callbacks?

> +static int sio_alloc_tag(struct sio_data *sio)
> +{
> +	struct sio_tagdata *tags = &sio->tags;
> +	int tag, i;
> +
> +	/*
> +	 * Because tag number 0 is special, the usable tag range
> +	 * is 1...(SIO_NTAGS - 1). So, to pick the next usable tag,
> +	 * we do modulo (SIO_NTAGS - 1) *then* plus one.
> +	 */
> +
> +#define SIO_USABLE_TAGS (SIO_NTAGS - 1)
> +	tag = (READ_ONCE(tags->last_tag) % SIO_USABLE_TAGS) + 1;
> +
> +	for (i = 0; i < SIO_USABLE_TAGS; i++) {
> +		if (!test_and_set_bit(tag, &tags->allocated))
> +			break;
> +
> +		tag = (tag % SIO_USABLE_TAGS) + 1;
> +	}
> +
> +	WRITE_ONCE(tags->last_tag, tag);
> +
> +	if (i < SIO_USABLE_TAGS)
> +		return tag;
> +	else
> +		return -EBUSY;
> +#undef SIO_USABLE_TAGS
> +}

can you use kernel mechanisms like ida to alloc and free the tags...

> +static struct dma_async_tx_descriptor *sio_prep_dma_cyclic(
> +		struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
> +		size_t period_len, enum dma_transfer_direction direction,
> +		unsigned long flags)
> +{
> +	struct sio_chan *siochan = to_sio_chan(chan);
> +	struct sio_tx *siotx = NULL;
> +	int i, nperiods = buf_len / period_len;
> +
> +	if (direction != sio_chan_direction(siochan->no))
> +		return NULL;
> +
> +	siotx = kzalloc(struct_size(siotx, siodesc, nperiods), GFP_NOWAIT);
> +	if (!siotx)
> +		return NULL;
> +
> +	init_completion(&siotx->done);
> +	dma_async_tx_descriptor_init(&siotx->tx, chan);
> +	siotx->period_len = period_len;
> +	siotx->nperiods = nperiods;
> +
> +	for (i = 0; i < nperiods; i++) {
> +		struct sio_coproc_desc *d;
> +
> +		siotx->siodesc[i] = d = sio_alloc_desc(siochan->host);
> +		if (!d) {
> +			sio_tx_free(&siotx->tx);
> +			return NULL;
> +		}
> +
> +		d->flag = 1; // not sure what's up with this
> +		d->iova = buf_addr + period_len * i;
> +		d->size = period_len;
> +	}
> +	dma_wmb();

why use barrier here? and to what purpose..
Martin Povišer Aug. 1, 2023, 9:55 p.m. UTC | #3
Hi Vinod!

> On 1. 8. 2023, at 20:14, Vinod Koul <vkoul@kernel.org> wrote:
> 
> On 12-07-23, 15:38, Martin Povišer wrote:
> 
>> +struct sio_chan {
>> +	unsigned int no;
>> +	struct sio_data *host;
>> +	struct dma_chan chan;
>> +	struct tasklet_struct tasklet;
>> +	struct work_struct terminate_wq;
>> +
>> +	spinlock_t lock;
>> +	struct sio_tx *current_tx;
>> +	/*
>> +	 * 'tx_cookie' is used for distinguishing between transactions from
>> +	 * within tag ack/nack callbacks. Without it, we would have no way
>> +	 * of knowing if the current transaction is the one the callback handler
>> +	 * was installed for.
> 
> not sure what you mean by here.. I dont see why you would need to store
> cookie here, care to explain?

I could have clarified this is not meant to be the dmaengine cookie, just
a driver-level cookie to address a race between

	a dmaengine user calling terminate_all to terminate a running
	cyclic transaction, then issuing a new one

on one hand, and

	the coprocessor acking the issuing of one of the coprocessor
	transactions that correspond to the first dmaengine transaction

on the other hand. With the cookie the driver should not get confused
about which dmaengine transaction the ACK belongs to, since if `current_tx`
changed in the meantime the cookie won’t match.

But now that I look at it... huh, I never increment that `tx_cookie` field!
I don’t know if I have considered using the dmaengine cookie to the same
effect. Maybe we can do that, I see how that would be much desirable.

>> +	 */
>> +	unsigned long tx_cookie;
>> +	int nperiod_acks;
>> +
>> +	/*
>> +	 * We maintain a 'submitted' and 'issued' list mainly for interface
>> +	 * correctness. Typical use of the driver (per channel) will be
>> +	 * prepping, submitting and issuing a single cyclic transaction which
>> +	 * will stay current until terminate_all is called.
>> +	 */
>> +	struct list_head submitted;
>> +	struct list_head issued;
>> +
>> +	struct list_head to_free;
> 
> can you use virt_dma_chan, that should simplify list handling etc

I looked into that when I wrote the sister driver apple-admac.c, I don’t
remember anymore why I decided against it, and I don’t think it came up
during review. Now that this driver is done, I hope we can take it as is.

There’s some benefit from the drivers having a similar structure, I sent
one or two fixes to apple-admac for things I found out because I was
writing this other driver.

>> +};
>> +
>> +#define SIO_NTAGS		16
>> +
>> +typedef void (*sio_ack_callback)(struct sio_chan *, void *, bool);
> 
> any reason not to use dmaengine callbacks?

Not sure what dmaengine callback you mean here. This callback means
the coprocessor acked a tag, not sure how we can fit something dmaengine
onto it.

>> +static int sio_alloc_tag(struct sio_data *sio)
>> +{
>> +	struct sio_tagdata *tags = &sio->tags;
>> +	int tag, i;
>> +
>> +	/*
>> +	 * Because tag number 0 is special, the usable tag range
>> +	 * is 1...(SIO_NTAGS - 1). So, to pick the next usable tag,
>> +	 * we do modulo (SIO_NTAGS - 1) *then* plus one.
>> +	 */
>> +
>> +#define SIO_USABLE_TAGS (SIO_NTAGS - 1)
>> +	tag = (READ_ONCE(tags->last_tag) % SIO_USABLE_TAGS) + 1;
>> +
>> +	for (i = 0; i < SIO_USABLE_TAGS; i++) {
>> +		if (!test_and_set_bit(tag, &tags->allocated))
>> +			break;
>> +
>> +		tag = (tag % SIO_USABLE_TAGS) + 1;
>> +	}
>> +
>> +	WRITE_ONCE(tags->last_tag, tag);
>> +
>> +	if (i < SIO_USABLE_TAGS)
>> +		return tag;
>> +	else
>> +		return -EBUSY;
>> +#undef SIO_USABLE_TAGS
>> +}
> 
> can you use kernel mechanisms like ida to alloc and free the tags...

I can look into that.

>> +static struct dma_async_tx_descriptor *sio_prep_dma_cyclic(
>> +		struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
>> +		size_t period_len, enum dma_transfer_direction direction,
>> +		unsigned long flags)
>> +{
>> +	struct sio_chan *siochan = to_sio_chan(chan);
>> +	struct sio_tx *siotx = NULL;
>> +	int i, nperiods = buf_len / period_len;
>> +
>> +	if (direction != sio_chan_direction(siochan->no))
>> +		return NULL;
>> +
>> +	siotx = kzalloc(struct_size(siotx, siodesc, nperiods), GFP_NOWAIT);
>> +	if (!siotx)
>> +		return NULL;
>> +
>> +	init_completion(&siotx->done);
>> +	dma_async_tx_descriptor_init(&siotx->tx, chan);
>> +	siotx->period_len = period_len;
>> +	siotx->nperiods = nperiods;
>> +
>> +	for (i = 0; i < nperiods; i++) {
>> +		struct sio_coproc_desc *d;
>> +
>> +		siotx->siodesc[i] = d = sio_alloc_desc(siochan->host);
>> +		if (!d) {
>> +			sio_tx_free(&siotx->tx);
>> +			return NULL;
>> +		}
>> +
>> +		d->flag = 1; // not sure what's up with this
>> +		d->iova = buf_addr + period_len * i;
>> +		d->size = period_len;
>> +	}
>> +	dma_wmb();
> 
> why use barrier here? and to what purpose..

Few lines above we are modifying a shared memory buffer that’s mapped into
the coprocessor’s address space (it’s what `d` points to).

> -- 
> ~Vinod
> 

Best regards, Martin
Martin Povišer Aug. 3, 2023, 8:32 a.m. UTC | #4
> On 1. 8. 2023, at 23:55, Martin Povišer <povik+lin@cutebit.org> wrote:
> 
> Hi Vinod!
> 
>> On 1. 8. 2023, at 20:14, Vinod Koul <vkoul@kernel.org> wrote:
>> 
>> On 12-07-23, 15:38, Martin Povišer wrote:
>> 
>>> +struct sio_chan {
>>> +	unsigned int no;
>>> +	struct sio_data *host;
>>> +	struct dma_chan chan;
>>> +	struct tasklet_struct tasklet;
>>> +	struct work_struct terminate_wq;
>>> +
>>> +	spinlock_t lock;
>>> +	struct sio_tx *current_tx;
>>> +	/*
>>> +	 * 'tx_cookie' is used for distinguishing between transactions from
>>> +	 * within tag ack/nack callbacks. Without it, we would have no way
>>> +	 * of knowing if the current transaction is the one the callback handler
>>> +	 * was installed for.
>> 
>> not sure what you mean by here.. I dont see why you would need to store
>> cookie here, care to explain?
> 
> I could have clarified this is not meant to be the dmaengine cookie, just
> a driver-level cookie to address a race between
> 
> 	a dmaengine user calling terminate_all to terminate a running
> 	cyclic transaction, then issuing a new one
> 
> on one hand, and
> 
> 	the coprocessor acking the issuing of one of the coprocessor
> 	transactions that correspond to the first dmaengine transaction
> 
> on the other hand. With the cookie the driver should not get confused
> about which dmaengine transaction the ACK belongs to, since if `current_tx`
> changed in the meantime the cookie won’t match.
> 
> But now that I look at it... huh, I never increment that `tx_cookie` field!
> I don’t know if I have considered using the dmaengine cookie to the same
> effect. Maybe we can do that, I see how that would be much desirable.

Indeed nothing is stopping us from matching on the dmaengine cookie to
address the race, so I will be dropping this `tx_cookie` field in v2.

>>> +static int sio_alloc_tag(struct sio_data *sio)
>>> +{
>>> +	struct sio_tagdata *tags = &sio->tags;
>>> +	int tag, i;
>>> +
>>> +	/*
>>> +	 * Because tag number 0 is special, the usable tag range
>>> +	 * is 1...(SIO_NTAGS - 1). So, to pick the next usable tag,
>>> +	 * we do modulo (SIO_NTAGS - 1) *then* plus one.
>>> +	 */
>>> +
>>> +#define SIO_USABLE_TAGS (SIO_NTAGS - 1)
>>> +	tag = (READ_ONCE(tags->last_tag) % SIO_USABLE_TAGS) + 1;
>>> +
>>> +	for (i = 0; i < SIO_USABLE_TAGS; i++) {
>>> +		if (!test_and_set_bit(tag, &tags->allocated))
>>> +			break;
>>> +
>>> +		tag = (tag % SIO_USABLE_TAGS) + 1;
>>> +	}
>>> +
>>> +	WRITE_ONCE(tags->last_tag, tag);
>>> +
>>> +	if (i < SIO_USABLE_TAGS)
>>> +		return tag;
>>> +	else
>>> +		return -EBUSY;
>>> +#undef SIO_USABLE_TAGS
>>> +}
>> 
>> can you use kernel mechanisms like ida to alloc and free the tags...
> 
> I can look into that.

Documentation says IDA is deprecated in favour of Xarray, both look
like they serve to associate a pointer with an ID. I think neither
structure beats a simple bitfield and a static array for the per-tag
data. Agree?

Martin
Vinod Koul Aug. 3, 2023, 11:25 a.m. UTC | #5
On 01-08-23, 23:55, Martin Povišer wrote:

> > can you use virt_dma_chan, that should simplify list handling etc
> 
> I looked into that when I wrote the sister driver apple-admac.c, I don’t
> remember anymore why I decided against it, and I don’t think it came up
> during review. Now that this driver is done, I hope we can take it as is.
> 
> There’s some benefit from the drivers having a similar structure, I sent
> one or two fixes to apple-admac for things I found out because I was
> writing this other driver.

And this would be a chance to covert the other one and get rid of list
handling code in that driver as well

> 
> >> +};
> >> +
> >> +#define SIO_NTAGS		16
> >> +
> >> +typedef void (*sio_ack_callback)(struct sio_chan *, void *, bool);
> > 
> > any reason not to use dmaengine callbacks?
> 
> Not sure what dmaengine callback you mean here. This callback means
> the coprocessor acked a tag, not sure how we can fit something dmaengine
> onto it.

Okay lets understand, how is this one used
Vinod Koul Aug. 3, 2023, 11:34 a.m. UTC | #6
On 03-08-23, 10:32, Martin Povišer wrote:

> >>> +static int sio_alloc_tag(struct sio_data *sio)
> >>> +{
> >>> +	struct sio_tagdata *tags = &sio->tags;
> >>> +	int tag, i;
> >>> +
> >>> +	/*
> >>> +	 * Because tag number 0 is special, the usable tag range
> >>> +	 * is 1...(SIO_NTAGS - 1). So, to pick the next usable tag,
> >>> +	 * we do modulo (SIO_NTAGS - 1) *then* plus one.
> >>> +	 */
> >>> +
> >>> +#define SIO_USABLE_TAGS (SIO_NTAGS - 1)
> >>> +	tag = (READ_ONCE(tags->last_tag) % SIO_USABLE_TAGS) + 1;
> >>> +
> >>> +	for (i = 0; i < SIO_USABLE_TAGS; i++) {
> >>> +		if (!test_and_set_bit(tag, &tags->allocated))
> >>> +			break;
> >>> +
> >>> +		tag = (tag % SIO_USABLE_TAGS) + 1;
> >>> +	}
> >>> +
> >>> +	WRITE_ONCE(tags->last_tag, tag);
> >>> +
> >>> +	if (i < SIO_USABLE_TAGS)
> >>> +		return tag;
> >>> +	else
> >>> +		return -EBUSY;
> >>> +#undef SIO_USABLE_TAGS
> >>> +}
> >> 
> >> can you use kernel mechanisms like ida to alloc and free the tags...
> > 
> > I can look into that.
> 
> Documentation says IDA is deprecated in favour of Xarray, both look
> like they serve to associate a pointer with an ID. I think neither
> structure beats a simple bitfield and a static array for the per-tag
> data. Agree?

yeah xarray am not too sure. I would still go with ida, we will see when
it is relly removed.

If you need a bitfield why not use bitmap apis.
I dont like drivers implementing the basic logic which kernel provides
Martin Povišer Aug. 24, 2023, 3:25 p.m. UTC | #7
> On 3. 8. 2023, at 13:34, Vinod Koul <vkoul@kernel.org> wrote:
> 
> On 03-08-23, 10:32, Martin Povišer wrote:
> 
>>>>> +static int sio_alloc_tag(struct sio_data *sio)
>>>>> +{
>>>>> +	struct sio_tagdata *tags = &sio->tags;
>>>>> +	int tag, i;
>>>>> +
>>>>> +	/*
>>>>> +	 * Because tag number 0 is special, the usable tag range
>>>>> +	 * is 1...(SIO_NTAGS - 1). So, to pick the next usable tag,
>>>>> +	 * we do modulo (SIO_NTAGS - 1) *then* plus one.
>>>>> +	 */
>>>>> +
>>>>> +#define SIO_USABLE_TAGS (SIO_NTAGS - 1)
>>>>> +	tag = (READ_ONCE(tags->last_tag) % SIO_USABLE_TAGS) + 1;
>>>>> +
>>>>> +	for (i = 0; i < SIO_USABLE_TAGS; i++) {
>>>>> +		if (!test_and_set_bit(tag, &tags->allocated))
>>>>> +			break;
>>>>> +
>>>>> +		tag = (tag % SIO_USABLE_TAGS) + 1;
>>>>> +	}
>>>>> +
>>>>> +	WRITE_ONCE(tags->last_tag, tag);
>>>>> +
>>>>> +	if (i < SIO_USABLE_TAGS)
>>>>> +		return tag;
>>>>> +	else
>>>>> +		return -EBUSY;
>>>>> +#undef SIO_USABLE_TAGS
>>>>> +}
>>>> 
>>>> can you use kernel mechanisms like ida to alloc and free the tags...
>>> 
>>> I can look into that.
>> 
>> Documentation says IDA is deprecated in favour of Xarray, both look
>> like they serve to associate a pointer with an ID. I think neither
>> structure beats a simple bitfield and a static array for the per-tag
>> data. Agree?
> 
> yeah xarray am not too sure. I would still go with ida, we will see when
> it is relly removed.

Sorry for letting this sleep for a while.

I don’t like the idea of submitting a new driver to use a deprecated
interface. For all I know someone can come along later and mark the driver
as broken in the process of finally removing IDA, with good excuse to do so.

> If you need a bitfield why not use bitmap apis.
> I dont like drivers implementing the basic logic which kernel provides

I think one improvement to take up is to use the DECLARE_BITMAP macro for
the `allocated` bitmap. Other than that this already uses the bitmap.h/
bitops.h functions to the degree it can if the goal is to

 (1) allocate and free the tags reliably under SMP with atomic ops

 (2) in best-effort manner (but without locking of the counter) make
     the tag numbers consecutive

The latter behaviour is there to make traces easier to read.

Martin

> -- 
> ~Vinod
Martin Povišer Aug. 24, 2023, 3:34 p.m. UTC | #8
Sorry I missed this message before.

> On 3. 8. 2023, at 13:25, Vinod Koul <vkoul@kernel.org> wrote:
> 
> On 01-08-23, 23:55, Martin Povišer wrote:
> 
>>> can you use virt_dma_chan, that should simplify list handling etc
>> 
>> I looked into that when I wrote the sister driver apple-admac.c, I don’t
>> remember anymore why I decided against it, and I don’t think it came up
>> during review. Now that this driver is done, I hope we can take it as is.
>> 
>> There’s some benefit from the drivers having a similar structure, I sent
>> one or two fixes to apple-admac for things I found out because I was
>> writing this other driver.
> 
> And this would be a chance to covert the other one and get rid of list
> handling code in that driver as well

I guess...

>>>> +};
>>>> +
>>>> +#define SIO_NTAGS		16
>>>> +
>>>> +typedef void (*sio_ack_callback)(struct sio_chan *, void *, bool);
>>> 
>>> any reason not to use dmaengine callbacks?
>> 
>> Not sure what dmaengine callback you mean here. This callback means
>> the coprocessor acked a tag, not sure how we can fit something dmaengine
>> onto it.
> 
> Okay lets understand, how is this one used

This one is used to signal completion of IPC calls to the coprocessor when
that call is made from atomic context. Only user (currently) is issuing of
coproc descriptors. I can provide more detail but not sure in what
direction.

Martin

> -- 
> ~Vinod
>