mbox series

[v9,0/9] Add the I3C subsystem

Message ID 20181022133404.2061-1-boris.brezillon@bootlin.com
Headers show
Series Add the I3C subsystem | expand

Message

Boris Brezillon Oct. 22, 2018, 1:33 p.m. UTC
Hi Greg,

I think we've reached a point where we can eventually consider the I3C
framework for inclusion in 4.20 (5.0?). Indeed, v8 has received
a R-b from Arnd, and I addressed the concerned raised by Joe on the
I3C/I2C_DEV() macros. I hope to get an Ack from Rob on the adjusted
DT bindings anytime soon.

The only remaining concern raised by Arnd is the fact that both hosts
and slaves share the same bus type and are differentiated thanks to
their device_type, which IMHO is fine since this is what other
subsystems do (plus I don't see other solutions to have both I3C
devices and I3C buses represented under /sys/bus/i3c/).

I'd really like to get this series merged in 4.20 so that other
contributors can work on top of that to add

1/ new host drivers
2/ support for advanced features
3/ new device drivers

So, unless you have strong reasons to reject this request, and,
assuming I get Rob's ack soon enough, I plan to send a PR for this
framework next week.

Here is the usual description copy&pasted from the previous versions:

This patch series is adding a new subsystem to support I3C devices.

This is just adding support for basic features. Extra features will
be added afterwards.

There are a few design choices that are worth mentioning because they
impact the way I3C device drivers can interact with their devices:

- all functions used to send I3C/I2C frames must be called in
  non-atomic context. Mainly done this way to ease implementation, but
  this is still open to discussion. Please let me know if you think it's
  worth considering an asynchronous model here
- the I3C bus and I3C master controller are now tightly coupled even
  though they're still allocated separately. There's now a 1:1
  relationship between these objects, and the I3C master is no longer
  represented under the I3C bus object.
  Arnd, let me know if you had something different in mind, and I'll
  rework the implementation accordingly.

- I2C backward compatibility has been designed to be transparent to I2C
  drivers and the I2C subsystem. The I3C master just registers an I2C
  adapter which creates a new I2C bus. I'd say that, from a
  representation PoV it's not ideal because what should appear as a
  single I3C bus exposing I3C and I2C devices here appears as 2
  different busses connected to each other through the parenting (the
  I3C master is the parent of the I2C and I3C busses).
  On the other hand, I don't see a better solution if we want something
  that is not invasive.

Missing features (will be added in separate patch series after initial
support has been accepted/merged):
- support for HDR modes (has been removed because of lack of real users)
- no support for multi-master and the associated concepts (mastership
  handover, support for secondary masters, ...)
- I2C devices can only be described using DT because this is the only
  use case I have. However, the framework can easily be extended with
  ACPI and board info support
- I3C slave framework. This has been completely omitted, but shouldn't
  have a huge impact on the I3C framework because I3C slaves don't see
  the whole bus, it's only about handling master requests and generating
  IBIs. Some of the struct, constant and enum definitions could be
  shared, but most of the I3C slave framework logic will be different

Note that this patchset is available on the linux-i3c repo[1].

Main change between v8 and v9:
- The DT bindings have been simplified to get rid of the I3C/I3C_DEV()
  macros

Main change between v7 and v8:
- The bus object is now embedded in the master_controller object (as
  suggested by Arnd)

Main changes between v6 and v7:
- I3C bus/master representations have been reworked to match what other
  subsystems are doing (master implicitly represented by the bus
  object)
- I3C dev registration has been fixed
- I3C bus mode selection has been fixed
- Calls to readsl/writesl() in the Cadence I3C master driver have been
  fixed

Main changes between v5 and v6:
- Introduce {i3c,i2c}_dev_desc structures to better match how I3C
  master controllers (reservation of one HW slot for each device
  attached to the bus). With this solution, the resource migration
  that happens when a device lose its dynamic address and is
  re-assigned a different address is simplified on the driver side,
  because most of it is now handled in the core (reserve a new dev
  slot, reserve IBI resources and free all resources attached to the
  old slot)
- Add I3C error codes (M0 to M2) so that the core and device drivers
  can have fine grained information on what caused an EIO error.

Only minor things happened between v3 and v5 (you can go check the
changelog in each patch for more details).

Main changes between v2 and v3 are:
- Reworked the DT bindings as suggested by Rob
- Reworked the bus initialization step as suggested by Vitor
- Added a driver for an I3C GPIO expander

Main changes between the initial RFC and this v2 are:
- Add a generic infrastructure to support IBIs. It's worth mentioning
  that I tried exposing IBIs as a regular IRQs, but after several
  attempts and a discussion with Mark Zyngier, it appeared that it was
  not really fitting in the Linux IRQ model (the fact that you have
  payload attached to IBIs, the fact that most of the time an IBI will
  generate a transfer on the bus which has to be done in an atomic
  context, ...)
  The counterpart of this decision is the latency induced by the
  workqueue approach, but since I don't have real use cases, I don't
  know if this can be a problem or not. 
- Add helpers to support Hot Join
- Add support for IBIs and Hot Join in Cadence I3C master driver
- Address several issues in how I was using the device model

Thanks,

Boris

[1]https://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux.git/log/?h=i3c/next

Boris Brezillon (9):
  i3c: Add core I3C infrastructure
  docs: driver-api: Add I3C documentation
  i3c: Add sysfs ABI spec
  dt-bindings: i3c: Document core bindings
  MAINTAINERS: Add myself as the I3C subsystem maintainer
  i3c: master: Add driver for Cadence IP
  dt-bindings: i3c: Document Cadence I3C master bindings
  gpio: Add a driver for Cadence I3C GPIO expander
  dt-bindings: gpio: Add bindings for Cadence I3C gpio expander

 Documentation/ABI/testing/sysfs-bus-i3c       |  146 +
 .../bindings/gpio/gpio-cdns-i3c.txt           |   39 +
 .../bindings/i3c/cdns,i3c-master.txt          |   44 +
 Documentation/devicetree/bindings/i3c/i3c.txt |  138 +
 .../driver-api/i3c/device-driver-api.rst      |    9 +
 Documentation/driver-api/i3c/index.rst        |   11 +
 .../driver-api/i3c/master-driver-api.rst      |   10 +
 Documentation/driver-api/i3c/protocol.rst     |  203 ++
 Documentation/driver-api/index.rst            |    1 +
 MAINTAINERS                                   |   12 +
 drivers/Kconfig                               |    2 +
 drivers/Makefile                              |    2 +-
 drivers/gpio/Kconfig                          |   11 +
 drivers/gpio/Makefile                         |    1 +
 drivers/gpio/gpio-cdns-i3c.c                  |  411 +++
 drivers/i3c/Kconfig                           |   24 +
 drivers/i3c/Makefile                          |    4 +
 drivers/i3c/device.c                          |  233 ++
 drivers/i3c/internals.h                       |   26 +
 drivers/i3c/master.c                          | 2648 +++++++++++++++++
 drivers/i3c/master/Kconfig                    |    6 +
 drivers/i3c/master/Makefile                   |    1 +
 drivers/i3c/master/i3c-master-cdns.c          | 1671 +++++++++++
 include/linux/i3c/ccc.h                       |  385 +++
 include/linux/i3c/device.h                    |  331 +++
 include/linux/i3c/master.h                    |  643 ++++
 include/linux/mod_devicetable.h               |   17 +
 27 files changed, 7028 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-i3c
 create mode 100644 Documentation/devicetree/bindings/gpio/gpio-cdns-i3c.txt
 create mode 100644 Documentation/devicetree/bindings/i3c/cdns,i3c-master.txt
 create mode 100644 Documentation/devicetree/bindings/i3c/i3c.txt
 create mode 100644 Documentation/driver-api/i3c/device-driver-api.rst
 create mode 100644 Documentation/driver-api/i3c/index.rst
 create mode 100644 Documentation/driver-api/i3c/master-driver-api.rst
 create mode 100644 Documentation/driver-api/i3c/protocol.rst
 create mode 100644 drivers/gpio/gpio-cdns-i3c.c
 create mode 100644 drivers/i3c/Kconfig
 create mode 100644 drivers/i3c/Makefile
 create mode 100644 drivers/i3c/device.c
 create mode 100644 drivers/i3c/internals.h
 create mode 100644 drivers/i3c/master.c
 create mode 100644 drivers/i3c/master/Kconfig
 create mode 100644 drivers/i3c/master/Makefile
 create mode 100644 drivers/i3c/master/i3c-master-cdns.c
 create mode 100644 include/linux/i3c/ccc.h
 create mode 100644 include/linux/i3c/device.h
 create mode 100644 include/linux/i3c/master.h

Comments

Boris Brezillon Oct. 24, 2018, 6:20 p.m. UTC | #1
Hi Arnd,

On Mon, 22 Oct 2018 15:34:01 +0200
Boris Brezillon <boris.brezillon@bootlin.com> wrote:


> +
> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master *master,
> +					    u8 *bytes, int nbytes)
> +{
> +	readsl(master->regs + RX_FIFO, bytes, nbytes / 4);

Vitor reported a problem with readsl(): this function expects the 2nd
argument to be aligned on 32-bit, which is not guaranteed here. Unless
you see a better solution, I'll switch back to a loop doing:

	for (i = 0; i < nbytes; i += 4) {
		u32 tmp = __raw_readl(...);
		memcpy(bytes + i, &tmp,
		       nbytes - i  > 4 ? 4 : nbytes - i);
	}

> +	if (nbytes & 3) {
> +		u32 tmp;
> +
> +		readsl(master->regs + RX_FIFO, &tmp, 1);
> +		memcpy(bytes + (nbytes & ~3), &tmp, nbytes & 3);
> +	}
> +}

Regards,

Boris
Grygorii Strashko Oct. 24, 2018, 8:25 p.m. UTC | #2
On 10/24/18 1:20 PM, Boris Brezillon wrote:
> Hi Arnd,
> 
> On Mon, 22 Oct 2018 15:34:01 +0200
> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> 
> 
>> +
>> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master *master,
>> +					    u8 *bytes, int nbytes)
>> +{
>> +	readsl(master->regs + RX_FIFO, bytes, nbytes / 4);
> 
> Vitor reported a problem with readsl(): this function expects the 2nd
> argument to be aligned on 32-bit, which is not guaranteed here. Unless
> you see a better solution, I'll switch back to a loop doing:
> 
> 	for (i = 0; i < nbytes; i += 4) {
> 		u32 tmp = __raw_readl(...);

Pls, do not use __raw io.

> 		memcpy(bytes + i, &tmp,
> 		       nbytes - i  > 4 ? 4 : nbytes - i);
> 	}
> 
>> +	if (nbytes & 3) {
>> +		u32 tmp;
>> +
>> +		readsl(master->regs + RX_FIFO, &tmp, 1);
>> +		memcpy(bytes + (nbytes & ~3), &tmp, nbytes & 3);
>> +	}
>> +}
> 
> Regards,
> 
> Boris
>
Boris Brezillon Oct. 24, 2018, 9:04 p.m. UTC | #3
On Wed, 24 Oct 2018 15:25:17 -0500
Grygorii Strashko <grygorii.strashko@ti.com> wrote:

> On 10/24/18 1:20 PM, Boris Brezillon wrote:
> > Hi Arnd,
> > 
> > On Mon, 22 Oct 2018 15:34:01 +0200
> > Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> > 
> >   
> >> +
> >> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master *master,
> >> +					    u8 *bytes, int nbytes)
> >> +{
> >> +	readsl(master->regs + RX_FIFO, bytes, nbytes / 4);  
> > 
> > Vitor reported a problem with readsl(): this function expects the 2nd
> > argument to be aligned on 32-bit, which is not guaranteed here. Unless
> > you see a better solution, I'll switch back to a loop doing:
> > 
> > 	for (i = 0; i < nbytes; i += 4) {
> > 		u32 tmp = __raw_readl(...);  
> 
> Pls, do not use __raw io.

Except this is exactly what I want here, unless you have a
replacement for "readl() without a mem-barrier and without endianness
conversion"

> 
> > 		memcpy(bytes + i, &tmp,
> > 		       nbytes - i  > 4 ? 4 : nbytes - i);
> > 	}
> >   
> >> +	if (nbytes & 3) {
> >> +		u32 tmp;
> >> +
> >> +		readsl(master->regs + RX_FIFO, &tmp, 1);
> >> +		memcpy(bytes + (nbytes & ~3), &tmp, nbytes & 3);
> >> +	}
> >> +}  
> > 
> > Regards,
> > 
> > Boris
> >   
>
Grygorii Strashko Oct. 24, 2018, 10:43 p.m. UTC | #4
On 10/24/18 4:04 PM, Boris Brezillon wrote:
> On Wed, 24 Oct 2018 15:25:17 -0500
> Grygorii Strashko <grygorii.strashko@ti.com> wrote:
> 
>> On 10/24/18 1:20 PM, Boris Brezillon wrote:
>>> Hi Arnd,
>>>
>>> On Mon, 22 Oct 2018 15:34:01 +0200
>>> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
>>>
>>>    
>>>> +
>>>> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master *master,
>>>> +					    u8 *bytes, int nbytes)
>>>> +{
>>>> +	readsl(master->regs + RX_FIFO, bytes, nbytes / 4);
>>>
>>> Vitor reported a problem with readsl(): this function expects the 2nd
>>> argument to be aligned on 32-bit, which is not guaranteed here. Unless
>>> you see a better solution, I'll switch back to a loop doing:
>>>
>>> 	for (i = 0; i < nbytes; i += 4) {
>>> 		u32 tmp = __raw_readl(...);
>>
>> Pls, do not use __raw io.
> 
> Except this is exactly what I want here, unless you have a
> replacement for "readl() without a mem-barrier and without endianness
> conversion"
> 

Not sure why endianness is the problem. readl_relaxed?
Sry, I've missed that this is part of the driver not i3c core,
so minor/ignore.
Boris Brezillon Oct. 24, 2018, 10:52 p.m. UTC | #5
On Wed, 24 Oct 2018 17:43:00 -0500
Grygorii Strashko <grygorii.strashko@ti.com> wrote:

> On 10/24/18 4:04 PM, Boris Brezillon wrote:
> > On Wed, 24 Oct 2018 15:25:17 -0500
> > Grygorii Strashko <grygorii.strashko@ti.com> wrote:
> >   
> >> On 10/24/18 1:20 PM, Boris Brezillon wrote:  
> >>> Hi Arnd,
> >>>
> >>> On Mon, 22 Oct 2018 15:34:01 +0200
> >>> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> >>>
> >>>      
> >>>> +
> >>>> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master *master,
> >>>> +					    u8 *bytes, int nbytes)
> >>>> +{
> >>>> +	readsl(master->regs + RX_FIFO, bytes, nbytes / 4);  
> >>>
> >>> Vitor reported a problem with readsl(): this function expects the 2nd
> >>> argument to be aligned on 32-bit, which is not guaranteed here. Unless
> >>> you see a better solution, I'll switch back to a loop doing:
> >>>
> >>> 	for (i = 0; i < nbytes; i += 4) {
> >>> 		u32 tmp = __raw_readl(...);  
> >>
> >> Pls, do not use __raw io.  
> > 
> > Except this is exactly what I want here, unless you have a
> > replacement for "readl() without a mem-barrier and without endianness
> > conversion"
> >   
> 
> Not sure why endianness is the problem. readl_relaxed?

Because we want to read a stream of bytes, and, if we have a CPU that is
operating in big-endian (ARM kernels can configured in BE or LE), byte
ordering will be messed up (the controller is LE).

If I use readl_relaxed(), I'll then have to call cpu_to_le32(), and
finally copy the result to the buffer.

> Sry, I've missed that this is part of the driver not i3c core,
> so minor/ignore.
>
Arnd Bergmann Oct. 25, 2018, 3:30 p.m. UTC | #6
On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> Hi Arnd,
>
> On Mon, 22 Oct 2018 15:34:01 +0200
> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
>
>
>> +
>> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master
>> *master,
>> +					    u8 *bytes, int nbytes)
>> +{
>> +	readsl(master->regs + RX_FIFO, bytes, nbytes / 4);
>
> Vitor reported a problem with readsl(): this function expects the 2nd
> argument to be aligned on 32-bit, which is not guaranteed here. Unless
> you see a better solution, I'll switch back to a loop doing:
>
> 	for (i = 0; i < nbytes; i += 4) {
> 		u32 tmp = __raw_readl(...);
> 		memcpy(bytes + i, &tmp,
> 		       nbytes - i  > 4 ? 4 : nbytes - i);
> 	}

Could we maybe mandate that the buffer itself must be aligned here?
What would be a reason why we see an unaligned target buffer?

The open-coded loop should generally work (maybe a little slower),
but it does seem error-prone to use __raw_readl() in general.

      Arnd
Boris Brezillon Oct. 25, 2018, 4:07 p.m. UTC | #7
On Thu, 25 Oct 2018 17:30:26 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> > Hi Arnd,
> >
> > On Mon, 22 Oct 2018 15:34:01 +0200
> > Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> >
> >  
> >> +
> >> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master
> >> *master,
> >> +					    u8 *bytes, int nbytes)
> >> +{
> >> +	readsl(master->regs + RX_FIFO, bytes, nbytes / 4);  
> >
> > Vitor reported a problem with readsl(): this function expects the 2nd
> > argument to be aligned on 32-bit, which is not guaranteed here. Unless
> > you see a better solution, I'll switch back to a loop doing:
> >
> > 	for (i = 0; i < nbytes; i += 4) {
> > 		u32 tmp = __raw_readl(...);
> > 		memcpy(bytes + i, &tmp,
> > 		       nbytes - i  > 4 ? 4 : nbytes - i);
> > 	}  
> 
> Could we maybe mandate that the buffer itself must be aligned here?
> What would be a reason why we see an unaligned target buffer?

Well, the buffers we pass to i3c_send_ccc_cmd() are not necessarily
aligned because they're not dynamically allocated (allocated on the
stack) and are not naturally aligned on 32-bits (either because they
are smaller than 32bits or because the struct is declared __packed).

I guess I could dynamically allocate the payload, but that requires
going over all users of i3c_send_ccc_cmd() to patch them.
Arnd Bergmann Oct. 25, 2018, 4:13 p.m. UTC | #8
On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
>
> On Thu, 25 Oct 2018 17:30:26 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
>
> > On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> > > Hi Arnd,
> > >
> > > On Mon, 22 Oct 2018 15:34:01 +0200
> > > Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> > >
> > >
> > >> +
> > >> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master
> > >> *master,
> > >> +                                      u8 *bytes, int nbytes)
> > >> +{
> > >> +  readsl(master->regs + RX_FIFO, bytes, nbytes / 4);
> > >
> > > Vitor reported a problem with readsl(): this function expects the 2nd
> > > argument to be aligned on 32-bit, which is not guaranteed here. Unless
> > > you see a better solution, I'll switch back to a loop doing:
> > >
> > >     for (i = 0; i < nbytes; i += 4) {
> > >             u32 tmp = __raw_readl(...);
> > >             memcpy(bytes + i, &tmp,
> > >                    nbytes - i  > 4 ? 4 : nbytes - i);
> > >     }
> >
> > Could we maybe mandate that the buffer itself must be aligned here?
> > What would be a reason why we see an unaligned target buffer?
>
> Well, the buffers we pass to i3c_send_ccc_cmd() are not necessarily
> aligned because they're not dynamically allocated (allocated on the
> stack) and are not naturally aligned on 32-bits (either because they
> are smaller than 32bits or because the struct is declared __packed).
>
> I guess I could dynamically allocate the payload, but that requires
> going over all users of i3c_send_ccc_cmd() to patch them.

This reminds me that Wolfram mentioned in his ELC talk that the
buffers on i3c should all be DMA capable to make life easier for
i3c master drivers that want to implement DMA transfers.

If we have buffers here that are not aligned to cache lines
(or even just 32 bit words), doesn't that also mean that the
same buffers are not DMA capable either?

        Arnd
Boris Brezillon Oct. 25, 2018, 4:30 p.m. UTC | #9
On Thu, 25 Oct 2018 18:13:51 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon
> <boris.brezillon@bootlin.com> wrote:
> >
> > On Thu, 25 Oct 2018 17:30:26 +0200
> > Arnd Bergmann <arnd@arndb.de> wrote:
> >  
> > > On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote:  
> > > > Hi Arnd,
> > > >
> > > > On Mon, 22 Oct 2018 15:34:01 +0200
> > > > Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> > > >
> > > >  
> > > >> +
> > > >> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master
> > > >> *master,
> > > >> +                                      u8 *bytes, int nbytes)
> > > >> +{
> > > >> +  readsl(master->regs + RX_FIFO, bytes, nbytes / 4);  
> > > >
> > > > Vitor reported a problem with readsl(): this function expects the 2nd
> > > > argument to be aligned on 32-bit, which is not guaranteed here. Unless
> > > > you see a better solution, I'll switch back to a loop doing:
> > > >
> > > >     for (i = 0; i < nbytes; i += 4) {
> > > >             u32 tmp = __raw_readl(...);
> > > >             memcpy(bytes + i, &tmp,
> > > >                    nbytes - i  > 4 ? 4 : nbytes - i);
> > > >     }  
> > >
> > > Could we maybe mandate that the buffer itself must be aligned here?
> > > What would be a reason why we see an unaligned target buffer?  
> >
> > Well, the buffers we pass to i3c_send_ccc_cmd() are not necessarily
> > aligned because they're not dynamically allocated (allocated on the
> > stack) and are not naturally aligned on 32-bits (either because they
> > are smaller than 32bits or because the struct is declared __packed).
> >
> > I guess I could dynamically allocate the payload, but that requires
> > going over all users of i3c_send_ccc_cmd() to patch them.  
> 
> This reminds me that Wolfram mentioned in his ELC talk that the
> buffers on i3c should all be DMA capable to make life easier for
> i3c master drivers that want to implement DMA transfers.

And this is the case for all buffers passed to
i3c_device_do_priv_xfers() (and soon i3c_device_send_hdr_cmd()),
but I did not enforce that for the internal
i3c_master_send_ccc_cmd_locked() helper, maybe I should...
It was just convenient to place the object to be transmitted/received on
the stack.

> 
> If we have buffers here that are not aligned to cache lines
> (or even just 32 bit words), doesn't that also mean that the
> same buffers are not DMA capable either?

Yep, if it's not cache-line-aligned (and on the stack), it's not
DMA-able.
Arnd Bergmann Oct. 26, 2018, 7:43 a.m. UTC | #10
On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
> On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> > > On Thu, 25 Oct 2018 17:30:26 +0200
> > > Arnd Bergmann <arnd@arndb.de> wrote:
> > > > On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> > > > > On Mon, 22 Oct 2018 15:34:01 +0200
> > > I guess I could dynamically allocate the payload, but that requires
> > > going over all users of i3c_send_ccc_cmd() to patch them.
> >
> > This reminds me that Wolfram mentioned in his ELC talk that the
> > buffers on i3c should all be DMA capable to make life easier for
> > i3c master drivers that want to implement DMA transfers.
>
> And this is the case for all buffers passed to
> i3c_device_do_priv_xfers() (and soon i3c_device_send_hdr_cmd()),
> but I did not enforce that for the internal
> i3c_master_send_ccc_cmd_locked() helper, maybe I should...
> It was just convenient to place the object to be transmitted/received on
> the stack.

Ok. Is i3c_master_send_ccc_cmd_locked() what implements the public
interfaces then, or is this something else?

If you place a buffer on the stack, it is not DMA capable, but
it is guaranteed to be at least 32-bit word aligned, and should
not cause an exception in readsl(), unless it starts with a couple of
(not multiple of four)  extra bytes that are not sent to the devices.
Is that what happens here?

> > If we have buffers here that are not aligned to cache lines
> > (or even just 32 bit words), doesn't that also mean that the
> > same buffers are not DMA capable either?
>
> Yep, if it's not cache-line-aligned (and on the stack), it's not
> DMA-able.

This sounds like a more fundamental problem to solve first
then. Obviously it is incredibly /useful/ to be able to put short
i2c or i3c messages on the stack, but allowing that in general
also prevents the use of DMA without bounce buffers.

One way to address this might be to always bounce any
messages that are less than a cache line through a
(pre-)kmallocated buffer, and require any longer messages
to be cache capable. This could also solve the issue with
readsl(), but it would be a rather confusing user interface.

Another option might be to have separate interfaces for
"short" and "long" messages at the API level and have
distinct rules for those: short would always be bounced
by the i3c code, and long puts restrictions on the buffer
location.

       Arnd
Boris Brezillon Oct. 26, 2018, 7:57 a.m. UTC | #11
Hi Arnd,

On Fri, 26 Oct 2018 09:43:25 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon
> <boris.brezillon@bootlin.com> wrote:
> > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote:  
> > > > On Thu, 25 Oct 2018 17:30:26 +0200
> > > > Arnd Bergmann <arnd@arndb.de> wrote:  
> > > > > On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote:  
> > > > > > On Mon, 22 Oct 2018 15:34:01 +0200  
> > > > I guess I could dynamically allocate the payload, but that requires
> > > > going over all users of i3c_send_ccc_cmd() to patch them.  
> > >
> > > This reminds me that Wolfram mentioned in his ELC talk that the
> > > buffers on i3c should all be DMA capable to make life easier for
> > > i3c master drivers that want to implement DMA transfers.  
> >
> > And this is the case for all buffers passed to
> > i3c_device_do_priv_xfers() (and soon i3c_device_send_hdr_cmd()),
> > but I did not enforce that for the internal
> > i3c_master_send_ccc_cmd_locked() helper, maybe I should...
> > It was just convenient to place the object to be transmitted/received on
> > the stack.  
> 
> Ok. Is i3c_master_send_ccc_cmd_locked() what implements the public
> interfaces then, or is this something else?

i3c_master_send_ccc_cmd_locked() calls master->ops->send_ccc_cmd(), so
it's part of the master controller interface.

> 
> If you place a buffer on the stack, it is not DMA capable, but
> it is guaranteed to be at least 32-bit word aligned, and should
> not cause an exception in readsl(), unless it starts with a couple of
> (not multiple of four)  extra bytes that are not sent to the devices.
> Is that what happens here?

Here is the report I received from Vitor:

"
	Hi Boris,


	I'm trying this new patch-set version but I get some issues when use 
	readsl() function.

	Basically the system complain about memory alignment.

	As exemple when I try to read the PID from the device

	> +static int i3c_master_getpid_locked(struct i3c_master_controller *master,
	> +				    struct i3c_device_info *info)
	> +{
	> +	struct i3c_ccc_getpid getpid;  

	at this point the getpid struct it is already unaligned with

	i3c_master_getpid_locked:1129 getpid_add=0x9a249c7a

	> +	struct i3c_ccc_cmd_dest dest = {
	> +		.addr = info->dyn_addr,
	> +		.payload.len = sizeof(struct i3c_ccc_getpid),
	> +		.payload.data = &getpid,
	> +	};
	> +	struct i3c_ccc_cmd cmd = {
	> +		.rnw = true,
	> +		.id = I3C_CCC_GETPID,
	> +		.dests = &dest,
	> +		.ndests = 1,
	> +	};
	> +	int ret, i;
	> +
	> +	ret = i3c_master_send_ccc_cmd_locked(master, &cmd);
	> +	if (ret)
	> +		return ret;
	> +
	> +	info->pid = 0;
	> +	for (i = 0; i < sizeof(getpid.pid); i++) {
	> +		int sft = (sizeof(getpid.pid) - i - 1) * 8;
	> +
	> +		info->pid |= (u64)getpid.pid[i] << sft;
	> +	}
	> +
	> +	return 0;
	> +}
	> +  

	and them when

	static void dw_i3c_master_read_rx_fifo(struct dw_i3c_master *master,
	                        u8 *bytes, int nbytes)
	{
	     readsl(master->regs + RX_TX_DATA_PORT, bytes, nbytes / 4);
	...
	}

	the system crash.

	Misaligned Access
	Path: (null)
	CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1 #88
	
	[ECR   ]: 0x00230400 => Misaligned r/w from 0x9a249c7a
	[EFA   ]: 0x9a249c7a
	[BLINK ]: dw_i3c_master_irq_handler+0x200/0x2fc [dw_i3c_master]
	[ERET  ]: dw_i3c_master_irq_handler+0x224/0x2fc [dw_i3c_master]
	[STAT32]: 0x00000a4c : K DE     A1 E2
	BTA: 0x70038e44  SP: 0x8071fe58  FP: 0x00000000
	LPS: 0x8060e63e LPE: 0x8060e642 LPC: 0x00000000
	r00: 0x00000033 r01: 0x00000004 r02: 0x00000000
	r03: 0xd0002014 r04: 0x00000006 r05: 0x00000000
	r06: 0x9a249c7a r07: 0x39307260 r08: 0xe10b6900
	r09: 0x00000013 r10: 0x00000000 r11: 0x000000c9
	r12: 0x0a613763

	Do you have any idea about this?


	Best regards,

	Vitor Soares
"

> 
> > > If we have buffers here that are not aligned to cache lines
> > > (or even just 32 bit words), doesn't that also mean that the
> > > same buffers are not DMA capable either?  
> >
> > Yep, if it's not cache-line-aligned (and on the stack), it's not
> > DMA-able.  
> 
> This sounds like a more fundamental problem to solve first
> then. Obviously it is incredibly /useful/ to be able to put short
> i2c or i3c messages on the stack, but allowing that in general
> also prevents the use of DMA without bounce buffers.

Actually, we have the same problem in MTD (UBI passes vmalloced
buffers to the MTD stack), so I understand this concern very well,
and I agree that enforcing all buffers passed to the controller to
be DMA capable is the right thing to do.

I guess I just didn't think about internal APIs when I made this
modification which explains why CCC cmds were left behind.

> 
> One way to address this might be to always bounce any
> messages that are less than a cache line through a
> (pre-)kmallocated buffer, and require any longer messages
> to be cache capable. This could also solve the issue with
> readsl(), but it would be a rather confusing user interface.
> 
> Another option might be to have separate interfaces for
> "short" and "long" messages at the API level and have
> distinct rules for those: short would always be bounced
> by the i3c code, and long puts restrictions on the buffer
> location.

Hm, let's keep the API simple. I'll just mandate that all payload bufs
passed to i3c_master_send_ccc_cmd_locked() be dynamically allocated.

Thanks for your feedback.

Boris
Arnd Bergmann Oct. 26, 2018, 10:01 a.m. UTC | #12
On Fri, Oct 26, 2018 at 9:57 AM Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
> On Fri, 26 Oct 2018 09:43:25 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
>
> > On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon
> > <boris.brezillon@bootlin.com> wrote:
> > > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> > > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> > > > > On Thu, 25 Oct 2018 17:30:26 +0200
> > Ok. Is i3c_master_send_ccc_cmd_locked() what implements the public
> > interfaces then, or is this something else?
>
> i3c_master_send_ccc_cmd_locked() calls master->ops->send_ccc_cmd(), so
> it's part of the master controller interface.
>
> >
> > If you place a buffer on the stack, it is not DMA capable, but
> > it is guaranteed to be at least 32-bit word aligned, and should
> > not cause an exception in readsl(), unless it starts with a couple of
> > (not multiple of four)  extra bytes that are not sent to the devices.
> > Is that what happens here?
>
> Here is the report I received from Vitor:
>
> "
>         Hi Boris,
>
>
>         I'm trying this new patch-set version but I get some issues when use
>         readsl() function.
>
>         Basically the system complain about memory alignment.
>

>         > +static int i3c_master_getpid_locked(struct i3c_master_controller *master,
>         > +                                 struct i3c_device_info *info)
>         > +{
>         > +     struct i3c_ccc_getpid getpid;
>
>         at this point the getpid struct it is already unaligned with
>
>         i3c_master_getpid_locked:1129 getpid_add=0x9a249c7a
>
>         > +     struct i3c_ccc_cmd_dest dest = {
>         > +             .addr = info->dyn_addr,
>         > +             .payload.len = sizeof(struct i3c_ccc_getpid),
>         > +             .payload.data = &getpid,
>         > +     };

>         > +}
>         > +
>
>         and them when
>
>         static void dw_i3c_master_read_rx_fifo(struct dw_i3c_master *master,
>                                 u8 *bytes, int nbytes)
>         {
>              readsl(master->regs + RX_TX_DATA_PORT, bytes, nbytes / 4);
>         ...
>         }

Ok, I spent an hour chasing the ARM implementation and finding
no way this could go wrong here. I see that 'struct i3c_ccc_getpid'
may be misaligned on the stack (it normally won't be), and that
the ARM readsl() has a lot of extra code to handle unaligned
output. However, the dump that Vitor reports

>         [ECR   ]: 0x00230400 => Misaligned r/w from 0x9a249c7a
>         [EFA   ]: 0x9a249c7a
>        [BLINK ]: dw_i3c_master_irq_handler+0x200/0x2fc [dw_i3c_master]

Is from an arch/arc kernel that uses asm-generic/io.h, and
that stores the output using a u32 pointer:

static inline void readsl(const volatile void __iomem *addr, void *buffer,
                          unsigned int count)
{
        if (count) {
                u32 *buf = buffer;

                do {
                        u32 x = __raw_readl(addr);
                        *buf++ = x;
                } while (--count);
        }
}

This is apparently not allowed on ARC when 'buffer' is
unaligned. I think what we need here is to use
put_unaligned() instead of the pointer dereference.
For architectures that can do unaligned accesses,
the result is the same, but for ARC it will fix the problem.

> > One way to address this might be to always bounce any
> > messages that are less than a cache line through a
> > (pre-)kmallocated buffer, and require any longer messages
> > to be cache capable. This could also solve the issue with
> > readsl(), but it would be a rather confusing user interface.
> >
> > Another option might be to have separate interfaces for
> > "short" and "long" messages at the API level and have
> > distinct rules for those: short would always be bounced
> > by the i3c code, and long puts restrictions on the buffer
> > location.
>
> Hm, let's keep the API simple. I'll just mandate that all payload bufs
> passed to i3c_master_send_ccc_cmd_locked() be dynamically allocated.

Ok. What about i2c commands sent to the same i3c controller
then? Do we need to copy those to satisfy the requirements
of the i3c layer?

      Arnd
Boris Brezillon Oct. 26, 2018, 12:46 p.m. UTC | #13
On Fri, 26 Oct 2018 12:01:52 +0200
Arnd Bergmann <arnd@arndb.de> wrote:

> On Fri, Oct 26, 2018 at 9:57 AM Boris Brezillon
> <boris.brezillon@bootlin.com> wrote:
> > On Fri, 26 Oct 2018 09:43:25 +0200
> > Arnd Bergmann <arnd@arndb.de> wrote:
> >  
> > > On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon
> > > <boris.brezillon@bootlin.com> wrote:  
> > > > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> > > > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote:  
> > > > > > On Thu, 25 Oct 2018 17:30:26 +0200  
> > > Ok. Is i3c_master_send_ccc_cmd_locked() what implements the public
> > > interfaces then, or is this something else?  
> >
> > i3c_master_send_ccc_cmd_locked() calls master->ops->send_ccc_cmd(), so
> > it's part of the master controller interface.
> >  
> > >
> > > If you place a buffer on the stack, it is not DMA capable, but
> > > it is guaranteed to be at least 32-bit word aligned, and should
> > > not cause an exception in readsl(), unless it starts with a couple of
> > > (not multiple of four)  extra bytes that are not sent to the devices.
> > > Is that what happens here?  
> >
> > Here is the report I received from Vitor:
> >
> > "
> >         Hi Boris,
> >
> >
> >         I'm trying this new patch-set version but I get some issues when use
> >         readsl() function.
> >
> >         Basically the system complain about memory alignment.
> >  
> 
> >         > +static int i3c_master_getpid_locked(struct i3c_master_controller *master,
> >         > +                                 struct i3c_device_info *info)
> >         > +{
> >         > +     struct i3c_ccc_getpid getpid;  
> >
> >         at this point the getpid struct it is already unaligned with
> >
> >         i3c_master_getpid_locked:1129 getpid_add=0x9a249c7a
> >  
> >         > +     struct i3c_ccc_cmd_dest dest = {
> >         > +             .addr = info->dyn_addr,
> >         > +             .payload.len = sizeof(struct i3c_ccc_getpid),
> >         > +             .payload.data = &getpid,
> >         > +     };  
> 
> >         > +}
> >         > +  
> >
> >         and them when
> >
> >         static void dw_i3c_master_read_rx_fifo(struct dw_i3c_master *master,
> >                                 u8 *bytes, int nbytes)
> >         {
> >              readsl(master->regs + RX_TX_DATA_PORT, bytes, nbytes / 4);
> >         ...
> >         }  
> 
> Ok, I spent an hour chasing the ARM implementation and finding
> no way this could go wrong here. I see that 'struct i3c_ccc_getpid'
> may be misaligned on the stack (it normally won't be), and that
> the ARM readsl() has a lot of extra code to handle unaligned
> output.

I didn't have this problem on xtensa either.

> However, the dump that Vitor reports
> 
> >         [ECR   ]: 0x00230400 => Misaligned r/w from 0x9a249c7a
> >         [EFA   ]: 0x9a249c7a
> >        [BLINK ]: dw_i3c_master_irq_handler+0x200/0x2fc [dw_i3c_master]  
> 
> Is from an arch/arc kernel that uses asm-generic/io.h, and
> that stores the output using a u32 pointer:
> 
> static inline void readsl(const volatile void __iomem *addr, void *buffer,
>                           unsigned int count)
> {
>         if (count) {
>                 u32 *buf = buffer;
> 
>                 do {
>                         u32 x = __raw_readl(addr);
>                         *buf++ = x;
>                 } while (--count);
>         }
> }
> 
> This is apparently not allowed on ARC when 'buffer' is
> unaligned. I think what we need here is to use
> put_unaligned() instead of the pointer dereference.
> For architectures that can do unaligned accesses,
> the result is the same, but for ARC it will fix the problem.

Okay, so writesl()/readsl() should deal with unaligned pointers, and
default implementations should be fixed. I guess you'll send a patch to
use put/get_unaligned().

> 
> > > One way to address this might be to always bounce any
> > > messages that are less than a cache line through a
> > > (pre-)kmallocated buffer, and require any longer messages
> > > to be cache capable. This could also solve the issue with
> > > readsl(), but it would be a rather confusing user interface.
> > >
> > > Another option might be to have separate interfaces for
> > > "short" and "long" messages at the API level and have
> > > distinct rules for those: short would always be bounced
> > > by the i3c code, and long puts restrictions on the buffer
> > > location.  
> >
> > Hm, let's keep the API simple. I'll just mandate that all payload bufs
> > passed to i3c_master_send_ccc_cmd_locked() be dynamically allocated.  
> 
> Ok. What about i2c commands sent to the same i3c controller
> then?

Still not taken care of.

> Do we need to copy those to satisfy the requirements
> of the i3c layer?

I guess we should. The question is, should we do that unconditionally
or should we try to optimize thins with something like:

	if (!virt_addr_valid(xfer->buf) ||
	    object_is_on_stack(xfer->buf))
		/* Alloc bounce buf. */
	else
		/* Use provided buf. */
Arnd Bergmann Oct. 26, 2018, 1:21 p.m. UTC | #14
On Fri, Oct 26, 2018 at 2:46 PM Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
> On Fri, 26 Oct 2018 12:01:52 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
> > On Fri, Oct 26, 2018 at 9:57 AM Boris Brezillon
> > <boris.brezillon@bootlin.com> wrote:
> > > On Fri, 26 Oct 2018 09:43:25 +0200
> > > Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > > On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon
> > > > <boris.brezillon@bootlin.com> wrote:
> > > > > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote:
> > > > > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> > > > > > > On Thu, 25 Oct 2018 17:30:26 +0200
> >
> > This is apparently not allowed on ARC when 'buffer' is
> > unaligned. I think what we need here is to use
> > put_unaligned() instead of the pointer dereference.
> > For architectures that can do unaligned accesses,
> > the result is the same, but for ARC it will fix the problem.
>
> Okay, so writesl()/readsl() should deal with unaligned pointers, and
> default implementations should be fixed. I guess you'll send a patch to
> use put/get_unaligned().

That's one way of doing it, though thinking about it some more,
this can also introduce overhead on machines that don't support
unaligned buffers and only work on drivers that are guaranteed
to see fully aligned data.

We could also override these specifically for ARC, and risk
running into the same problem elsewhere, rather than be sure
to fix everyone while risking to introduce noticeable performance
regressions in existing drivers.

> > > > One way to address this might be to always bounce any
> > > > messages that are less than a cache line through a
> > > > (pre-)kmallocated buffer, and require any longer messages
> > > > to be cache capable. This could also solve the issue with
> > > > readsl(), but it would be a rather confusing user interface.
> > > >
> > > > Another option might be to have separate interfaces for
> > > > "short" and "long" messages at the API level and have
> > > > distinct rules for those: short would always be bounced
> > > > by the i3c code, and long puts restrictions on the buffer
> > > > location.
> > >
> > > Hm, let's keep the API simple. I'll just mandate that all payload bufs
> > > passed to i3c_master_send_ccc_cmd_locked() be dynamically allocated.
> >
> > Ok. What about i2c commands sent to the same i3c controller
> > then?
>
> Still not taken care of.
>
> > Do we need to copy those to satisfy the requirements
> > of the i3c layer?
>
> I guess we should. The question is, should we do that unconditionally
> or should we try to optimize thins with something like:
>
>         if (!virt_addr_valid(xfer->buf) ||
>             object_is_on_stack(xfer->buf))
>                 /* Alloc bounce buf. */
>         else
>                 /* Use provided buf. */

There may be too many cases that we need to handle here that
are not DMA capable. To be on the safe side, I'd probably always
copy all data that is not a multiple of fully aligned cache lines,
as well as pointers that fails to meet some other requirements
(stack, vmalloc, kmap, ...)

        Arnd