Message ID | 20181022133404.2061-1-boris.brezillon@bootlin.com |
---|---|
Series | Add the I3C subsystem |
Hi Arnd,

On Mon, 22 Oct 2018 15:34:01 +0200
Boris Brezillon <boris.brezillon@bootlin.com> wrote:

> +
> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master *master,
> +                                            u8 *bytes, int nbytes)
> +{
> +        readsl(master->regs + RX_FIFO, bytes, nbytes / 4);

Vitor reported a problem with readsl(): this function expects the 2nd
argument to be aligned on 32-bit, which is not guaranteed here. Unless
you see a better solution, I'll switch back to a loop doing:

        for (i = 0; i < nbytes; i += 4) {
                u32 tmp = __raw_readl(...);

                memcpy(bytes + i, &tmp,
                       nbytes - i > 4 ? 4 : nbytes - i);
        }

> +        if (nbytes & 3) {
> +                u32 tmp;
> +
> +                readsl(master->regs + RX_FIFO, &tmp, 1);
> +                memcpy(bytes + (nbytes & ~3), &tmp, nbytes & 3);
> +        }
> +}

Regards,

Boris
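A minimal userspace sketch of the word-at-a-time loop proposed above may help show why it tolerates both an unaligned destination and a length that is not a multiple of four. The `struct fake_fifo` type and `fake_fifo_read32()` are hypothetical stand-ins for the RX FIFO register and `__raw_readl()`; they are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the RX FIFO register: each read pops one word. */
struct fake_fifo {
    const uint32_t *words;
    size_t pos;
};

static uint32_t fake_fifo_read32(struct fake_fifo *fifo)
{
    return fifo->words[fifo->pos++];
}

/*
 * Drain nbytes from the FIFO into bytes. The destination may have any
 * alignment because every store goes through memcpy(), and the last
 * iteration copies only the bytes that are still wanted.
 */
static void rd_from_rx_fifo(struct fake_fifo *fifo, uint8_t *bytes, int nbytes)
{
    int i;

    assert(nbytes >= 0);

    for (i = 0; i < nbytes; i += 4) {
        uint32_t tmp = fake_fifo_read32(fifo);
        size_t chunk = nbytes - i > 4 ? 4 : (size_t)(nbytes - i);

        memcpy(bytes + i, &tmp, chunk);
    }
}
```

Reading 6 bytes pops two words from the FIFO and discards the upper half of the second one, which matches the behaviour of the separate `nbytes & 3` tail in the original patch.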
On 10/24/18 1:20 PM, Boris Brezillon wrote:
> Hi Arnd,
>
> On Mon, 22 Oct 2018 15:34:01 +0200
> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
>
>
>> +
>> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master *master,
>> +                                            u8 *bytes, int nbytes)
>> +{
>> +        readsl(master->regs + RX_FIFO, bytes, nbytes / 4);
>
> Vitor reported a problem with readsl(): this function expects the 2nd
> argument to be aligned on 32-bit, which is not guaranteed here. Unless
> you see a better solution, I'll switch back to a loop doing:
>
>         for (i = 0; i < nbytes; i += 4) {
>                 u32 tmp = __raw_readl(...);

Pls, do not use __raw io.

>                 memcpy(bytes + i, &tmp,
>                        nbytes - i > 4 ? 4 : nbytes - i);
>         }
>
>> +        if (nbytes & 3) {
>> +                u32 tmp;
>> +
>> +                readsl(master->regs + RX_FIFO, &tmp, 1);
>> +                memcpy(bytes + (nbytes & ~3), &tmp, nbytes & 3);
>> +        }
>> +}
>
> Regards,
>
> Boris
>
On Wed, 24 Oct 2018 15:25:17 -0500
Grygorii Strashko <grygorii.strashko@ti.com> wrote:

> On 10/24/18 1:20 PM, Boris Brezillon wrote:
> > Hi Arnd,
> >
> > On Mon, 22 Oct 2018 15:34:01 +0200
> > Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> >
> >
> >> +
> >> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master *master,
> >> +                                            u8 *bytes, int nbytes)
> >> +{
> >> +        readsl(master->regs + RX_FIFO, bytes, nbytes / 4);
> >
> > Vitor reported a problem with readsl(): this function expects the 2nd
> > argument to be aligned on 32-bit, which is not guaranteed here. Unless
> > you see a better solution, I'll switch back to a loop doing:
> >
> >         for (i = 0; i < nbytes; i += 4) {
> >                 u32 tmp = __raw_readl(...);
>
> Pls, do not use __raw io.

Except this is exactly what I want here, unless you have a
replacement for "readl() without a mem-barrier and without endianness
conversion"

>
> >                 memcpy(bytes + i, &tmp,
> >                        nbytes - i > 4 ? 4 : nbytes - i);
> >         }
> >
> >> +        if (nbytes & 3) {
> >> +                u32 tmp;
> >> +
> >> +                readsl(master->regs + RX_FIFO, &tmp, 1);
> >> +                memcpy(bytes + (nbytes & ~3), &tmp, nbytes & 3);
> >> +        }
> >> +}
> >
> > Regards,
> >
> > Boris
> >
>
On 10/24/18 4:04 PM, Boris Brezillon wrote:
> On Wed, 24 Oct 2018 15:25:17 -0500
> Grygorii Strashko <grygorii.strashko@ti.com> wrote:
>
>> On 10/24/18 1:20 PM, Boris Brezillon wrote:
>>> Hi Arnd,
>>>
>>> On Mon, 22 Oct 2018 15:34:01 +0200
>>> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
>>>
>>>
>>>> +
>>>> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master *master,
>>>> +                                            u8 *bytes, int nbytes)
>>>> +{
>>>> +        readsl(master->regs + RX_FIFO, bytes, nbytes / 4);
>>>
>>> Vitor reported a problem with readsl(): this function expects the 2nd
>>> argument to be aligned on 32-bit, which is not guaranteed here. Unless
>>> you see a better solution, I'll switch back to a loop doing:
>>>
>>>         for (i = 0; i < nbytes; i += 4) {
>>>                 u32 tmp = __raw_readl(...);
>>
>> Pls, do not use __raw io.
>
> Except this is exactly what I want here, unless you have a
> replacement for "readl() without a mem-barrier and without endianness
> conversion"
>

Not sure why endianness is the problem. readl_relaxed?

Sry, I've missed that this is part of the driver not i3c core,
so minor/ignore.
On Wed, 24 Oct 2018 17:43:00 -0500
Grygorii Strashko <grygorii.strashko@ti.com> wrote:

> On 10/24/18 4:04 PM, Boris Brezillon wrote:
> > On Wed, 24 Oct 2018 15:25:17 -0500
> > Grygorii Strashko <grygorii.strashko@ti.com> wrote:
> >
> >> On 10/24/18 1:20 PM, Boris Brezillon wrote:
> >>> Hi Arnd,
> >>>
> >>> On Mon, 22 Oct 2018 15:34:01 +0200
> >>> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> >>>
> >>>
> >>>> +
> >>>> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master *master,
> >>>> +                                            u8 *bytes, int nbytes)
> >>>> +{
> >>>> +        readsl(master->regs + RX_FIFO, bytes, nbytes / 4);
> >>>
> >>> Vitor reported a problem with readsl(): this function expects the 2nd
> >>> argument to be aligned on 32-bit, which is not guaranteed here. Unless
> >>> you see a better solution, I'll switch back to a loop doing:
> >>>
> >>>         for (i = 0; i < nbytes; i += 4) {
> >>>                 u32 tmp = __raw_readl(...);
> >>
> >> Pls, do not use __raw io.
> >
> > Except this is exactly what I want here, unless you have a
> > replacement for "readl() without a mem-barrier and without endianness
> > conversion"
> >
>
> Not sure why endianness is the problem. readl_relaxed?

Because we want to read a stream of bytes, and, if we have a CPU that
is operating in big-endian (ARM kernels can be configured in BE or LE),
byte ordering will be messed up (the controller is LE). If I use
readl_relaxed(), I'll then have to call cpu_to_le32(), and finally copy
the result to the buffer.

> Sry, I've missed that this is part of the driver not i3c core,
> so minor/ignore.
>
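The byte-order argument can be illustrated in plain userspace C, as a rough sketch rather than kernel code. Here `load_le32()` is a hypothetical model of what a readl()-style accessor returns for a little-endian device (a value in CPU order), and `store_le32()` is a hypothetical model of the cpu_to_le32() plus copy step that would then be needed to get the original byte stream back. Both are written with shifts so they behave the same on little-endian and big-endian hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of a readl()-style read from a little-endian device: the four
 * bytes on the bus are interpreted as an LE number, so the caller gets
 * the same numeric value on any host.
 */
static uint32_t load_le32(const uint8_t *wire)
{
    assert(wire != NULL);

    return (uint32_t)wire[0] |
           ((uint32_t)wire[1] << 8) |
           ((uint32_t)wire[2] << 16) |
           ((uint32_t)wire[3] << 24);
}

/*
 * Model of cpu_to_le32() followed by a copy: write the value back out
 * in little-endian byte order, which restores the stream order.
 */
static void store_le32(uint8_t *dst, uint32_t cpu_val)
{
    assert(dst != NULL);

    dst[0] = (uint8_t)(cpu_val & 0xff);
    dst[1] = (uint8_t)((cpu_val >> 8) & 0xff);
    dst[2] = (uint8_t)((cpu_val >> 16) & 0xff);
    dst[3] = (uint8_t)((cpu_val >> 24) & 0xff);
}
```

Copying the CPU-order value straight into the buffer with memcpy() would only preserve the stream order on a little-endian host, which is why an accessor without any conversion avoids the extra step for a byte-stream FIFO.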
On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote: > Hi Arnd, > > On Mon, 22 Oct 2018 15:34:01 +0200 > Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > >> + >> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master >> *master, >> + u8 *bytes, int nbytes) >> +{ >> + readsl(master->regs + RX_FIFO, bytes, nbytes / 4); > > Vitor reported a problem with readsl(): this function expects the 2nd > argument to be aligned on 32-bit, which is not guaranteed here. Unless > you see a better solution, I'll switch back to a loop doing: > > for (i = 0; i < nbytes; i += 4) { > u32 tmp = __raw_readl(...); > memcpy(bytes + i, &tmp, > nbytes - i > 4 ? 4 : nbytes - i); > } Could we maybe mandate that the buffer itself must be aligned here? What would be a reason why we see an unaligned target buffer? The open-coded loop should generally work (maybe a little slower), but it does seem error-prone to use __raw_readl() in general. Arnd
On Thu, 25 Oct 2018 17:30:26 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > Hi Arnd, > > > > On Mon, 22 Oct 2018 15:34:01 +0200 > > Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > > > >> + > >> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master > >> *master, > >> + u8 *bytes, int nbytes) > >> +{ > >> + readsl(master->regs + RX_FIFO, bytes, nbytes / 4); > > > > Vitor reported a problem with readsl(): this function expects the 2nd > > argument to be aligned on 32-bit, which is not guaranteed here. Unless > > you see a better solution, I'll switch back to a loop doing: > > > > for (i = 0; i < nbytes; i += 4) { > > u32 tmp = __raw_readl(...); > > memcpy(bytes + i, &tmp, > > nbytes - i > 4 ? 4 : nbytes - i); > > } > > Could we maybe mandate that the buffer itself must be aligned here? > What would be a reason why we see an unaligned target buffer? Well, the buffers we pass to i3c_send_ccc_cmd() are not necessarily aligned because they're not dynamically allocated (allocated on the stack) and are not naturally aligned on 32-bits (either because they are smaller than 32bits or because the struct is declared __packed). I guess I could dynamically allocate the payload, but that requires going over all users of i3c_send_ccc_cmd() to patch them.
On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > On Thu, 25 Oct 2018 17:30:26 +0200 > Arnd Bergmann <arnd@arndb.de> wrote: > > > On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > Hi Arnd, > > > > > > On Mon, 22 Oct 2018 15:34:01 +0200 > > > Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > > > > > > >> + > > >> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master > > >> *master, > > >> + u8 *bytes, int nbytes) > > >> +{ > > >> + readsl(master->regs + RX_FIFO, bytes, nbytes / 4); > > > > > > Vitor reported a problem with readsl(): this function expects the 2nd > > > argument to be aligned on 32-bit, which is not guaranteed here. Unless > > > you see a better solution, I'll switch back to a loop doing: > > > > > > for (i = 0; i < nbytes; i += 4) { > > > u32 tmp = __raw_readl(...); > > > memcpy(bytes + i, &tmp, > > > nbytes - i > 4 ? 4 : nbytes - i); > > > } > > > > Could we maybe mandate that the buffer itself must be aligned here? > > What would be a reason why we see an unaligned target buffer? > > Well, the buffers we pass to i3c_send_ccc_cmd() are not necessarily > aligned because they're not dynamically allocated (allocated on the > stack) and are not naturally aligned on 32-bits (either because they > are smaller than 32bits or because the struct is declared __packed). > > I guess I could dynamically allocate the payload, but that requires > going over all users of i3c_send_ccc_cmd() to patch them. This reminds me that Wolfram mentioned in his ELC talk that the buffers on i3c should all be DMA capable to make life easier for i3c master drivers that want to implement DMA transfers. If we have buffers here that are not aligned to cache lines (or even just 32 bit words), doesn't that also mean that the same buffers are not DMA capable either? Arnd
On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon > <boris.brezillon@bootlin.com> wrote: > > > > On Thu, 25 Oct 2018 17:30:26 +0200 > > Arnd Bergmann <arnd@arndb.de> wrote: > > > > > On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > > Hi Arnd, > > > > > > > > On Mon, 22 Oct 2018 15:34:01 +0200 > > > > Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > > > > > > > > > >> + > > > >> +static void cdns_i3c_master_rd_from_rx_fifo(struct cdns_i3c_master > > > >> *master, > > > >> + u8 *bytes, int nbytes) > > > >> +{ > > > >> + readsl(master->regs + RX_FIFO, bytes, nbytes / 4); > > > > > > > > Vitor reported a problem with readsl(): this function expects the 2nd > > > > argument to be aligned on 32-bit, which is not guaranteed here. Unless > > > > you see a better solution, I'll switch back to a loop doing: > > > > > > > > for (i = 0; i < nbytes; i += 4) { > > > > u32 tmp = __raw_readl(...); > > > > memcpy(bytes + i, &tmp, > > > > nbytes - i > 4 ? 4 : nbytes - i); > > > > } > > > > > > Could we maybe mandate that the buffer itself must be aligned here? > > > What would be a reason why we see an unaligned target buffer? > > > > Well, the buffers we pass to i3c_send_ccc_cmd() are not necessarily > > aligned because they're not dynamically allocated (allocated on the > > stack) and are not naturally aligned on 32-bits (either because they > > are smaller than 32bits or because the struct is declared __packed). > > > > I guess I could dynamically allocate the payload, but that requires > > going over all users of i3c_send_ccc_cmd() to patch them. > > This reminds me that Wolfram mentioned in his ELC talk that the > buffers on i3c should all be DMA capable to make life easier for > i3c master drivers that want to implement DMA transfers. 
And this is the case for all buffers passed to i3c_device_do_priv_xfers() (and soon i3c_device_send_hdr_cmd()), but I did not enforce that for the internal i3c_master_send_ccc_cmd_locked() helper, maybe I should... It was just convenient to place the object to be transmitted/received on the stack. > > If we have buffers here that are not aligned to cache lines > (or even just 32 bit words), doesn't that also mean that the > same buffers are not DMA capable either? Yep, if it's not cache-line-aligned (and on the stack), it's not DMA-able.
On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote: > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > On Thu, 25 Oct 2018 17:30:26 +0200 > > > Arnd Bergmann <arnd@arndb.de> wrote: > > > > On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > > > On Mon, 22 Oct 2018 15:34:01 +0200 > > > I guess I could dynamically allocate the payload, but that requires > > > going over all users of i3c_send_ccc_cmd() to patch them. > > > > This reminds me that Wolfram mentioned in his ELC talk that the > > buffers on i3c should all be DMA capable to make life easier for > > i3c master drivers that want to implement DMA transfers. > > And this is the case for all buffers passed to > i3c_device_do_priv_xfers() (and soon i3c_device_send_hdr_cmd()), > but I did not enforce that for the internal > i3c_master_send_ccc_cmd_locked() helper, maybe I should... > It was just convenient to place the object to be transmitted/received on > the stack. Ok. Is i3c_master_send_ccc_cmd_locked() what implements the public interfaces then, or is this something else? If you place a buffer on the stack, it is not DMA capable, but it is guaranteed to be at least 32-bit word aligned, and should not cause an exception in readsl(), unless it starts with a couple of (not multiple of four) extra bytes that are not sent to the devices. Is that what happens here? > > If we have buffers here that are not aligned to cache lines > > (or even just 32 bit words), doesn't that also mean that the > > same buffers are not DMA capable either? > > Yep, if it's not cache-line-aligned (and on the stack), it's not > DMA-able. This sounds like a more fundamental problem to solve first then. 
Obviously it is incredibly /useful/ to be able to put short i2c or i3c messages on the stack, but allowing that in general also prevents the use of DMA without bounce buffers. One way to address this might be to always bounce any messages that are less than a cache line through a (pre-)kmallocated buffer, and require any longer messages to be cache capable. This could also solve the issue with readsl(), but it would be a rather confusing user interface. Another option might be to have separate interfaces for "short" and "long" messages at the API level and have distinct rules for those: short would always be bounced by the i3c code, and long puts restrictions on the buffer location. Arnd
Hi Arnd, On Fri, 26 Oct 2018 09:43:25 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon > <boris.brezillon@bootlin.com> wrote: > > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > > On Thu, 25 Oct 2018 17:30:26 +0200 > > > > Arnd Bergmann <arnd@arndb.de> wrote: > > > > > On 10/24/18, Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > > > > On Mon, 22 Oct 2018 15:34:01 +0200 > > > > I guess I could dynamically allocate the payload, but that requires > > > > going over all users of i3c_send_ccc_cmd() to patch them. > > > > > > This reminds me that Wolfram mentioned in his ELC talk that the > > > buffers on i3c should all be DMA capable to make life easier for > > > i3c master drivers that want to implement DMA transfers. > > > > And this is the case for all buffers passed to > > i3c_device_do_priv_xfers() (and soon i3c_device_send_hdr_cmd()), > > but I did not enforce that for the internal > > i3c_master_send_ccc_cmd_locked() helper, maybe I should... > > It was just convenient to place the object to be transmitted/received on > > the stack. > > Ok. Is i3c_master_send_ccc_cmd_locked() what implements the public > interfaces then, or is this something else? i3c_master_send_ccc_cmd_locked() calls master->ops->send_ccc_cmd(), so it's part of the master controller interface. > > If you place a buffer on the stack, it is not DMA capable, but > it is guaranteed to be at least 32-bit word aligned, and should > not cause an exception in readsl(), unless it starts with a couple of > (not multiple of four) extra bytes that are not sent to the devices. > Is that what happens here? Here is the report I received from Vitor: " Hi Boris, I'm trying this new patch-set version but I get some issues when use readsl() function. Basically the system complain about memory alignment. 
As exemple when I try to read the PID from the device

> +static int i3c_master_getpid_locked(struct i3c_master_controller *master,
> +                                    struct i3c_device_info *info)
> +{
> +        struct i3c_ccc_getpid getpid;

at this point the getpid struct it is already unaligned with

i3c_master_getpid_locked:1129 getpid_add=0x9a249c7a

> +        struct i3c_ccc_cmd_dest dest = {
> +                .addr = info->dyn_addr,
> +                .payload.len = sizeof(struct i3c_ccc_getpid),
> +                .payload.data = &getpid,
> +        };
> +        struct i3c_ccc_cmd cmd = {
> +                .rnw = true,
> +                .id = I3C_CCC_GETPID,
> +                .dests = &dest,
> +                .ndests = 1,
> +        };
> +        int ret, i;
> +
> +        ret = i3c_master_send_ccc_cmd_locked(master, &cmd);
> +        if (ret)
> +                return ret;
> +
> +        info->pid = 0;
> +        for (i = 0; i < sizeof(getpid.pid); i++) {
> +                int sft = (sizeof(getpid.pid) - i - 1) * 8;
> +
> +                info->pid |= (u64)getpid.pid[i] << sft;
> +        }
> +
> +        return 0;
> +}
> +

and them when

static void dw_i3c_master_read_rx_fifo(struct dw_i3c_master *master,
                                       u8 *bytes, int nbytes)
{
        readsl(master->regs + RX_TX_DATA_PORT, bytes, nbytes / 4);
        ...
}

the system crash.

Misaligned Access
Path: (null)
CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1 #88

[ECR ]: 0x00230400 => Misaligned r/w from 0x9a249c7a
[EFA ]: 0x9a249c7a
[BLINK ]: dw_i3c_master_irq_handler+0x200/0x2fc [dw_i3c_master]
[ERET ]: dw_i3c_master_irq_handler+0x224/0x2fc [dw_i3c_master]
[STAT32]: 0x00000a4c : K DE A1 E2
BTA: 0x70038e44 SP: 0x8071fe58 FP: 0x00000000
LPS: 0x8060e63e LPE: 0x8060e642 LPC: 0x00000000
r00: 0x00000033 r01: 0x00000004 r02: 0x00000000
r03: 0xd0002014 r04: 0x00000006 r05: 0x00000000
r06: 0x9a249c7a r07: 0x39307260 r08: 0xe10b6900
r09: 0x00000013 r10: 0x00000000 r11: 0x000000c9
r12: 0x0a613763

Do you have any idea about this?

Best regards,
Vitor Soares
"

>
> > > If we have buffers here that are not aligned to cache lines
> > > (or even just 32 bit words), doesn't that also mean that the
> > > same buffers are not DMA capable either?
> > > > Yep, if it's not cache-line-aligned (and on the stack), it's not > > DMA-able. > > This sounds like a more fundamental problem to solve first > then. Obviously it is incredibly /useful/ to be able to put short > i2c or i3c messages on the stack, but allowing that in general > also prevents the use of DMA without bounce buffers. Actually, we have the same problem in MTD (UBI passes vmalloced buffers to the MTD stack), so I understand this concern very well, and I agree that enforcing all buffers passed to the controller to be DMA capable is the right thing to do. I guess I just didn't think about internal APIs when I made this modification which explains why CCC cmds were left behind. > > One way to address this might be to always bounce any > messages that are less than a cache line through a > (pre-)kmallocated buffer, and require any longer messages > to be cache capable. This could also solve the issue with > readsl(), but it would be a rather confusing user interface. > > Another option might be to have separate interfaces for > "short" and "long" messages at the API level and have > distinct rules for those: short would always be bounced > by the i3c code, and long puts restrictions on the buffer > location. Hm, let's keep the API simple. I'll just mandate that all payload bufs passed to i3c_master_send_ccc_cmd_locked() be dynamically allocated. Thanks for your feedback. Boris
On Fri, Oct 26, 2018 at 9:57 AM Boris Brezillon <boris.brezillon@bootlin.com> wrote: > On Fri, 26 Oct 2018 09:43:25 +0200 > Arnd Bergmann <arnd@arndb.de> wrote: > > > On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon > > <boris.brezillon@bootlin.com> wrote: > > > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > > > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > > > On Thu, 25 Oct 2018 17:30:26 +0200 > > Ok. Is i3c_master_send_ccc_cmd_locked() what implements the public > > interfaces then, or is this something else? > > i3c_master_send_ccc_cmd_locked() calls master->ops->send_ccc_cmd(), so > it's part of the master controller interface. > > > > > If you place a buffer on the stack, it is not DMA capable, but > > it is guaranteed to be at least 32-bit word aligned, and should > > not cause an exception in readsl(), unless it starts with a couple of > > (not multiple of four) extra bytes that are not sent to the devices. > > Is that what happens here? > > Here is the report I received from Vitor: > > " > Hi Boris, > > > I'm trying this new patch-set version but I get some issues when use > readsl() function. > > Basically the system complain about memory alignment. > > > +static int i3c_master_getpid_locked(struct i3c_master_controller *master, > > + struct i3c_device_info *info) > > +{ > > + struct i3c_ccc_getpid getpid; > > at this point the getpid struct it is already unaligned with > > i3c_master_getpid_locked:1129 getpid_add=0x9a249c7a > > > + struct i3c_ccc_cmd_dest dest = { > > + .addr = info->dyn_addr, > > + .payload.len = sizeof(struct i3c_ccc_getpid), > > + .payload.data = &getpid, > > + }; > > +} > > + > > and them when > > static void dw_i3c_master_read_rx_fifo(struct dw_i3c_master *master, > u8 *bytes, int nbytes) > { > readsl(master->regs + RX_TX_DATA_PORT, bytes, nbytes / 4); > ... > } Ok, I spent an hour chasing the ARM implementation and finding no way this could go wrong here. 
I see that 'struct i3c_ccc_getpid' may be misaligned on the stack
(it normally won't be), and that the ARM readsl() has a lot of extra
code to handle unaligned output.

However, the dump that Vitor reports

> [ECR ]: 0x00230400 => Misaligned r/w from 0x9a249c7a
> [EFA ]: 0x9a249c7a
> [BLINK ]: dw_i3c_master_irq_handler+0x200/0x2fc [dw_i3c_master]

Is from an arch/arc kernel that uses asm-generic/io.h, and
that stores the output using a u32 pointer:

static inline void readsl(const volatile void __iomem *addr, void *buffer,
                          unsigned int count)
{
        if (count) {
                u32 *buf = buffer;

                do {
                        u32 x = __raw_readl(addr);
                        *buf++ = x;
                } while (--count);
        }
}

This is apparently not allowed on ARC when 'buffer' is
unaligned. I think what we need here is to use
put_unaligned() instead of the pointer dereference.
For architectures that can do unaligned accesses,
the result is the same, but for ARC it will fix the problem.

> > One way to address this might be to always bounce any
> > messages that are less than a cache line through a
> > (pre-)kmallocated buffer, and require any longer messages
> > to be cache capable. This could also solve the issue with
> > readsl(), but it would be a rather confusing user interface.
> >
> > Another option might be to have separate interfaces for
> > "short" and "long" messages at the API level and have
> > distinct rules for those: short would always be bounced
> > by the i3c code, and long puts restrictions on the buffer
> > location.
>
> Hm, let's keep the API simple. I'll just mandate that all payload bufs
> passed to i3c_master_send_ccc_cmd_locked() be dynamically allocated.

Ok. What about i2c commands sent to the same i3c controller then?
Do we need to copy those to satisfy the requirements of the i3c layer?

        Arnd
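The put_unaligned() idea can be sketched in userspace as follows. This is an illustration under stated assumptions, not the asm-generic patch itself: `put_unaligned_u32()` is a hypothetical analogue of the kernel helper built on memcpy(), and `fake_raw_readl()` is a hypothetical stand-in for `__raw_readl()` that pops words from an array instead of touching a device register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace analogue of put_unaligned(): memcpy() assumes no alignment. */
static void put_unaligned_u32(uint32_t val, void *dst)
{
    memcpy(dst, &val, sizeof(val));
}

/* Hypothetical stand-in for __raw_readl() on a FIFO: pops the next word. */
static uint32_t fake_raw_readl(const uint32_t **fifo)
{
    return *(*fifo)++;
}

/*
 * Same shape as the generic readsl() loop, but the store no longer goes
 * through a u32 pointer, so 'buffer' may start at any byte address.
 */
static void readsl_unaligned(const uint32_t **fifo, void *buffer,
                             unsigned int count)
{
    uint8_t *buf = buffer;

    assert(buffer != NULL || count == 0);

    while (count--) {
        put_unaligned_u32(fake_raw_readl(fifo), buf);
        buf += sizeof(uint32_t);
    }
}
```

On hosts that allow unaligned stores, compilers typically turn the four-byte memcpy() into a single store, which is in line with the remark that the result is the same on such architectures.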
On Fri, 26 Oct 2018 12:01:52 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > On Fri, Oct 26, 2018 at 9:57 AM Boris Brezillon > <boris.brezillon@bootlin.com> wrote: > > On Fri, 26 Oct 2018 09:43:25 +0200 > > Arnd Bergmann <arnd@arndb.de> wrote: > > > > > On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon > > > <boris.brezillon@bootlin.com> wrote: > > > > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > > > > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > > > > On Thu, 25 Oct 2018 17:30:26 +0200 > > > Ok. Is i3c_master_send_ccc_cmd_locked() what implements the public > > > interfaces then, or is this something else? > > > > i3c_master_send_ccc_cmd_locked() calls master->ops->send_ccc_cmd(), so > > it's part of the master controller interface. > > > > > > > > If you place a buffer on the stack, it is not DMA capable, but > > > it is guaranteed to be at least 32-bit word aligned, and should > > > not cause an exception in readsl(), unless it starts with a couple of > > > (not multiple of four) extra bytes that are not sent to the devices. > > > Is that what happens here? > > > > Here is the report I received from Vitor: > > > > " > > Hi Boris, > > > > > > I'm trying this new patch-set version but I get some issues when use > > readsl() function. > > > > Basically the system complain about memory alignment. 
> > > > > > +static int i3c_master_getpid_locked(struct i3c_master_controller *master, > > > + struct i3c_device_info *info) > > > +{ > > > + struct i3c_ccc_getpid getpid; > > > > at this point the getpid struct it is already unaligned with > > > > i3c_master_getpid_locked:1129 getpid_add=0x9a249c7a > > > > > + struct i3c_ccc_cmd_dest dest = { > > > + .addr = info->dyn_addr, > > > + .payload.len = sizeof(struct i3c_ccc_getpid), > > > + .payload.data = &getpid, > > > + }; > > > > +} > > > + > > > > and them when > > > > static void dw_i3c_master_read_rx_fifo(struct dw_i3c_master *master, > > u8 *bytes, int nbytes) > > { > > readsl(master->regs + RX_TX_DATA_PORT, bytes, nbytes / 4); > > ... > > } > > Ok, I spent an hour chasing the ARM implementation and finding > no way this could go wrong here. I see that 'struct i3c_ccc_getpid' > may be misaligned on the stack (it normally won't be), and that > the ARM readsl() has a lot of extra code to handle unaligned > output. I didn't have this problem on xtensa either. > However, the dump that Vitor reports > > > [ECR ]: 0x00230400 => Misaligned r/w from 0x9a249c7a > > [EFA ]: 0x9a249c7a > > [BLINK ]: dw_i3c_master_irq_handler+0x200/0x2fc [dw_i3c_master] > > Is from an arch/arc kernel that uses asm-generic/io.h, and > that stores the output using a u32 pointer: > > static inline void readsl(const volatile void __iomem *addr, void *buffer, > unsigned int count) > { > if (count) { > u32 *buf = buffer; > > do { > u32 x = __raw_readl(addr); > *buf++ = x; > } while (--count); > } > } > > This is apparently not allowed on ARC when 'buffer' is > unaligned. I think what we need here is to use > put_unaligned() instead of the pointer dereference. > For architectures that can do unaligned accesses, > the result is the same, but for ARC it will fix the problem. Okay, so writesl()/readsl() should deal with unaligned pointers, and default implementations should be fixed. I guess you'll send a patch to use put/get_unaligned(). 
>
> > > One way to address this might be to always bounce any
> > > messages that are less than a cache line through a
> > > (pre-)kmallocated buffer, and require any longer messages
> > > to be cache capable. This could also solve the issue with
> > > readsl(), but it would be a rather confusing user interface.
> > >
> > > Another option might be to have separate interfaces for
> > > "short" and "long" messages at the API level and have
> > > distinct rules for those: short would always be bounced
> > > by the i3c code, and long puts restrictions on the buffer
> > > location.
> >
> > Hm, let's keep the API simple. I'll just mandate that all payload bufs
> > passed to i3c_master_send_ccc_cmd_locked() be dynamically allocated.
>
> Ok. What about i2c commands sent to the same i3c controller
> then?

Still not taken care of.

> Do we need to copy those to satisfy the requirements
> of the i3c layer?

I guess we should. The question is, should we do that unconditionally
or should we try to optimize things with something like:

        if (!virt_addr_valid(xfer->buf) ||
            object_is_on_stack(xfer->buf))
                /* Alloc bounce buf. */
        else
                /* Use provided buf. */
On Fri, Oct 26, 2018 at 2:46 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote: > On Fri, 26 Oct 2018 12:01:52 +0200 > Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, Oct 26, 2018 at 9:57 AM Boris Brezillon > > <boris.brezillon@bootlin.com> wrote: > > > On Fri, 26 Oct 2018 09:43:25 +0200 > > > Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > > On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon > > > > <boris.brezillon@bootlin.com> wrote: > > > > > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > > > > > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote: > > > > > > > On Thu, 25 Oct 2018 17:30:26 +0200 > > > > This is apparently not allowed on ARC when 'buffer' is > > unaligned. I think what we need here is to use > > put_unaligned() instead of the pointer dereference. > > For architectures that can do unaligned accesses, > > the result is the same, but for ARC it will fix the problem. > > Okay, so writesl()/readsl() should deal with unaligned pointers, and > default implementations should be fixed. I guess you'll send a patch to > use put/get_unaligned(). That's one way of doing it, though thinking about it some more, this can also introduce overhead on machines that don't support unaligned buffers and only work on drivers that are guaranteed to see fully aligned data. We could also override these specifically for ARC, and risk running into the same problem elsewhere, rather than be sure to fix everyone while risking to introduce noticeable performance regressions in existing drivers. > > > > One way to address this might be to always bounce any > > > > messages that are less than a cache line through a > > > > (pre-)kmallocated buffer, and require any longer messages > > > > to be cache capable. This could also solve the issue with > > > > readsl(), but it would be a rather confusing user interface. 
> > > > > > > > Another option might be to have separate interfaces for > > > > "short" and "long" messages at the API level and have > > > > distinct rules for those: short would always be bounced > > > > by the i3c code, and long puts restrictions on the buffer > > > > location. > > > > > > Hm, let's keep the API simple. I'll just mandate that all payload bufs > > > passed to i3c_master_send_ccc_cmd_locked() be dynamically allocated. > > > > Ok. What about i2c commands sent to the same i3c controller > > then? > > Still not taken care of. > > > Do we need to copy those to satisfy the requirements > > of the i3c layer? > > I guess we should. The question is, should we do that unconditionally > or should we try to optimize thins with something like: > > if (!virt_addr_valid(xfer->buf) || > object_is_on_stack(xfer->buf)) > /* Alloc bounce buf. */ > else > /* Use provided buf. */ There may be too many cases that we need to handle here that are not DMA capable. To be on the safe side, I'd probably always copy all data that is not a multiple of fully aligned cache lines, as well as pointers that fails to meet some other requirements (stack, vmalloc, kmap, ...) Arnd
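The alignment part of that rule can be sketched as a small predicate. This is only an illustration: `needs_bounce()` and `SKETCH_CACHE_LINE` are hypothetical names, the 64-byte line size is an assumption (real code would use the platform's DMA alignment), and the stack, vmalloc and kmap checks mentioned above are deliberately not modelled because they have no userspace equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size, for illustration only. */
#define SKETCH_CACHE_LINE 64u

/*
 * Return true when the buffer should be copied through a bounce buffer:
 * either its start address or its length is not a whole number of
 * cache lines. Other disqualifying cases (stack, vmalloc, kmap) are
 * not covered by this sketch.
 */
static bool needs_bounce(const void *buf, size_t len)
{
    uintptr_t addr = (uintptr_t)buf;

    assert(buf != NULL);

    return (addr % SKETCH_CACHE_LINE) != 0 ||
           (len % SKETCH_CACHE_LINE) != 0;
}
```

With this rule, a short CCC payload such as a few-byte PID read would always be bounced, while a buffer that covers whole aligned cache lines could be handed to the controller directly.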