| Message ID | 1469761153-85576-1-git-send-email-apronin@chromium.org |
|---|---|
| State | New |

On Thu, Jul 28, 2016 at 07:59:13PM -0700, Andrey Pronin wrote:
> Annotate buffers used in spi transactions as ____cacheline_aligned
> to use in DMA transfers.
>
> Signed-off-by: Andrey Pronin <apronin@chromium.org>
>  drivers/char/tpm/st33zp24/spi.c | 4 ++--
>  drivers/char/tpm/tpm_tis_spi.c  | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/char/tpm/st33zp24/spi.c b/drivers/char/tpm/st33zp24/spi.c
> index 9f5a011..0e9aad9 100644
> +++ b/drivers/char/tpm/st33zp24/spi.c
> @@ -70,8 +70,8 @@
>  struct st33zp24_spi_phy {
>  	struct spi_device *spi_device;
>
> -	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE];
> -	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE];
> +	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
> +	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
>
>  	int io_lpcpd;
>  	int latency;

Hurm, this still looks wrong to me. Aligning the start of the buffers is
not enough; the DMA'able space must also end on a cache line.

So, the buffers must also always be placed at the end of the struct.

IMHO it would be cleaner and safer to always kmalloc the DMA buffer
alone than to try and optimize like this.

Jason

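To make the objection concrete, here is a minimal sketch of the layout hazard being described, assuming a 64-byte cache line and a hypothetical 70-byte buffer size; the struct and field names are illustrative, not the actual driver code:

#include <linux/cache.h>
#include <linux/spi/spi.h>
#include <linux/types.h>

/*
 * Illustrative only.  ____cacheline_aligned guarantees that each buffer
 * *starts* on a cache-line boundary, but with a 70-byte buffer and 64-byte
 * cache lines, rx_buf ends in the middle of a line, and io_lpcpd/latency
 * are placed in that same last line.  A cache invalidate done around DMA
 * into rx_buf can then discard CPU writes to those fields, and a writeback
 * of that dirty line can overwrite data the device has just DMA'd.
 */
struct example_spi_phy {
	struct spi_device *spi_device;

	u8 tx_buf[70] ____cacheline_aligned;
	u8 rx_buf[70] ____cacheline_aligned;

	int io_lpcpd;	/* shares rx_buf's last cache line */
	int latency;
};
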
On Fri, Jul 29, 2016 at 10:27 AM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Thu, Jul 28, 2016 at 07:59:13PM -0700, Andrey Pronin wrote:
> > Annotate buffers used in spi transactions as ____cacheline_aligned
> > to use in DMA transfers.
> >
> > Signed-off-by: Andrey Pronin <apronin@chromium.org>
> > [...]
>
> Hurm, this still looks wrong to me. Aligning the start of the buffers is
> not enough; the DMA'able space must also end on a cache line.
>
> So, the buffers must also always be placed at the end of the struct.
>
> IMHO it would be cleaner and safer to always kmalloc the DMA buffer
> alone than to try and optimize like this.

In this case moving them to the end of the structure and commenting on why
they have to be at the end might be a less invasive change. It would also be
more performance-efficient and more resilient in low-memory situations.

Thanks,
Dmitry

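A minimal sketch of the layout being suggested here, with the buffers moved to the end of the struct and a comment explaining why; the buffer size value is hypothetical, and this is only DMA-safe when the struct itself comes from kmalloc rather than the stack or an embedding structure:

#include <linux/cache.h>
#include <linux/spi/spi.h>
#include <linux/types.h>

#define ST33ZP24_SPI_BUFFER_SIZE 259	/* hypothetical value for this sketch */

struct st33zp24_spi_phy {
	struct spi_device *spi_device;
	int io_lpcpd;
	int latency;

	/*
	 * Keep these last and cache-line aligned: they are handed to the SPI
	 * core, which may DMA-map them, so no other field may share a cache
	 * line with them.  Do not add members after rx_buf.
	 */
	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
};
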
On Fri, Jul 29, 2016 at 10:30:22AM -0700, Dmitry Torokhov wrote:
> On Fri, Jul 29, 2016 at 10:27 AM, Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com> wrote:
> > On Thu, Jul 28, 2016 at 07:59:13PM -0700, Andrey Pronin wrote:
> > > Annotate buffers used in spi transactions as ____cacheline_aligned
> > > to use in DMA transfers.
> > > [...]
> >
> > Hurm, this still looks wrong to me. Aligning the start of the buffers is
> > not enough; the DMA'able space must also end on a cache line.
> >
> > So, the buffers must also always be placed at the end of the struct.
> >
> > IMHO it would be cleaner and safer to always kmalloc the DMA buffer
> > alone than to try and optimize like this.
>
> In this case moving them to the end of the structure and commenting on why
> they have to be at the end might be a less invasive change. It would also be
> more performance-efficient and more resilient in low-memory situations.

The kmallocs would be done in driver initialization:

* you are rarely in a low-memory situation
* the performance gain/loss is insignificant

I really don't see your point.

/Jarkko

On Tue, Aug 09, 2016 at 12:46:10PM +0300, Jarkko Sakkinen wrote:
> On Fri, Jul 29, 2016 at 10:30:22AM -0700, Dmitry Torokhov wrote:
> > [...]
> >
> > In this case moving them to the end of the structure and commenting on why
> > they have to be at the end might be a less invasive change. It would also be
> > more performance-efficient and more resilient in low-memory situations.
>
> The kmallocs would be done in driver initialization:
>
> * you are rarely in a low-memory situation
> * the performance gain/loss is insignificant
>
> I really don't see your point.

I'm fine with having them at the end of the structure, mainly for
simplicity, but those arguments just didn't hold at all.

/Jarkko

On Tue, Aug 9, 2016 at 8:01 AM, Jarkko Sakkinen
<jarkko.sakkinen@linux.intel.com> wrote:
> On Tue, Aug 09, 2016 at 12:46:10PM +0300, Jarkko Sakkinen wrote:
> > On Fri, Jul 29, 2016 at 10:30:22AM -0700, Dmitry Torokhov wrote:
> > > [...]
> >
> > The kmallocs would be done in driver initialization:
> >
> > * you are rarely in a low-memory situation
> > * the performance gain/loss is insignificant
> >
> > I really don't see your point.
>
> I'm fine with having them at the end of the structure, mainly for
> simplicity, but those arguments just didn't hold at all.

Well, the main reason was the simplicity and low invasiveness of the change.
But I still maintain that doing 3 memory allocations instead of 1 is less
performant and puts more pressure on the kernel. Yes, it is at bind time,
but you do not have to do 3 times the work when one allocation will
suffice. Also, driver binding does not necessarily happen at boot time:
I can always unbind and rebind the driver or reload the module.

Thanks,
Dmitry

On Tue, Aug 09, 2016 at 08:18:00AM -0700, Dmitry Torokhov wrote:
> Well, the main reason was the simplicity and low invasiveness of the change.

Well, it isn't simple, because the proposed patches have had subtle
problems with DMA. Simple is to use a guaranteed DMA-able allocation for
the DMA memory and stop trying to over-optimize.

Jason

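For comparison, a sketch of the separate-allocation approach being argued for, done once at probe/bind time; the names, size, and error handling are illustrative and not taken from either driver:

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/spi/spi.h>
#include <linux/types.h>

#define EXAMPLE_SPI_BUFSIZE 259	/* hypothetical transfer size */

struct example_spi_phy {
	struct spi_device *spi_device;
	int io_lpcpd;
	int latency;

	/* allocated separately so they are safe to DMA-map */
	u8 *tx_buf;
	u8 *rx_buf;
};

static int example_phy_alloc_buffers(struct example_spi_phy *phy)
{
	/*
	 * kmalloc() returns memory aligned to ARCH_KMALLOC_MINALIGN, which on
	 * architectures with non-coherent DMA is at least a full cache line,
	 * so these buffers do not share cache lines with unrelated data.
	 */
	phy->tx_buf = kmalloc(EXAMPLE_SPI_BUFSIZE, GFP_KERNEL);
	phy->rx_buf = kmalloc(EXAMPLE_SPI_BUFSIZE, GFP_KERNEL);
	if (!phy->tx_buf || !phy->rx_buf) {
		kfree(phy->tx_buf);	/* kfree(NULL) is a no-op */
		kfree(phy->rx_buf);
		return -ENOMEM;
	}
	return 0;
}
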
On Tue, Aug 09, 2016 at 08:18:00AM -0700, Dmitry Torokhov wrote:
> On Tue, Aug 9, 2016 at 8:01 AM, Jarkko Sakkinen
> <jarkko.sakkinen@linux.intel.com> wrote:
> > On Tue, Aug 09, 2016 at 12:46:10PM +0300, Jarkko Sakkinen wrote:
> > > [...]
> >
> > I'm fine with having them at the end of the structure, mainly for
> > simplicity, but those arguments just didn't hold at all.
>
> Well, the main reason was the simplicity and low invasiveness of the change.
> But I still maintain that doing 3 memory allocations instead of 1 is less
> performant and puts more pressure on the kernel. Yes, it is at bind time,
> but you do not have to do 3 times the work when one allocation will
> suffice. Also, driver binding does not necessarily happen at boot time:
> I can always unbind and rebind the driver or reload the module.

I'm fine with either approach.

/Jarkko

diff --git a/drivers/char/tpm/st33zp24/spi.c b/drivers/char/tpm/st33zp24/spi.c
index 9f5a011..0e9aad9 100644
--- a/drivers/char/tpm/st33zp24/spi.c
+++ b/drivers/char/tpm/st33zp24/spi.c
@@ -70,8 +70,8 @@
 struct st33zp24_spi_phy {
 	struct spi_device *spi_device;
 
-	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE];
-	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE];
+	u8 tx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
+	u8 rx_buf[ST33ZP24_SPI_BUFFER_SIZE] ____cacheline_aligned;
 
 	int io_lpcpd;
 	int latency;
diff --git a/drivers/char/tpm/tpm_tis_spi.c b/drivers/char/tpm/tpm_tis_spi.c
index dbaad9c..58d7758 100644
--- a/drivers/char/tpm/tpm_tis_spi.c
+++ b/drivers/char/tpm/tpm_tis_spi.c
@@ -48,8 +48,8 @@ struct tpm_tis_spi_phy {
 	struct tpm_tis_data priv;
 	struct spi_device *spi_device;
 
-	u8 tx_buf[MAX_SPI_FRAMESIZE + 4];
-	u8 rx_buf[MAX_SPI_FRAMESIZE + 4];
+	u8 tx_buf[MAX_SPI_FRAMESIZE + 4] ____cacheline_aligned;
+	u8 rx_buf[MAX_SPI_FRAMESIZE + 4] ____cacheline_aligned;
 };
 
 static inline struct tpm_tis_spi_phy *to_tpm_tis_spi_phy(struct tpm_tis_data *data)

Annotate buffers used in spi transactions as ____cacheline_aligned
to use in DMA transfers.

Signed-off-by: Andrey Pronin <apronin@chromium.org>
---
 drivers/char/tpm/st33zp24/spi.c | 4 ++--
 drivers/char/tpm/tpm_tis_spi.c  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)