[v3,08/10] ARM: mxs: add ocotp read function

Message ID	1294236457-17476-9-git-send-email-shawn.guo@freescale.com
State	Changes Requested, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> From: Shawn Guo <shawn.guo@freescale.com> To: <davem@davemloft.net>, <gerg@snapgear.com>, <baruch@tkos.co.il>, <eric@eukrea.com>, <bryan.wu@canonical.com>, <r64343@freescale.com>, <B32542@freescale.com>, <u.kleine-koenig@pengutronix.de>, <lw@karo-electronics.de>, <w.sang@pengutronix.de>, <s.hauer@pengutronix.de>, <netdev@vger.kernel.org>, <linux-arm-kernel@lists.infradead.org> CC: Shawn Guo <shawn.guo@freescale.com> Subject: [PATCH v3 08/10] ARM: mxs: add ocotp read function Date: Wed, 5 Jan 2011 22:07:35 +0800 Message-ID: <1294236457-17476-9-git-send-email-shawn.guo@freescale.com> In-Reply-To: <1294236457-17476-1-git-send-email-shawn.guo@freescale.com> References: <1294236457-17476-1-git-send-email-shawn.guo@freescale.com> MIME-Version: 1.0 Content-Type: text/plain Sender: netdev-owner@vger.kernel.org Precedence: bulk

Shawn Guo Jan. 5, 2011, 2:07 p.m. UTC

Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
---
Changes for v2:
 - Add mutex locking for mxs_read_ocotp()
 - Use type size_t for count and i
 - Add comment for clk_enable/disable skipping
 - Add ERROR bit clearing and polling step

 arch/arm/mach-mxs/Makefile              |    2 +-
 arch/arm/mach-mxs/include/mach/common.h |    1 +
 arch/arm/mach-mxs/ocotp.c               |   79 +++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 1 deletions(-)
 create mode 100644 arch/arm/mach-mxs/ocotp.c

Jamie Iles Jan. 5, 2011, 4:16 p.m. UTC | #1

Hi Shawn,

On Wed, Jan 05, 2011 at 10:07:35PM +0800, Shawn Guo wrote:
> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
> ---
> Changes for v2:
>  - Add mutex locking for mxs_read_ocotp()
>  - Use type size_t for count and i
>  - Add comment for clk_enable/disable skipping
>  - Add ERROR bit clearing and polling step
> 
>  arch/arm/mach-mxs/Makefile              |    2 +-
>  arch/arm/mach-mxs/include/mach/common.h |    1 +
>  arch/arm/mach-mxs/ocotp.c               |   79 +++++++++++++++++++++++++++++++
>  3 files changed, 81 insertions(+), 1 deletions(-)
>  create mode 100644 arch/arm/mach-mxs/ocotp.c
> 
[...]
> diff --git a/arch/arm/mach-mxs/ocotp.c b/arch/arm/mach-mxs/ocotp.c
> new file mode 100644
> index 0000000..902ef59
> --- /dev/null
> +++ b/arch/arm/mach-mxs/ocotp.c
> @@ -0,0 +1,79 @@
> +/*
> + * Copyright 2010 Freescale Semiconductor, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/delay.h>
> +#include <linux/err.h>
> +#include <linux/mutex.h>
> +
> +#include <mach/mxs.h>
> +
> +#define BM_OCOTP_CTRL_BUSY		(1 << 8)
> +#define BM_OCOTP_CTRL_ERROR		(1 << 9)
> +#define BM_OCOTP_CTRL_RD_BANK_OPEN	(1 << 12)
> +
> +static DEFINE_MUTEX(ocotp_mutex);
> +
> +int mxs_read_ocotp(unsigned offset, size_t count, u32 *values)
> +{
> +	void __iomem *ocotp_base = MXS_IO_ADDRESS(MXS_OCOTP_BASE_ADDR);
> +	int timeout = 0x400;
> +	size_t i;
> +
> +	mutex_lock(&ocotp_mutex);
> +
> +	/*
> +	 * clk_enable(hbus_clk) for ocotp can be skipped
> +	 * as it must be on when system is running.
> +	 */
> +
> +	/* try to clear ERROR bit */
> +	__mxs_clrl(BM_OCOTP_CTRL_ERROR, ocotp_base);
> +
> +	/* check both BUSY and ERROR cleared */
> +	while ((__raw_readl(ocotp_base) &
> +		(BM_OCOTP_CTRL_BUSY | BM_OCOTP_CTRL_ERROR)) && --timeout)
> +		/* nothing */;

Is it worth using cpu_relax() in these polling loops?

Jamie
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Uwe Kleine-König Jan. 5, 2011, 4:44 p.m. UTC | #2

Hello Jamie,

On Wed, Jan 05, 2011 at 04:16:46PM +0000, Jamie Iles wrote:
> On Wed, Jan 05, 2011 at 10:07:35PM +0800, Shawn Guo wrote:
> > Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
> > ---
> > Changes for v2:
> >  - Add mutex locking for mxs_read_ocotp()
> >  - Use type size_t for count and i
> >  - Add comment for clk_enable/disable skipping
> >  - Add ERROR bit clearing and polling step
> > 
> >  arch/arm/mach-mxs/Makefile              |    2 +-
> >  arch/arm/mach-mxs/include/mach/common.h |    1 +
> >  arch/arm/mach-mxs/ocotp.c               |   79 +++++++++++++++++++++++++++++++
> >  3 files changed, 81 insertions(+), 1 deletions(-)
> >  create mode 100644 arch/arm/mach-mxs/ocotp.c
> > 
> [...]
> > diff --git a/arch/arm/mach-mxs/ocotp.c b/arch/arm/mach-mxs/ocotp.c
> > new file mode 100644
> > index 0000000..902ef59
> > --- /dev/null
> > +++ b/arch/arm/mach-mxs/ocotp.c
> > @@ -0,0 +1,79 @@
> > +/*
> > + * Copyright 2010 Freescale Semiconductor, Inc. All Rights Reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + */
> > +
> > +#include <linux/delay.h>
> > +#include <linux/err.h>
> > +#include <linux/mutex.h>
> > +
> > +#include <mach/mxs.h>
> > +
> > +#define BM_OCOTP_CTRL_BUSY		(1 << 8)
> > +#define BM_OCOTP_CTRL_ERROR		(1 << 9)
> > +#define BM_OCOTP_CTRL_RD_BANK_OPEN	(1 << 12)
> > +
> > +static DEFINE_MUTEX(ocotp_mutex);
> > +
> > +int mxs_read_ocotp(unsigned offset, size_t count, u32 *values)
> > +{
> > +	void __iomem *ocotp_base = MXS_IO_ADDRESS(MXS_OCOTP_BASE_ADDR);
> > +	int timeout = 0x400;
> > +	size_t i;
> > +
> > +	mutex_lock(&ocotp_mutex);
> > +
> > +	/*
> > +	 * clk_enable(hbus_clk) for ocotp can be skipped
> > +	 * as it must be on when system is running.
> > +	 */
> > +
> > +	/* try to clear ERROR bit */
> > +	__mxs_clrl(BM_OCOTP_CTRL_ERROR, ocotp_base);
> > +
> > +	/* check both BUSY and ERROR cleared */
> > +	while ((__raw_readl(ocotp_base) &
> > +		(BM_OCOTP_CTRL_BUSY | BM_OCOTP_CTRL_ERROR)) && --timeout)
> > +		/* nothing */;
> 
> Is it worth using cpu_relax() in these polling loops?
I don't know what cpu_relax does for other platforms, but on ARM it's
just a memory barrier which AFAICT doesn't help here at all (which
doesn't need to be correct).  Why do you think it would be better?

Best regards
Uwe

Jamie Iles Jan. 5, 2011, 5:25 p.m. UTC | #3

On Wed, Jan 05, 2011 at 05:44:09PM +0100, Uwe Kleine-König wrote:
> Hello Jamie,
> On Wed, Jan 05, 2011 at 04:16:46PM +0000, Jamie Iles wrote:
> > On Wed, Jan 05, 2011 at 10:07:35PM +0800, Shawn Guo wrote:
> > > +	/* check both BUSY and ERROR cleared */
> > > +	while ((__raw_readl(ocotp_base) &
> > > +		(BM_OCOTP_CTRL_BUSY | BM_OCOTP_CTRL_ERROR)) && --timeout)
> > > +		/* nothing */;
> > 
> > Is it worth using cpu_relax() in these polling loops?
> I don't know what cpu_relax does for other platforms, but on ARM it's
> just a memory barrier which AFAICT doesn't help here at all (which
> doesn't need to be correct).  Why do you think it would be better?

Well I don't see that there's anything broken without cpu_relax() but 
using cpu_relax() seems to be the most common way of doing busy polling 
loops that I've seen. It's also a bit easier to read than a comment and 
semi-colon. Perhaps it's just personal preference.

Jamie
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jamie Lokier Jan. 5, 2011, 5:56 p.m. UTC | #4

Jamie Iles wrote:
> On Wed, Jan 05, 2011 at 05:44:09PM +0100, Uwe Kleine-König wrote:
> > Hello Jamie,
> > On Wed, Jan 05, 2011 at 04:16:46PM +0000, Jamie Iles wrote:
> > > On Wed, Jan 05, 2011 at 10:07:35PM +0800, Shawn Guo wrote:
> > > > +	/* check both BUSY and ERROR cleared */
> > > > +	while ((__raw_readl(ocotp_base) &
> > > > +		(BM_OCOTP_CTRL_BUSY | BM_OCOTP_CTRL_ERROR)) && --timeout)
> > > > +		/* nothing */;
> > > 
> > > Is it worth using cpu_relax() in these polling loops?
> > I don't know what cpu_relax does for other platforms, but on ARM it's
> > just a memory barrier which AFAICT doesn't help here at all (which
> > doesn't need to be correct).  Why do you think it would be better?
> 
> Well I don't see that there's anything broken without cpu_relax() but 
> using cpu_relax() seems to be the most common way of doing busy polling 
> loops that I've seen. It's also a bit easier to read than a comment and 
> semi-colon. Perhaps it's just personal preference.

cpu_relax() is a hint to the CPU to, for example, save power or be
less aggressive on the memory bus (to save power or be fairer).

Currently these architectures do more than just a barrier in cpu_relax():
x86, IA64, PowerPC, Tile and S390.

Although it's just a hint on ARM at the moment, it might change in
future - especially with power mattering on so many ARM systems.
(Even now, just changing it to a very short udelay might save power
on existing ARMs without breaking drivers.)

By the way, I see ARM defines cpu_relax as smp_mb() on arch >= 6.  Is
that correct and useful?  On other architectures*, barrier() is enough
of a barrier, but it's conceivable that smp_mb() would have some
ARM-specific fairness or bus activity benefit - in which case it
should probably be mb().

* - except Blackfin, which historically derived a lot from ARM headers.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Russell King - ARM Linux Jan. 5, 2011, 6:35 p.m. UTC | #5

On Wed, Jan 05, 2011 at 05:56:17PM +0000, Jamie Lokier wrote:
> cpu_relax() is a hint to the CPU to, for example, save power or be
> less aggressive on the memory bus (to save power or be fairer).
> 
> Currently these architectures do more than just a barrier in cpu_relax():
> x86, IA64, PowerPC, Tile and S390.
> 
> Although it's just a hint on ARM at the moment, it might change in
> future - especially with power mattering on so many ARM systems.
> (Even now, just changing it to a very short udelay might save power
> on existing ARMs without breaking drivers.)

I think that's a matter for what the loop is doing.  If it's polling
a memory location then it probably has no effect what so ever.  If
the loop is spinning on a device, then the CPU will have to wait for
the read to complete which will slow it down.

It's something that would need very careful evaluation, and is probably
something that's very platform and loop specific.

> By the way, I see ARM defines cpu_relax as smp_mb() on arch >= 6.  Is
> that correct and useful?  On other architectures*, barrier() is enough
> of a barrier, but it's conceivable that smp_mb() would have some
> ARM-specific fairness or bus activity benefit - in which case it
> should probably be mb().

See a discussion last year with Linus.  It's there to ensure that one CPU
spinning on a variable can see a write by another CPU to that same
variable.  Without the barrier, the visibility effects are unbounded on
ARMv6 - and it's only like that for ARMv6, not >= ARMv6.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jamie Lokier Jan. 5, 2011, 7:44 p.m. UTC | #6

Russell King - ARM Linux wrote:
> > By the way, I see ARM defines cpu_relax as smp_mb() on arch >= 6.  Is
> > that correct and useful?  On other architectures*, barrier() is enough
> > of a barrier, but it's conceivable that smp_mb() would have some
> > ARM-specific fairness or bus activity benefit - in which case it
> > should probably be mb().
> 
> See a discussion last year with Linus.  It's there to ensure that one CPU
> spinning on a variable can see a write by another CPU to that same
> variable.  Without the barrier, the visibility effects are unbounded on
> ARMv6 - and it's only like that for ARMv6, not >= ARMv6.

Ah, thanks.

It might be relevant to this thread; I'm not sure.

'git show 534be1d5' explains how it works: cpu_relax() flushes buffered
writes from _this_ CPU, so that other CPUs which are polling can make
progress, which avoids this CPU getting stuck if there is an indirect
dependency (no matter how convoluted) between what it's polling and which
it wrote just before.

So cpu_relax() is *essential* in some polling loops, not a hint.

In principle that could happen for I/O polling, if (a) buffered memory
writes are delayed by I/O read transactions, and (b) the device state we're
waiting on depends on I/O yet to be done on another CPU, which could be
polling memory first (e.g. a spinlock).

I doubt (a) in practice - but what about buses that block during I/O read?
(I have a chip like that here, but it's ARMv4T.)

If so, readl() doesn't have a barrier that addresses the issue.

(b) is conceivable, and if the memory is a spinlock, spin_unlock() does
_not_ deal with this on ARMv6, as it has no barrier after the write.
('git show c5113b61' doesn't really explain why.)

It's convoluted and unlikely, but (imho) suggests cpu_relax() is good
practice in polling loops, even if not needed.  It doesn't cost anything.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Russell King - ARM Linux Jan. 5, 2011, 8:15 p.m. UTC | #7

On Wed, Jan 05, 2011 at 07:44:18PM +0000, Jamie Lokier wrote:
> 'git show 534be1d5' explains how it works: cpu_relax() flushes buffered
> writes from _this_ CPU, so that other CPUs which are polling can make
> progress, which avoids this CPU getting stuck if there is an indirect
> dependency (no matter how convoluted) between what it's polling and which
> it wrote just before.
> 
> So cpu_relax() is *essential* in some polling loops, not a hint.
> 
> In principle that could happen for I/O polling, if (a) buffered memory
> writes are delayed by I/O read transactions, and (b) the device state we're
> waiting on depends on I/O yet to be done on another CPU, which could be
> polling memory first (e.g. a spinlock).
> 
> I doubt (a) in practice - but what about buses that block during I/O read?
> (I have a chip like that here, but it's ARMv4T.)

Let's be clear - ARMv5 and below generally are well ordered architectures
within the limits of caching.  There are cases where the write buffer
allows two writes to pass each other.  However, for IO we generally map
these - especially for ARMv4 and below - as 'uncacheable unbufferable'.
So on these, if the program says "read this location" the pipeline will
stall until the read has been issued - and if you use the result in the
next instruction, it will stall until the data is available.  So really,
it's not a problem here.

ARMv6 and above have a weakly ordered memory model with speculative
prefetching, so memory reads/writes can be completely unordered.  Device
accesses can pass memory accesses, but device accesses are always visible
in program order with respect to each other.

So, if you're spinning in a loop reading an IO device, all previous IO
accesses will be completed (in all ARM architectures) before the result
of your read is evaluated.

(But, let's make you squirm some more - mb() on ARMv6 and above may
equate to a CPU memory barrier _plus_ a few IO accesses to the external
L2 cache controller - which will be ordered wrt other IO accesses of
course.)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jamie Lokier Jan. 6, 2011, 12:50 a.m. UTC | #8

Russell King - ARM Linux wrote:
> On Wed, Jan 05, 2011 at 07:44:18PM +0000, Jamie Lokier wrote:
> > 'git show 534be1d5' explains how it works: cpu_relax() flushes buffered
> > writes from _this_ CPU, so that other CPUs which are polling can make
> > progress, which avoids this CPU getting stuck if there is an indirect
> > dependency (no matter how convoluted) between what it's polling and which
> > it wrote just before.
> > 
> > So cpu_relax() is *essential* in some polling loops, not a hint.
> > 
> > In principle that could happen for I/O polling, if (a) buffered memory
> > writes are delayed by I/O read transactions, and (b) the device state we're
> > waiting on depends on I/O yet to be done on another CPU, which could be
> > polling memory first (e.g. a spinlock).
> > 
> > I doubt (a) in practice - but what about buses that block during I/O read?
> > (I have a chip like that here, but it's ARMv4T.)
> 
> Let's be clear - ARMv5 and below generally are well ordered architectures
> within the limits of caching.  There are cases where the write buffer
> allows two writes to pass each other.  However, for IO we generally map
> these - especially for ARMv4 and below - as 'uncacheable unbufferable'.
> So on these, if the program says "read this location" the pipeline will
> stall until the read has been issued - and if you use the result in the
> next instruction, it will stall until the data is available.  So really,
> it's not a problem here.
> 
> ARMv6 and above have a weakly ordered memory model with speculative
> prefetching, so memory reads/writes can be completely unordered.  Device
> accesses can pass memory accesses, but device accesses are always visible
> in program order with respect to each other.
> 
> So, if you're spinning in a loop reading an IO device, all previous IO
> accesses will be completed (in all ARM architectures) before the result
> of your read is evaluated.

No, that wasn't the scenario - it was:

You're spinning reading an IO device, whose state depends indirectly
on a *CPU memory* write that is forever buffered.

(Go and re-read 'git show 534be1d5' if you haven't already.)

The indirect dependence is that another CPU needs to see that write
before it can tell the device to change state in whatever way the
first CPU is polling for.

It's probably clearer in code:

CPU #1

    spin_lock(&mydev->lock);
    /* Look at state. */
    spin_unlock(&mydev->lock);       <-- THIS MEMORY WRITE BUFFERED FOREVER

    /* We expect this to be quick enough that polling is cool. */
    while (readl(mydev->reg_status) & MYDEV_STATUS_BUSY) {
        /* If only we had cpu_relax() */
    }

CPU #2

    spin_lock(&mydev->lock);         <-- STUCK HERE
    /* Look at state. */
    spin_unlock(&mydev->lock);

    writel(MYDEV_TRIGGER, mydev->reg_go);   /* Device is BUSY until this. */

The deadlock in this code (might) happen when CPU #2 is waiting for
the spinlock, and CPU #1's memory write remains in its write buffer
during CPU #1's polling loop.

If that can happen, it's fixed by adding cpu_relax() - to generic
driver code with polling loops.

It can only happen if any CPUs (i.e. ARMv6) that buffer writes due to
prioritising continuous memory reads also have that effect for
continuous IO reads.  This might even apply to non-ARM archs with
non-trivial cpu_relax() definitions; I don't know as they don't always
explain why.

The above driver style isn't particularly obvious, but there are a lot
of drivers with almost every conceivable access pattern.  If you use
your imagination, especially if the second code is an interrupt
handler, it's plausible.  Even though this example would be better
sleeping and waiting normally - there's nothing inherently forbidden
about the above pattern (except that cpu_relax() is needed).

> (But, let's make you squirm some more - mb() on ARMv6 and above may
> equate to a CPU memory barrier _plus_ a few IO accesses to the external
> L2 cache controller - which will be ordered wrt other IO accesses of
> course.)

I squirm at all modern ARM architectures.  Omit the slightest highly
version-specific thing, or run a kernel built with slightly wrong
config options, and it's fine except for random, very rare memory or
I/O corruption.  The workarounds and special bits seem to get more and
more convoluted with each version.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shawn Guo Jan. 6, 2011, 1:45 a.m. UTC | #9

On Wed, Jan 05, 2011 at 05:56:17PM +0000, Jamie Lokier wrote:
> Jamie Iles wrote:
> > On Wed, Jan 05, 2011 at 05:44:09PM +0100, Uwe Kleine-König wrote:
> > > Hello Jamie,
> > > On Wed, Jan 05, 2011 at 04:16:46PM +0000, Jamie Iles wrote:
> > > > On Wed, Jan 05, 2011 at 10:07:35PM +0800, Shawn Guo wrote:
> > > > > +	/* check both BUSY and ERROR cleared */
> > > > > +	while ((__raw_readl(ocotp_base) &
> > > > > +		(BM_OCOTP_CTRL_BUSY | BM_OCOTP_CTRL_ERROR)) && --timeout)
> > > > > +		/* nothing */;
> > > > 
> > > > Is it worth using cpu_relax() in these polling loops?
> > > I don't know what cpu_relax does for other platforms, but on ARM it's
> > > just a memory barrier which AFAICT doesn't help here at all (which
> > > doesn't need to be correct).  Why do you think it would be better?
> > 
> > Well I don't see that there's anything broken without cpu_relax() but 
> > using cpu_relax() seems to be the most common way of doing busy polling 
> > loops that I've seen. It's also a bit easier to read than a comment and 
> > semi-colon. Perhaps it's just personal preference.
> 
> cpu_relax() is a hint to the CPU to, for example, save power or be
> less aggressive on the memory bus (to save power or be fairer).
> 
> Currently these architectures do more than just a barrier in cpu_relax():
> x86, IA64, PowerPC, Tile and S390.
> 
> Although it's just a hint on ARM at the moment, it might change in
> future - especially with power mattering on so many ARM systems.
> (Even now, just changing it to a very short udelay might save power
> on existing ARMs without breaking drivers.)
> 
Sounds reasonable.  I would take the suggestion.  Thanks, both Jamie.

Russell King - ARM Linux Jan. 6, 2011, 9:13 a.m. UTC | #10

On Thu, Jan 06, 2011 at 12:50:52AM +0000, Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
> > On Wed, Jan 05, 2011 at 07:44:18PM +0000, Jamie Lokier wrote:
> > > 'git show 534be1d5' explains how it works: cpu_relax() flushes buffered
> > > writes from _this_ CPU, so that other CPUs which are polling can make
> > > progress, which avoids this CPU getting stuck if there is an indirect
> > > dependency (no matter how convoluted) between what it's polling and which
> > > it wrote just before.
> > > 
> > > So cpu_relax() is *essential* in some polling loops, not a hint.
> > > 
> > > In principle that could happen for I/O polling, if (a) buffered memory
> > > writes are delayed by I/O read transactions, and (b) the device state we're
> > > waiting on depends on I/O yet to be done on another CPU, which could be
> > > polling memory first (e.g. a spinlock).
> > > 
> > > I doubt (a) in practice - but what about buses that block during I/O read?
> > > (I have a chip like that here, but it's ARMv4T.)
> > 
> > Let's be clear - ARMv5 and below generally are well ordered architectures
> > within the limits of caching.  There are cases where the write buffer
> > allows two writes to pass each other.  However, for IO we generally map
> > these - especially for ARMv4 and below - as 'uncacheable unbufferable'.
> > So on these, if the program says "read this location" the pipeline will
> > stall until the read has been issued - and if you use the result in the
> > next instruction, it will stall until the data is available.  So really,
> > it's not a problem here.
> > 
> > ARMv6 and above have a weakly ordered memory model with speculative
> > prefetching, so memory reads/writes can be completely unordered.  Device
> > accesses can pass memory accesses, but device accesses are always visible
> > in program order with respect to each other.
> > 
> > So, if you're spinning in a loop reading an IO device, all previous IO
> > accesses will be completed (in all ARM architectures) before the result
> > of your read is evaluated.
> 
> No, that wasn't the scenario - it was:
> 
> You're spinning reading an IO device, whose state depends indirectly
> on a *CPU memory* write that is forever buffered.
> 
> (Go and re-read 'git show 534be1d5' if you haven't already.)

I know what that's about, and it's about memory based accesses _only_.

What you're talking about is a programming error.  Such errors cause
data corruption if you're talking about DMA stuff.

At the moment, the solution to that is to put whatever's necessary into
readl/writel to ensure that they behave as ordered operations with
respect to everything else.  You'll find that on ARM, writel has a
barrier before it to ensure memory writes are visible before the device
write, and on readl there's a barrier to ensure that no memory read can
happen before the IO device read.

cpu_relax() has nothing to do with ensuring ordering with devices.

With relaxed IO operations, the responsibility for ensuring proper ordering
between memory and IO falls to the programmer.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[v3,08/10] ARM: mxs: add ocotp read function

Commit Message

Comments

Patch