Message ID | CAMPhdO82ho2WPrJ3irwLgfPOycQt=kyqK3hhtrLCjOw1vptTrg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Thu, 2011-09-08 at 10:03 -0700, Eric Miao wrote: > On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > On Thursday 08 September 2011, Dave Martin wrote: > >> The iwmmxt code contains some code to implement a pseudo-ISB, but > >> this is not buildable for Thumb-2. > >> > >> This patch replaces the pseudo-ISB with a real one for Thumb-2 > >> kernels. > >> > >> Signed-off-by: Dave Martin <dave.martin@linaro.org> > >> --- > >> arch/arm/kernel/iwmmxt.S | 9 +++++++++ > >> 1 files changed, 9 insertions(+), 0 deletions(-) > > > > Acked-by: Arnd Bergmann <arnd@arndb.de> > > > > Maybe it'll be much simpler to have something like below: > > diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S > index a087838..5998f7d 100644 > --- a/arch/arm/kernel/iwmmxt.S > +++ b/arch/arm/kernel/iwmmxt.S > @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch) > PJ4(eor r1, r1, #0xf) > PJ4(mcr p15, 0, r1, c1, c0, 2) > > - mrc p15, 0, r1, c2, c0, 0 > - sub pc, lr, r1, lsr #32 @ cpwait and return > + XSC(mrc p15, 0, r1, c2, c0, 0) > + PJ4(isb) > + mov pc, lr @ cpwait and return Don't we still need "sub pc, lr, r1, lsr #32" in the XSC case? I had assumed this was to introduce a data dependency. I.e. PC depends on value of R1 which is loaded by the MRC, so it can't complete until that has.
On Thu, Sep 8, 2011 at 10:13 AM, Jon Medhurst (Tixy) <jon.medhurst@linaro.org> wrote: > On Thu, 2011-09-08 at 10:03 -0700, Eric Miao wrote: >> On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote: >> > On Thursday 08 September 2011, Dave Martin wrote: >> >> The iwmmxt code contains some code to implement a pseudo-ISB, but >> >> this is not buildable for Thumb-2. >> >> >> >> This patch replaces the pseudo-ISB with a real one for Thumb-2 >> >> kernels. >> >> >> >> Signed-off-by: Dave Martin <dave.martin@linaro.org> >> >> --- >> >> arch/arm/kernel/iwmmxt.S | 9 +++++++++ >> >> 1 files changed, 9 insertions(+), 0 deletions(-) >> > >> > Acked-by: Arnd Bergmann <arnd@arndb.de> >> > >> >> Maybe it'll be much simpler to have something like below: >> >> diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S >> index a087838..5998f7d 100644 >> --- a/arch/arm/kernel/iwmmxt.S >> +++ b/arch/arm/kernel/iwmmxt.S >> @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch) >> PJ4(eor r1, r1, #0xf) >> PJ4(mcr p15, 0, r1, c1, c0, 2) >> >> - mrc p15, 0, r1, c2, c0, 0 >> - sub pc, lr, r1, lsr #32 @ cpwait and return >> + XSC(mrc p15, 0, r1, c2, c0, 0) >> + PJ4(isb) >> + mov pc, lr @ cpwait and return > > Don't we still need "sub pc, lr, r1, lsr #32" in the XSC case? > I had assumed this was to introduce a data dependency. I.e. PC depends > on value of R1 which is loaded by the MRC, so it can't complete until > that has. Indeed, that's actually a good point.
On Thu, Sep 08, 2011 at 10:03:52AM -0700, Eric Miao wrote: > On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > On Thursday 08 September 2011, Dave Martin wrote: > >> The iwmmxt code contains some code to implement a pseudo-ISB, but > >> this is not buildable for Thumb-2. > >> > >> This patch replaces the pseudo-ISB with a real one for Thumb-2 > >> kernels. > >> > >> Signed-off-by: Dave Martin <dave.martin@linaro.org> > >> --- > >> arch/arm/kernel/iwmmxt.S | 9 +++++++++ > >> 1 files changed, 9 insertions(+), 0 deletions(-) > > > > Acked-by: Arnd Bergmann <arnd@arndb.de> > > > > Maybe it'll be much simpler to have something like below: > > diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S > index a087838..5998f7d 100644 > --- a/arch/arm/kernel/iwmmxt.S > +++ b/arch/arm/kernel/iwmmxt.S > @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch) > PJ4(eor r1, r1, #0xf) > PJ4(mcr p15, 0, r1, c1, c0, 2) > > - mrc p15, 0, r1, c2, c0, 0 > - sub pc, lr, r1, lsr #32 @ cpwait and return > + XSC(mrc p15, 0, r1, c2, c0, 0) > + PJ4(isb) > + mov pc, lr @ cpwait and return This won't allow the building of this code for a v7 ARM kernel with current tools. I think this is likely to be too much dirsuption: requiring really new tools for Thumb-2 is unavoidable, but we should allow people using existing tools to build this to carry on with the current situation for the time being. This is because of the fact that not all current tools can support "isb" and the iwmmxt mnemonics in the same file. One thing we could do is to code the ISB using .long, since we only need this for Thumb kernels. This would allow us to get rid of the Makefile warning, since the generated code would be the preferred v7 code in that case. Any comments on this? Cheers ---Dave
On Thu, 8 Sep 2011, Dave Martin wrote: > On Thu, Sep 08, 2011 at 10:03:52AM -0700, Eric Miao wrote: > > On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > > On Thursday 08 September 2011, Dave Martin wrote: > > >> The iwmmxt code contains some code to implement a pseudo-ISB, but > > >> this is not buildable for Thumb-2. > > >> > > >> This patch replaces the pseudo-ISB with a real one for Thumb-2 > > >> kernels. > > >> > > >> Signed-off-by: Dave Martin <dave.martin@linaro.org> > > >> --- > > >> arch/arm/kernel/iwmmxt.S | 9 +++++++++ > > >> 1 files changed, 9 insertions(+), 0 deletions(-) > > > > > > Acked-by: Arnd Bergmann <arnd@arndb.de> > > > > > > > Maybe it'll be much simpler to have something like below: > > > > diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S > > index a087838..5998f7d 100644 > > --- a/arch/arm/kernel/iwmmxt.S > > +++ b/arch/arm/kernel/iwmmxt.S > > @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch) > > PJ4(eor r1, r1, #0xf) > > PJ4(mcr p15, 0, r1, c1, c0, 2) > > > > - mrc p15, 0, r1, c2, c0, 0 > > - sub pc, lr, r1, lsr #32 @ cpwait and return > > + XSC(mrc p15, 0, r1, c2, c0, 0) > > + PJ4(isb) > > + mov pc, lr @ cpwait and return > > This won't allow the building of this code for a v7 ARM kernel with > current tools. > > I think this is likely to be too much dirsuption: requiring really > new tools for Thumb-2 is unavoidable, but we should allow people > using existing tools to build this to carry on with the current > situation for the time being. This is because of the fact that > not all current tools can support "isb" and the iwmmxt mnemonics > in the same file. > > One thing we could do is to code the ISB using .long, since we > only need this for Thumb kernels. You would have an ARM and a Thumb variant then, plus the legacy variant, right? > This would allow us to get rid of the Makefile warning, since the > generated code would be the preferred v7 code in that case. Makes sense. Nicolas
On Thu, 8 Sep 2011, Jon Medhurst (Tixy) wrote: > On Thu, 2011-09-08 at 10:03 -0700, Eric Miao wrote: > > On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > > On Thursday 08 September 2011, Dave Martin wrote: > > >> The iwmmxt code contains some code to implement a pseudo-ISB, but > > >> this is not buildable for Thumb-2. > > >> > > >> This patch replaces the pseudo-ISB with a real one for Thumb-2 > > >> kernels. > > >> > > >> Signed-off-by: Dave Martin <dave.martin@linaro.org> > > >> --- > > >> arch/arm/kernel/iwmmxt.S | 9 +++++++++ > > >> 1 files changed, 9 insertions(+), 0 deletions(-) > > > > > > Acked-by: Arnd Bergmann <arnd@arndb.de> > > > > > > > Maybe it'll be much simpler to have something like below: > > > > diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S > > index a087838..5998f7d 100644 > > --- a/arch/arm/kernel/iwmmxt.S > > +++ b/arch/arm/kernel/iwmmxt.S > > @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch) > > PJ4(eor r1, r1, #0xf) > > PJ4(mcr p15, 0, r1, c1, c0, 2) > > > > - mrc p15, 0, r1, c2, c0, 0 > > - sub pc, lr, r1, lsr #32 @ cpwait and return > > + XSC(mrc p15, 0, r1, c2, c0, 0) > > + PJ4(isb) > > + mov pc, lr @ cpwait and return > > Don't we still need "sub pc, lr, r1, lsr #32" in the XSC case? > I had assumed this was to introduce a data dependency. I.e. PC depends > on value of R1 which is loaded by the MRC, so it can't complete until > that has. Yes, that's exactly the reason why this is coded that way. Nicolas
On Thu, Sep 8, 2011 at 10:20 AM, Dave Martin <dave.martin@linaro.org> wrote: > On Thu, Sep 08, 2011 at 10:03:52AM -0700, Eric Miao wrote: >> On Thu, Sep 8, 2011 at 9:45 AM, Arnd Bergmann <arnd@arndb.de> wrote: >> > On Thursday 08 September 2011, Dave Martin wrote: >> >> The iwmmxt code contains some code to implement a pseudo-ISB, but >> >> this is not buildable for Thumb-2. >> >> >> >> This patch replaces the pseudo-ISB with a real one for Thumb-2 >> >> kernels. >> >> >> >> Signed-off-by: Dave Martin <dave.martin@linaro.org> >> >> --- >> >> arch/arm/kernel/iwmmxt.S | 9 +++++++++ >> >> 1 files changed, 9 insertions(+), 0 deletions(-) >> > >> > Acked-by: Arnd Bergmann <arnd@arndb.de> >> > >> >> Maybe it'll be much simpler to have something like below: >> >> diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S >> index a087838..5998f7d 100644 >> --- a/arch/arm/kernel/iwmmxt.S >> +++ b/arch/arm/kernel/iwmmxt.S >> @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch) >> PJ4(eor r1, r1, #0xf) >> PJ4(mcr p15, 0, r1, c1, c0, 2) >> >> - mrc p15, 0, r1, c2, c0, 0 >> - sub pc, lr, r1, lsr #32 @ cpwait and return >> + XSC(mrc p15, 0, r1, c2, c0, 0) >> + PJ4(isb) >> + mov pc, lr @ cpwait and return > > This won't allow the building of this code for a v7 ARM kernel with > current tools. Ah I missed that point. So the problem is really when compiling this file with existing toolchain, it's downgrading to v5 compatible mode, and the instruction below sub pc, lr, r1, lsr #32 wouldn't be encoded when building a THUMB2 kernel. Considering the r1, lsr #32 is actually to create an explicit data dependency of the previous co-processor instruction, would it be one option to rewrite this as something like: mov r1, r1 mov pc, lr
On Thu, 8 Sep 2011, Eric Miao wrote: > So the problem is really when compiling this file with existing toolchain, > it's downgrading to v5 compatible mode, and the instruction below > > sub pc, lr, r1, lsr #32 > > wouldn't be encoded when building a THUMB2 kernel. Considering the > r1, lsr #32 is actually to create an explicit data dependency of the previous > co-processor instruction, would it be one option to rewrite this as something > like: > > mov r1, r1 > mov pc, lr That is certainly a pretty simple, obvious and straight forward way to deal with this issue. I think I like this even more. Nicolas
On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote: > So the problem is really when compiling this file with existing toolchain, > it's downgrading to v5 compatible mode, and the instruction below > > sub pc, lr, r1, lsr #32 > > wouldn't be encoded when building a THUMB2 kernel. Considering the > r1, lsr #32 is actually to create an explicit data dependency of the previous > co-processor instruction, would it be one option to rewrite this as something > like: > > mov r1, r1 > mov pc, lr That doesn't include a data dependency of PC on R1, so it's possible for MOV PC, LR and subsequent instructions to be executed before MOV R1, R1 has completed. We would want... add lr, lr, r1, lsr #32 mov pc, lr
On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote: > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote: > > So the problem is really when compiling this file with existing toolchain, > > it's downgrading to v5 compatible mode, and the instruction below > > > > sub pc, lr, r1, lsr #32 > > > > wouldn't be encoded when building a THUMB2 kernel. Considering the > > r1, lsr #32 is actually to create an explicit data dependency of the previous > > co-processor instruction, would it be one option to rewrite this as something > > like: > > > > mov r1, r1 > > mov pc, lr > > That doesn't include a data dependency of PC on R1, so it's possible for > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1 > has completed. We would want... > > add lr, lr, r1, lsr #32 > mov pc, lr But isn't the first insn unavailable with Thumb2? Maybe something like: sub r1, r1, r1 add pc, lr, r1 Nicolas
On Fri, 2011-09-09 at 09:21 -0400, Nicolas Pitre wrote: > On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote: > > > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote: > > > So the problem is really when compiling this file with existing toolchain, > > > it's downgrading to v5 compatible mode, and the instruction below > > > > > > sub pc, lr, r1, lsr #32 > > > > > > wouldn't be encoded when building a THUMB2 kernel. Considering the > > > r1, lsr #32 is actually to create an explicit data dependency of the previous > > > co-processor instruction, would it be one option to rewrite this as something > > > like: > > > > > > mov r1, r1 > > > mov pc, lr > > > > That doesn't include a data dependency of PC on R1, so it's possible for > > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1 > > has completed. We would want... > > > > add lr, lr, r1, lsr #32 > > mov pc, lr > > But isn't the first insn unavailable with Thumb2? It is available in Thumb2. > Maybe something like: > > sub r1, r1, r1 > add pc, lr, r1 That's not allowed in Thumb2. ;-) There aren't many instruction forms which are allowed to set PC.
On Fri, Sep 09, 2011 at 09:21:28AM -0400, Nicolas Pitre wrote: > On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote: > > > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote: > > > So the problem is really when compiling this file with existing toolchain, > > > it's downgrading to v5 compatible mode, and the instruction below > > > > > > sub pc, lr, r1, lsr #32 > > > > > > wouldn't be encoded when building a THUMB2 kernel. Considering the > > > r1, lsr #32 is actually to create an explicit data dependency of the previous > > > co-processor instruction, would it be one option to rewrite this as something > > > like: > > > > > > mov r1, r1 > > > mov pc, lr > > > > That doesn't include a data dependency of PC on R1, so it's possible for > > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1 > > has completed. We would want... > > > > add lr, lr, r1, lsr #32 > > mov pc, lr > > But isn't the first insn unavailable with Thumb2? > > Maybe something like: > > sub r1, r1, r1 > add pc, lr, r1 Thumb-2 implies v7, the v7 implies that we have (and need) a real ISB. So for the Thumb-2 case, this should always become isb Ideally, we should do this even for ARM v7 kernels, but in order not to break older tools which don't understand -march=armv7-a+iwmmxt we may need to use a .long to encode the ISB manually. I'll give that a try... Cheers ---Dave
On Fri, 9 Sep 2011, Dave Martin wrote: > On Fri, Sep 09, 2011 at 09:21:28AM -0400, Nicolas Pitre wrote: > > On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote: > > > > > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote: > > > > So the problem is really when compiling this file with existing toolchain, > > > > it's downgrading to v5 compatible mode, and the instruction below > > > > > > > > sub pc, lr, r1, lsr #32 > > > > > > > > wouldn't be encoded when building a THUMB2 kernel. Considering the > > > > r1, lsr #32 is actually to create an explicit data dependency of the previous > > > > co-processor instruction, would it be one option to rewrite this as something > > > > like: > > > > > > > > mov r1, r1 > > > > mov pc, lr > > > > > > That doesn't include a data dependency of PC on R1, so it's possible for > > > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1 > > > has completed. We would want... > > > > > > add lr, lr, r1, lsr #32 > > > mov pc, lr > > > > But isn't the first insn unavailable with Thumb2? > > > > Maybe something like: > > > > sub r1, r1, r1 > > add pc, lr, r1 > > Thumb-2 implies v7, the v7 implies that we have (and need) a real ISB. So for the > Thumb-2 case, this should always become > > isb Don't forget that here we are talking about the Marvell implementation. Since I've no doubt that the real isb certainly works, so far the advertized isb has always been implemented with a CP readback with a data dependency. Don't forget that this CPU core can also pretend to be an ARMv6 and they probably didn't duplicate the whole instruction decoder for each modes. So I really doubt the Marvell implementation would behave any differently if this special isb is expressed in terms of Thumb2 rather than ARM instructions as the backend is certainly common to all the modes. If we can use the same implementation for all modes then we'll just simplify maintenance of this code. Nicolas
On Fri, Sep 09, 2011 at 01:51:30PM -0400, Nicolas Pitre wrote: > On Fri, 9 Sep 2011, Dave Martin wrote: > > > On Fri, Sep 09, 2011 at 09:21:28AM -0400, Nicolas Pitre wrote: > > > On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote: > > > > > > > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote: > > > > > So the problem is really when compiling this file with existing toolchain, > > > > > it's downgrading to v5 compatible mode, and the instruction below > > > > > > > > > > sub pc, lr, r1, lsr #32 > > > > > > > > > > wouldn't be encoded when building a THUMB2 kernel. Considering the > > > > > r1, lsr #32 is actually to create an explicit data dependency of the previous > > > > > co-processor instruction, would it be one option to rewrite this as something > > > > > like: > > > > > > > > > > mov r1, r1 > > > > > mov pc, lr > > > > > > > > That doesn't include a data dependency of PC on R1, so it's possible for > > > > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1 > > > > has completed. We would want... > > > > > > > > add lr, lr, r1, lsr #32 > > > > mov pc, lr > > > > > > But isn't the first insn unavailable with Thumb2? > > > > > > Maybe something like: > > > > > > sub r1, r1, r1 > > > add pc, lr, r1 > > > > Thumb-2 implies v7, the v7 implies that we have (and need) a real ISB. So for the > > Thumb-2 case, this should always become > > > > isb > > Don't forget that here we are talking about the Marvell implementation. > Since I've no doubt that the real isb certainly works, so far the > advertized isb has always been implemented with a CP readback with a > data dependency. Don't forget that this CPU core can also pretend to be > an ARMv6 and they probably didn't duplicate the whole instruction > decoder for each modes. If the isb instruction doesn't work for this in v7 mode, I think that would be an architecture violation, but you're right that this may not be available, or may not behave identically, when the CPU is behaving like a v6 part. > So I really doubt the Marvell implementation would behave any > differently if this special isb is expressed in terms of Thumb2 rather > than ARM instructions as the backend is certainly common to all the > modes. If we can use the same implementation for all modes then we'll > just simplify maintenance of this code. Agreed, but the trouble is that what constitutes a suitable data dependency is completely microarchitecture dependent -- we're relying on side-effects outside the architecture here. We can't code _exactly_ the same thing in Thumb-2 because of restrictions in the instruction set, so we have to settle for something that "looks similar" -- but there's no guarantee it will work. Do you feel your judgement on this is authoritative? If so, then we can go ahead with your suggestion; otherwise we might need input from someone else. Cheers ---Dave
On Mon, Sep 12, 2011 at 03:37:08PM +0100, Dave Martin wrote: > Agreed, but the trouble is that what constitutes a suitable data > dependency is completely microarchitecture dependent -- we're relying > on side-effects outside the architecture here. > > We can't code _exactly_ the same thing in Thumb-2 because of restrictions > in the instruction set, so we have to settle for something that "looks > similar" -- but there's no guarantee it will work. > > Do you feel your judgement on this is authoritative? If so, then > we can go ahead with your suggestion; otherwise we might need input > from someone else. How about we ask Haojian Zhuang what the recommended sequence is for ensuring that writes hit CP15 when executing in Thumb-2 mode on PJ4, instead of blindly beating around the issue without really knowing what we're doing?
On Mon, 12 Sep 2011, Dave Martin wrote: > On Fri, Sep 09, 2011 at 01:51:30PM -0400, Nicolas Pitre wrote: > > On Fri, 9 Sep 2011, Dave Martin wrote: > > > > > On Fri, Sep 09, 2011 at 09:21:28AM -0400, Nicolas Pitre wrote: > > > > On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote: > > > > > > > > > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote: > > > > > > So the problem is really when compiling this file with existing toolchain, > > > > > > it's downgrading to v5 compatible mode, and the instruction below > > > > > > > > > > > > sub pc, lr, r1, lsr #32 > > > > > > > > > > > > wouldn't be encoded when building a THUMB2 kernel. Considering the > > > > > > r1, lsr #32 is actually to create an explicit data dependency of the previous > > > > > > co-processor instruction, would it be one option to rewrite this as something > > > > > > like: > > > > > > > > > > > > mov r1, r1 > > > > > > mov pc, lr > > > > > > > > > > That doesn't include a data dependency of PC on R1, so it's possible for > > > > > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1 > > > > > has completed. We would want... > > > > > > > > > > add lr, lr, r1, lsr #32 > > > > > mov pc, lr > > > > > > > > But isn't the first insn unavailable with Thumb2? > > > > > > > > Maybe something like: > > > > > > > > sub r1, r1, r1 > > > > add pc, lr, r1 > > > > > > Thumb-2 implies v7, the v7 implies that we have (and need) a real ISB. So for the > > > Thumb-2 case, this should always become > > > > > > isb > > > > Don't forget that here we are talking about the Marvell implementation. > > Since I've no doubt that the real isb certainly works, so far the > > advertized isb has always been implemented with a CP readback with a > > data dependency. Don't forget that this CPU core can also pretend to be > > an ARMv6 and they probably didn't duplicate the whole instruction > > decoder for each modes. > > If the isb instruction doesn't work for this in v7 mode, I think that > would be an architecture violation, but you're right that this may > not be available, or may not behave identically, when the CPU is > behaving like a v6 part. I'm sure isb is available in ARMv7 mode. What I'm saying is that the advertised trick for ARMv6 mode certainly must work just as well here in ARMv7 mode. > > So I really doubt the Marvell implementation would behave any > > differently if this special isb is expressed in terms of Thumb2 rather > > than ARM instructions as the backend is certainly common to all the > > modes. If we can use the same implementation for all modes then we'll > > just simplify maintenance of this code. > > Agreed, but the trouble is that what constitutes a suitable data > dependency is completely microarchitecture dependent -- we're relying > on side-effects outside the architecture here. Sure. But what is there now is what is documented by Marvell docs i.e. create a data dependency on the register used to read back the CP value. I've seen a few implementation variations on that theme too. > We can't code _exactly_ the same thing in Thumb-2 because of restrictions > in the instruction set, so we have to settle for something that "looks > similar" -- but there's no guarantee it will work. > > Do you feel your judgement on this is authoritative? If so, then > we can go ahead with your suggestion; otherwise we might need input > from someone else. My suggestion would be what Tixy also proposed: mrc p15, 0, r1, c2, c0, 0 add lr, lr, r1, lsr #32 mov pc, lr Running this past people still at Marvell (I'm not anymore) wouldn't hurt of course. Nicolas
On Mon, Sep 12, 2011 at 10:55:30AM -0400, Nicolas Pitre wrote: > On Mon, 12 Sep 2011, Dave Martin wrote: > > > On Fri, Sep 09, 2011 at 01:51:30PM -0400, Nicolas Pitre wrote: > > > On Fri, 9 Sep 2011, Dave Martin wrote: > > > > > > > On Fri, Sep 09, 2011 at 09:21:28AM -0400, Nicolas Pitre wrote: > > > > > On Fri, 9 Sep 2011, Jon Medhurst (Tixy) wrote: > > > > > > > > > > > On Thu, 2011-09-08 at 11:49 -0700, Eric Miao wrote: > > > > > > > So the problem is really when compiling this file with existing toolchain, > > > > > > > it's downgrading to v5 compatible mode, and the instruction below > > > > > > > > > > > > > > sub pc, lr, r1, lsr #32 > > > > > > > > > > > > > > wouldn't be encoded when building a THUMB2 kernel. Considering the > > > > > > > r1, lsr #32 is actually to create an explicit data dependency of the previous > > > > > > > co-processor instruction, would it be one option to rewrite this as something > > > > > > > like: > > > > > > > > > > > > > > mov r1, r1 > > > > > > > mov pc, lr > > > > > > > > > > > > That doesn't include a data dependency of PC on R1, so it's possible for > > > > > > MOV PC, LR and subsequent instructions to be executed before MOV R1, R1 > > > > > > has completed. We would want... > > > > > > > > > > > > add lr, lr, r1, lsr #32 > > > > > > mov pc, lr > > > > > > > > > > But isn't the first insn unavailable with Thumb2? > > > > > > > > > > Maybe something like: > > > > > > > > > > sub r1, r1, r1 > > > > > add pc, lr, r1 > > > > > > > > Thumb-2 implies v7, the v7 implies that we have (and need) a real ISB. So for the > > > > Thumb-2 case, this should always become > > > > > > > > isb > > > > > > Don't forget that here we are talking about the Marvell implementation. > > > Since I've no doubt that the real isb certainly works, so far the > > > advertized isb has always been implemented with a CP readback with a > > > data dependency. Don't forget that this CPU core can also pretend to be > > > an ARMv6 and they probably didn't duplicate the whole instruction > > > decoder for each modes. > > > > If the isb instruction doesn't work for this in v7 mode, I think that > > would be an architecture violation, but you're right that this may > > not be available, or may not behave identically, when the CPU is > > behaving like a v6 part. > > I'm sure isb is available in ARMv7 mode. > > What I'm saying is that the advertised trick for ARMv6 mode certainly > must work just as well here in ARMv7 mode. > > > > So I really doubt the Marvell implementation would behave any > > > differently if this special isb is expressed in terms of Thumb2 rather > > > than ARM instructions as the backend is certainly common to all the > > > modes. If we can use the same implementation for all modes then we'll > > > just simplify maintenance of this code. > > > > Agreed, but the trouble is that what constitutes a suitable data > > dependency is completely microarchitecture dependent -- we're relying > > on side-effects outside the architecture here. > > Sure. But what is there now is what is documented by Marvell docs i.e. > create a data dependency on the register used to read back the CP value. > I've seen a few implementation variations on that theme too. > > > We can't code _exactly_ the same thing in Thumb-2 because of restrictions > > in the instruction set, so we have to settle for something that "looks > > similar" -- but there's no guarantee it will work. > > > > Do you feel your judgement on this is authoritative? If so, then > > we can go ahead with your suggestion; otherwise we might need input > > from someone else. > > My suggestion would be what Tixy also proposed: > > mrc p15, 0, r1, c2, c0, 0 > add lr, lr, r1, lsr #32 > mov pc, lr > > Running this past people still at Marvell (I'm not anymore) wouldn't > hurt of course. Sure, that might work, and if we have a sequence which works for all build configurations, that's great. But it needs to work for the right reasons, otherwise we may be storing up nasty surprises for later. As you/Russell suggest, I think Haojian or someone should comment. >From an architectural point of view, I don't think that sequence has any guaranteed barrier behaviour at all, although it will have a barrier effect on many implementations. As we get into v7-land, this kind of thing becomre more and more unsafe as new implementations push the architectural envelope. Haojian, can you comment on whether the proposed sequence will work in this case of synchronising CPACR updates? If we can use the same sequence for all v6/v7/ARM/Thumb-2 kernels, that would be preferable to #ifdefs. Cheers ---Dave
diff --git a/arch/arm/kernel/iwmmxt.S b/arch/arm/kernel/iwmmxt.S index a087838..5998f7d 100644 --- a/arch/arm/kernel/iwmmxt.S +++ b/arch/arm/kernel/iwmmxt.S @@ -319,8 +319,9 @@ ENTRY(iwmmxt_task_switch) PJ4(eor r1, r1, #0xf) PJ4(mcr p15, 0, r1, c1, c0, 2) - mrc p15, 0, r1, c2, c0, 0 - sub pc, lr, r1, lsr #32 @ cpwait and return + XSC(mrc p15, 0, r1, c2, c0, 0) + PJ4(isb) + mov pc, lr @ cpwait and return /* * Remove Concan ownership of given task