diff mbox

[10/10] ARC: [plat-eznps] Handle memory error as an exception

Message ID 1495679660-9598-11-git-send-email-noamca@mellanox.com
State New
Headers show

Commit Message

Noam Camus May 25, 2017, 2:34 a.m. UTC
From: Noam Camus <noamca@mellanox.com>

This commit adds the configuration CONFIG_EZNPS_MEM_ERROR.
If set, it will cause the kernel to handle user memory error
as a machine check exception.
It is required in order to align the NPS simulator memory
error handling to the one of the NPS400 real chip behavior.

Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Signed-off-by: Noam Camus <noamca@mellanox.com>
---
 arch/arc/kernel/entry-compact.S |   11 +++++++++++
 arch/arc/plat-eznps/Kconfig     |   11 +++++++++++
 2 files changed, 22 insertions(+), 0 deletions(-)

Comments

Alexey Brodkin May 25, 2017, 11:14 a.m. UTC | #1
Hi Noam,

On Thu, 2017-05-25 at 05:34 +0300, Noam Camus wrote:
> From: Noam Camus <noamca@mellanox.com>

> 

> This commit adds the configuration CONFIG_EZNPS_MEM_ERROR.

> If set, it will cause the kernel to handle user memory error

> as a machine check exception.

> It is required in order to align the NPS simulator memory

> error handling to the one of the NPS400 real chip behavior.

> 

> Signed-off-by: Elad Kanfi <eladkan@mellanox.com>

> Signed-off-by: Noam Camus <noamca@mellanox.com>

> ---

>  arch/arc/kernel/entry-compact.S |   11 +++++++++++

>  arch/arc/plat-eznps/Kconfig     |   11 +++++++++++

>  2 files changed, 22 insertions(+), 0 deletions(-)

> 

> diff --git a/arch/arc/kernel/entry-compact.S b/arch/arc/kernel/entry-compact.S

> index f285dbb..d152d36 100644

> --- a/arch/arc/kernel/entry-compact.S

> +++ b/arch/arc/kernel/entry-compact.S

> @@ -203,6 +203,17 @@ END(handle_interrupt_level2)

>  ; ---------------------------------------------

>  ENTRY(mem_service)

>  

> +#if defined(CONFIG_EZNPS_MEM_ERROR)

> +        ; SW workaround to cover up on a difference between

> +        ; NPS real chip and simulator behaviors.

> +        ; NPS real chip will activate a machine check exception

> +        ; in case of memory error, while the simulator will

> +        ; trigger a level 2 interrupt. Therefor this code section

> +        ; should be reached only in simulation mode.

> +        ; DEAD END: display Regs and HALT


I'm not really buying that.

Why don't you just make simulator behaving exactly as your real chip?

Adding those stubs for some corner-cases here and there complicate code,
affect maintainability etc.

-Alexey
Noam Camus May 25, 2017, 11:26 a.m. UTC | #2
> From: Alexey Brodkin [mailto:Alexey.Brodkin@synopsys.com] 

> Sent: Thursday, May 25, 2017 14:15 PM


>> 

>> diff --git a/arch/arc/kernel/entry-compact.S 

>> b/arch/arc/kernel/entry-compact.S index f285dbb..d152d36 100644

>> --- a/arch/arc/kernel/entry-compact.S

>> +++ b/arch/arc/kernel/entry-compact.S

>> @@ -203,6 +203,17 @@ END(handle_interrupt_level2)

>>  ; ---------------------------------------------

>>  ENTRY(mem_service)

>>  

>> +#if defined(CONFIG_EZNPS_MEM_ERROR)

>> +        ; SW workaround to cover up on a difference between

>> +        ; NPS real chip and simulator behaviors.

>> +        ; NPS real chip will activate a machine check exception

>> +        ; in case of memory error, while the simulator will

>> +        ; trigger a level 2 interrupt. Therefor this code section

>> +        ; should be reached only in simulation mode.

>> +        ; DEAD END: display Regs and HALT


>I'm not really buying that.


>Why don't you just make simulator behaving exactly as your real chip?

I can't change simulator core behavior. nSIM is a Synopsys proprietary code.

>Adding those stubs for some corner-cases here and there complicate code, affect maintainability etc.

I agree, any suggestions to still have this but with reduced cost?

-Noam
Alexey Brodkin May 25, 2017, 11:30 a.m. UTC | #3
Hi Noam,

On Thu, 2017-05-25 at 11:26 +0000, Noam Camus wrote:
> > 

> > From: Alexey Brodkin [mailto:Alexey.Brodkin@synopsys.com] 

> > Sent: Thursday, May 25, 2017 14:15 PM

> 

> > 

> > > 

> > > 

> > > diff --git a/arch/arc/kernel/entry-compact.S 

> > > b/arch/arc/kernel/entry-compact.S index f285dbb..d152d36 100644

> > > --- a/arch/arc/kernel/entry-compact.S

> > > +++ b/arch/arc/kernel/entry-compact.S

> > > @@ -203,6 +203,17 @@ END(handle_interrupt_level2)

> > >  ; ---------------------------------------------

> > >  ENTRY(mem_service)

> > >  

> > > +#if defined(CONFIG_EZNPS_MEM_ERROR)

> > > +        ; SW workaround to cover up on a difference between

> > > +        ; NPS real chip and simulator behaviors.

> > > +        ; NPS real chip will activate a machine check exception

> > > +        ; in case of memory error, while the simulator will

> > > +        ; trigger a level 2 interrupt. Therefor this code section

> > > +        ; should be reached only in simulation mode.

> > > +        ; DEAD END: display Regs and HALT

> 

> > 

> > I'm not really buying that.

> 

> > 

> > Why don't you just make simulator behaving exactly as your real chip?

> I can't change simulator core behavior. nSIM is a Synopsys proprietary code.


Well probably it worth discussing with nSIM team if they may have any suggestions
on how to align nSIM behavior with your real HW?

-Alexey
Noam Camus May 25, 2017, 12:03 p.m. UTC | #4
>From: Alexey Brodkin [mailto:Alexey.Brodkin@synopsys.com] 
>Sent: Thursday, May 25, 2017 14:31 PM
...
>> > Why don't you just make simulator behaving exactly as your real chip?
>> I can't change simulator core behavior. nSIM is a Synopsys proprietary code.

>Well probably it worth discussing with nSIM team if they may have any suggestions on how to align nSIM behavior with your real HW?
We already talked with them, nothing we can do at this point.
What about turning mem_service into weak symbol and have my own platform copy (just like we do with res_service)?

-Noam
Alexey Brodkin May 25, 2017, 12:05 p.m. UTC | #5
Hi Noam,

On Thu, 2017-05-25 at 12:03 +0000, Noam Camus wrote:
> > 

> > From: Alexey Brodkin [mailto:Alexey.Brodkin@synopsys.com] 

> > Sent: Thursday, May 25, 2017 14:31 PM

> ...

> > 

> > > 

> > > > 

> > > > Why don't you just make simulator behaving exactly as your real chip?

> > > I can't change simulator core behavior. nSIM is a Synopsys proprietary code.

> 

> > 

> > Well probably it worth discussing with nSIM team if they may have any suggestions on how to align nSIM behavior with your real HW?

> We already talked with them, nothing we can do at this point.

> What about turning mem_service into weak symbol and have my own platform copy (just like we do with res_service)?


That looks nicer to me.
Let's wait for Vineet's opinion on that.

-Alexey
Vineet Gupta June 6, 2017, 9:57 p.m. UTC | #6
On 05/25/2017 04:30 AM, Alexey Brodkin wrote:
> Hi Noam,
> 
> On Thu, 2017-05-25 at 11:26 +0000, Noam Camus wrote:
>>> From: Alexey Brodkin [mailto:Alexey.Brodkin@synopsys.com]
>>> Sent: Thursday, May 25, 2017 14:15 PM
>>>>
>>>> diff --git a/arch/arc/kernel/entry-compact.S
>>>> b/arch/arc/kernel/entry-compact.S index f285dbb..d152d36 100644
>>>> --- a/arch/arc/kernel/entry-compact.S
>>>> +++ b/arch/arc/kernel/entry-compact.S
>>>> @@ -203,6 +203,17 @@ END(handle_interrupt_level2)
>>>>   ; ---------------------------------------------
>>>>   ENTRY(mem_service)
>>>>   
>>>> +#if defined(CONFIG_EZNPS_MEM_ERROR)
>>>> +        ; SW workaround to cover up on a difference between
>>>> +        ; NPS real chip and simulator behaviors.
>>>> +        ; NPS real chip will activate a machine check exception
>>>> +        ; in case of memory error, while the simulator will
>>>> +        ; trigger a level 2 interrupt. Therefor this code section
>>>> +        ; should be reached only in simulation mode.
>>>> +        ; DEAD END: display Regs and HALT
>>> I'm not really buying that.
>>> Why don't you just make simulator behaving exactly as your real chip?
>> I can't change simulator core behavior. nSIM is a Synopsys proprietary code.
> Well probably it worth discussing with nSIM team if they may have any suggestions
> on how to align nSIM behavior with your real HW?

For the record we can't change nSIM since the NPS behavioral is not aligned with 
stock ARC700.

stock ARC700 triggers an L2 interrupt for user space bus errors - weird but that 
is what it is and what kernel currently supports (as verified by IPPK folks when 
doing DDR controller testing from user space).

NPS triggers does machine check which is not correct hence this workaround !

-Vineet
diff mbox

Patch

diff --git a/arch/arc/kernel/entry-compact.S b/arch/arc/kernel/entry-compact.S
index f285dbb..d152d36 100644
--- a/arch/arc/kernel/entry-compact.S
+++ b/arch/arc/kernel/entry-compact.S
@@ -203,6 +203,17 @@  END(handle_interrupt_level2)
 ; ---------------------------------------------
 ENTRY(mem_service)
 
+#if defined(CONFIG_EZNPS_MEM_ERROR)
+        ; SW workaround to cover up on a difference between
+        ; NPS real chip and simulator behaviors.
+        ; NPS real chip will activate a machine check exception
+        ; in case of memory error, while the simulator will
+        ; trigger a level 2 interrupt. Therefor this code section
+        ; should be reached only in simulation mode.
+        ; DEAD END: display Regs and HALT
+
+	j EV_MachineCheck
+#endif
 	INTERRUPT_PROLOGUE 2
 
 	mov r0, ilink2
diff --git a/arch/arc/plat-eznps/Kconfig b/arch/arc/plat-eznps/Kconfig
index feaa471..c5f946c 100644
--- a/arch/arc/plat-eznps/Kconfig
+++ b/arch/arc/plat-eznps/Kconfig
@@ -32,3 +32,14 @@  config EZNPS_MTM_EXT
 	  any of them seem like CPU from Linux point of view.
 	  All threads within same core share the execution unit of the
 	  core and HW scheduler round robin between them.
+
+config EZNPS_MEM_ERROR
+       bool "ARC-EZchip Memory error as an exception"
+       depends on ARC_PLAT_EZNPS
+       default n
+       help
+         On the real chip of the NPS, user memory errors are handled
+         as a machine check exception, whereas on simulator platform
+         for NPS, it handled as an interrupt level 2 (like legacy arc
+         real chip architecture).This configuration will cause the kernel
+         to handle memory error as a machine check exception.