Message ID | 20111201124645.24c6e54f@kryten (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Benjamin Herrenschmidt |
On Thu, 2011-12-01 at 12:46 +1100, Anton Blanchard wrote:
> When issuing a system reset we almost always oops in the oops_to_nvram
> code because multiple CPUs are using the deflate work area. Add a
> spinlock to protect it.
>
> To play it safe I'm using trylock to avoid locking up if the NVRAM
> code oopses. This means we might miss multiple CPUs oopsing at exactly
> the same time but I think it's best to play it safe for now. Once we
> are happy with the reliability we can change it to a full spinlock.

How would we miss ?

trylock does loop on stwcx. failure, it doesn't loop if the lock is
-taken-, so if the lock is only used for actually dealing with the
oops the only "miss" is because somebody already got it... or am I
missing something ?

Cheers,
Ben.
Hi Ben,

> How would we miss ?
>
> trylock does loop on stwcx. failure, it doesn't loop if the lock is
> -taken-, so if the lock is only used for actually dealing with the
> oops the only "miss" is because somebody already got it... or am I
> missing something ?

I'm thinking of two CPUs that enter at exactly the same time either
through a system reset or an ugly bug (writing junk at 0x900 so the
decrementer exception gets an oops). Probably unlikely enough that we
don't care.

Anton
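The trylock behaviour discussed above can be illustrated with a small user-space sketch. This is not the kernel's powerpc implementation: `toy_lock`, `toy_trylock` and `toy_unlock` are hypothetical names, and C11 atomics stand in for the lwarx/stwcx. sequence. The weak compare-exchange may fail spuriously, which plays the role of an stwcx. failure here, so the loop retries in that case only and gives up immediately when it sees the lock held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock: 0 means free, 1 means held. */
typedef struct {
	atomic_int held;
} toy_lock;

/* Try once to take the lock; never wait for a current holder. */
static bool toy_trylock(toy_lock *l)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak_explicit(&l->held, &expected, 1,
						      memory_order_acquire,
						      memory_order_relaxed)) {
		/* On failure, expected holds the value actually observed. */
		if (expected != 0)
			return false;	/* lock is taken: give up, do not spin */
		/* Spurious failure with the lock still free: retry. */
	}
	return true;
}

static void toy_unlock(toy_lock *l)
{
	/* Debug check: only a holder should unlock. */
	assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

With this shape, a caller only ever "misses" when some other caller already holds the lock, which matches the reading Ben gives above.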
On Thu, 2011-12-01 at 12:46 +1100, Anton Blanchard wrote:
> When issuing a system reset we almost always oops in the oops_to_nvram
> code because multiple CPUs are using the deflate work area. Add a
> spinlock to protect it.
>
> To play it safe I'm using trylock to avoid locking up if the NVRAM
> code oopses. This means we might miss multiple CPUs oopsing at exactly
> the same time but I think it's best to play it safe for now. Once we
> are happy with the reliability we can change it to a full spinlock.
>
> Signed-off-by: Anton Blanchard <anton@samba.org>

Acked-by: Jim Keniston <jkenisto@us.ibm.com>

> ---
>
> Index: linux-build/arch/powerpc/platforms/pseries/nvram.c
> ===================================================================
> --- linux-build.orig/arch/powerpc/platforms/pseries/nvram.c	2011-12-01 09:44:27.205568463 +1100
> +++ linux-build/arch/powerpc/platforms/pseries/nvram.c	2011-12-01 12:36:49.334478156 +1100
> @@ -634,6 +634,8 @@ static void oops_to_nvram(struct kmsg_du
>  {
>  	static unsigned int oops_count = 0;
>  	static bool panicking = false;
> +	static DEFINE_SPINLOCK(lock);
> +	unsigned long flags;
>  	size_t text_len;
>  	unsigned int err_type = ERR_TYPE_KERNEL_PANIC_GZ;
>  	int rc = -1;
> @@ -664,6 +666,9 @@ static void oops_to_nvram(struct kmsg_du
>  	if (clobbering_unread_rtas_event())
>  		return;
>  
> +	if (!spin_trylock_irqsave(&lock, flags))
> +		return;
> +
>  	if (big_oops_buf) {
>  		text_len = capture_last_msgs(old_msgs, old_len,
>  			new_msgs, new_len, big_oops_buf, big_oops_buf_sz);
> @@ -679,4 +684,6 @@ static void oops_to_nvram(struct kmsg_du
>  
>  	(void) nvram_write_os_partition(&oops_log_partition, oops_buf,
>  		(int) (sizeof(*oops_len) + *oops_len), err_type, ++oops_count);
> +
> +	spin_unlock_irqrestore(&lock, flags);
>  }
>
Index: linux-build/arch/powerpc/platforms/pseries/nvram.c
===================================================================
--- linux-build.orig/arch/powerpc/platforms/pseries/nvram.c	2011-12-01 09:44:27.205568463 +1100
+++ linux-build/arch/powerpc/platforms/pseries/nvram.c	2011-12-01 12:36:49.334478156 +1100
@@ -634,6 +634,8 @@ static void oops_to_nvram(struct kmsg_du
 {
 	static unsigned int oops_count = 0;
 	static bool panicking = false;
+	static DEFINE_SPINLOCK(lock);
+	unsigned long flags;
 	size_t text_len;
 	unsigned int err_type = ERR_TYPE_KERNEL_PANIC_GZ;
 	int rc = -1;
@@ -664,6 +666,9 @@ static void oops_to_nvram(struct kmsg_du
 	if (clobbering_unread_rtas_event())
 		return;
 
+	if (!spin_trylock_irqsave(&lock, flags))
+		return;
+
 	if (big_oops_buf) {
 		text_len = capture_last_msgs(old_msgs, old_len,
 			new_msgs, new_len, big_oops_buf, big_oops_buf_sz);
@@ -679,4 +684,6 @@ static void oops_to_nvram(struct kmsg_du
 
 	(void) nvram_write_os_partition(&oops_log_partition, oops_buf,
 		(int) (sizeof(*oops_len) + *oops_len), err_type, ++oops_count);
+
+	spin_unlock_irqrestore(&lock, flags);
 }
When issuing a system reset we almost always oops in the oops_to_nvram
code because multiple CPUs are using the deflate work area. Add a
spinlock to protect it.

To play it safe I'm using trylock to avoid locking up if the NVRAM
code oopses. This means we might miss multiple CPUs oopsing at exactly
the same time but I think it's best to play it safe for now. Once we
are happy with the reliability we can change it to a full spinlock.

Signed-off-by: Anton Blanchard <anton@samba.org>
---
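
The "skip if busy" pattern described in the changelog can be sketched in user space as follows. This is only an illustration under stated assumptions: `work_area`, `work_area_busy`, `log_count` and `log_oops` are hypothetical stand-ins for the deflate work area, the spinlock and oops_to_nvram(), and a C11 `atomic_flag` replaces `spin_trylock_irqsave()` (there is no interrupt handling here). A caller that finds the work area in use returns without touching it, so it can never deadlock on a holder that has itself crashed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical shared state standing in for the deflate work area. */
static atomic_flag work_area_busy = ATOMIC_FLAG_INIT;
static char work_area[64];
static unsigned int log_count;

/* Returns true if msg was recorded, false if another caller held the area. */
static bool log_oops(const char *msg)
{
	assert(msg != NULL);

	/* Trylock: if the flag was already set, bail out instead of waiting. */
	if (atomic_flag_test_and_set_explicit(&work_area_busy,
					      memory_order_acquire))
		return false;

	/* Exclusive access to the work area from here on. */
	strncpy(work_area, msg, sizeof(work_area) - 1);
	work_area[sizeof(work_area) - 1] = '\0';
	++log_count;

	/* Unlock so that later callers can log again. */
	atomic_flag_clear_explicit(&work_area_busy, memory_order_release);
	return true;
}
```

The trade-off is the one the changelog names: a second caller arriving while the area is busy loses its message, in exchange for never spinning on a lock that may not be released.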