Message ID | 550F5852.5020405@oracle.com |
---|---|
State | RFC |
Delegated to: | David Miller |
Headers | show |
From: David Ahern <david.ahern@oracle.com> Date: Sun, 22 Mar 2015 18:03:30 -0600 > On 3/22/15 5:54 PM, David Miller wrote: >>> I just put it on 4.0.0-rc4 and ditto -- problem goes away, so it >>> clearly suggests the memcpy or memmove are the root cause. >> >> Thanks, didn't notice that. >> >> So, something is amuck. > > to continue to refine the problem ... I modified only the memmove > lines (not the memcpy) and it works fine. So its the memmove. > > I'm sure this will get whitespaced damaged on the copy and paste but > to be clear this is the patch I am currently running and system is > stable. On Friday it failed on every single; with this patch I have > allyesconfig builds with -j 128 in a loop (clean in between) and > nothing -- no panics. Can you just try calling memcpy(), that should work because I think we agree that if the memcpy() implementation copies from low to high it should work. I wonder if the triggering factor is configuring for a high number of cpus. I always have NR_CPUS=128 since that's the largest machine I have. I'll give NR_CPUS=1024 a spin. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Nevermind I think I figured out the problem. It's the cache initializing stores, we can't do overlapping copies where dst <= src in all cases because of them. A store to a address modulo the cache line size (which for these instructions is 64 bytes), clears that whole line. But when we're doing these memmove() calls in SLAB/SLUB, we can clear some bytes at the end of the line before they've been read in. And reading over NG4memcpy, this _can_ happen, the main unrolled loop begins like this: load src + 0x00 load src + 0x08 load src + 0x10 load src + 0x18 load src + 0x20 store dst + 0x00 Assume dst is 64 byte aligned and let's say that dst is src - 8 for this memcpy() call, right? That store at the end there is the one to the first line in the cache line, thus clearing the whole line, which thus clobbers "src + 0x28" before it even gets loaded. I'm pretty sure this is what's happening. And it's only going to trigger if the memcpy() is 128 bytes or larger. I'll work on a fix. -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/mm/slab.c b/mm/slab.c index c4b89ea..f5e9716 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -802,6 +802,16 @@ static inline void ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac, ac->entry[ac->avail++] = objp; } +static void move_entries(void *dest, void *src, int nr) +{ + unsigned long *dp = dest; + unsigned long *sp = src; + + for (; nr; nr--, dp++, sp++) + *dp = *sp; +} + + /* * Transfer objects in one arraycache to another.