Message ID   | alpine.DEB.2.00.1111221052130.28197@router.home
State        | Not Applicable, archived
Delegated to | David Miller
On Tuesday, 22 November 2011 at 10:53 -0600, Christoph Lameter wrote:
> A bit heavy handed locking but this should do the trick.
>
> Subject: slub: Lockout validation scans during freeing of object
>
> Slab validation can run right now while the slab free paths prepare
> the redzone fields etc around the objects in preparation of the
> actual freeing of the object. This can lead to false positives.
>
> Take the node lock unconditionally during free so that the validation
> can examine objects without them being disturbed by freeing operations.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> ---
>  mm/slub.c |   12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2011-11-22 10:42:19.000000000 -0600
> +++ linux-2.6/mm/slub.c	2011-11-22 10:44:34.000000000 -0600
> @@ -2391,8 +2391,15 @@ static void __slab_free(struct kmem_cach
>
>          stat(s, FREE_SLOWPATH);
>
> -        if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
> -                return;
> +        if (kmem_cache_debug(s)) {
> +
> +                /* Lock out any concurrent validate_slab calls */
> +                n = get_node(s, page_to_nid(page));
> +                spin_lock_irqsave(&n->list_lock, flags);
> +
> +                if (!free_debug_processing(s, page, x, addr))
> +                        goto out;
> +        }
>
>          do {
>                  prior = page->freelist;
> @@ -2471,6 +2478,7 @@ static void __slab_free(struct kmem_cach
>                          stat(s, FREE_ADD_PARTIAL);
>                  }
>          }
> +out:
>          spin_unlock_irqrestore(&n->list_lock, flags);
>          return;

This seems better, but I still have some warnings:

[ 162.117574] SLUB: selinux_inode_security 136 slabs counted but counter=137
[ 179.879907] SLUB: task_xstate 1 slabs counted but counter=2
[ 179.881745] SLUB: vm_area_struct 47 slabs counted but counter=48
[ 180.381964] SLUB: kmalloc-64 46 slabs counted but counter=47
[ 192.366437] SLUB: vm_area_struct 82 slabs counted but counter=83
[ 195.016732] SLUB: names_cache 3 slabs counted but counter=4
[ 196.073166] SLUB: dentry 623 slabs counted but counter=624
[ 196.093857] SLUB: names_cache 3 slabs counted but counter=4
[ 196.631420] SLUB: names_cache 5 slabs counted but counter=6
[ 198.760180] SLUB: kmalloc-16 30 slabs counted but counter=31
[ 198.773492] SLUB: names_cache 5 slabs counted but counter=6
[ 198.783637] SLUB: selinux_inode_security 403 slabs counted but counter=404
[ 201.717932] SLUB: filp 53 slabs counted but counter=54
[ 202.984756] SLUB: filp 40 slabs counted but counter=41
[ 203.819525] SLUB: task_xstate 3 slabs counted but counter=4
[ 205.699916] SLUB: cfq_io_context 1 slabs counted but counter=2
[ 206.526646] SLUB: skbuff_head_cache 4 slabs counted but counter=5
[ 208.431951] SLUB: names_cache 3 slabs counted but counter=4
[ 210.672056] SLUB: vm_area_struct 88 slabs counted but counter=89
[ 213.160055] SLUB: vm_area_struct 94 slabs counted but counter=95
[ 215.604856] SLUB: cfq_queue 1 slabs counted but counter=2
[ 217.872494] SLUB: filp 56 slabs counted but counter=57
[ 220.184599] SLUB: names_cache 3 slabs counted but counter=4
[ 221.783732] SLUB: anon_vma_chain 53 slabs counted but counter=54
[ 221.816662] SLUB: kmalloc-16 66 slabs counted but counter=67
[ 221.828582] SLUB: names_cache 3 slabs counted but counter=4
[ 221.848231] SLUB: vm_area_struct 99 slabs counted but counter=100
[ 224.125411] SLUB: kmalloc-16 30 slabs counted but counter=31
[ 224.126313] SLUB: kmalloc-16 67 slabs counted but counter=68
[ 224.138768] SLUB: names_cache 3 slabs counted but counter=4
[ 224.921409] SLUB: anon_vma 36 slabs counted but counter=37
[ 224.927833] SLUB: buffer_head 294 slabs counted but counter=295
[ 226.473891] SLUB: kmalloc-16 67 slabs counted but counter=68
[ 228.801716] SLUB: names_cache 5 slabs counted but counter=6
[ 229.610225] SLUB: filp 47 slabs counted but counter=48
[ 232.050811] SLUB: filp 53 slabs counted but counter=54
[ 235.835888] SLUB: names_cache 3 slabs counted but counter=4
[ 236.625318] SLUB: filp 48 slabs counted but counter=49
[ 236.634563] SLUB: kmalloc-16 30 slabs counted but counter=31
[ 236.635667] SLUB: kmalloc-16 67 slabs counted but counter=68
[ 237.500016] SLUB: radix_tree_node 100 slabs counted but counter=101
[ 238.248677] SLUB: filp 48 slabs counted but counter=49
[ 239.097674] SLUB: filp 49 slabs counted but counter=50
[ 239.975020] SLUB: names_cache 3 slabs counted but counter=4
[ 241.569766] SLUB: vm_area_struct 102 slabs counted but counter=103
[ 242.388502] SLUB: names_cache 5 slabs counted but counter=6
[ 243.152519] SLUB: anon_vma_chain 56 slabs counted but counter=57
[ 245.661970] SLUB: filp 49 slabs counted but counter=50
[ 247.298004] SLUB: filp 48 slabs counted but counter=50
[ 248.851148] SLUB: journal_handle 3 slabs counted but counter=4
[ 249.674320] SLUB: names_cache 3 slabs counted but counter=4
[ 250.414476] SLUB: bio-0 24 slabs counted but counter=25
[ 250.461655] SLUB: kmalloc-96 49 slabs counted but counter=50
[ 250.477188] SLUB: sgpool-16 0 slabs counted but counter=1
[ 251.298554] SLUB: kmalloc-32 9 slabs counted but counter=10
[ 252.096119] SLUB: names_cache 3 slabs counted but counter=4
[ 256.179892] SLUB: filp 58 slabs counted but counter=59
[ 256.188385] SLUB: kmalloc-16 30 slabs counted but counter=31
[ 257.040508] SLUB: kmalloc-16 30 slabs counted but counter=31
[ 258.704236] SLUB: buffer_head 502 slabs counted but counter=503
[ 258.745777] SLUB: kmalloc-32 9 slabs counted but counter=10
[ 258.752285] SLUB: names_cache 3 slabs counted but counter=4
[ 260.412312] SLUB: filp 54 slabs counted but counter=56
[ 261.213526] SLUB: filp 44 slabs counted but counter=45
[ 262.846810] SLUB: kmalloc-16 31 slabs counted but counter=32
[ 262.859062] SLUB: names_cache 5 slabs counted but counter=6
[ 262.881728] SLUB: task_xstate 3 slabs counted but counter=4
[ 263.672055] SLUB: filp 54 slabs counted but counter=55
[ 266.191191] SLUB: kmalloc-16 30 slabs counted but counter=31
[ 266.203799] SLUB: names_cache 5 slabs counted but counter=6
[ 268.486964] SLUB: filp 52 slabs counted but counter=53
[ 268.509446] SLUB: names_cache 5 slabs counted but counter=6
[ 269.365745] SLUB: vm_area_struct 88 slabs counted but counter=89
On Tue, 22 Nov 2011, Eric Dumazet wrote:
> This seems better, but I still have some warnings:
Trying to reproduce with a kernel configured for preemption. This is
actually quite interesting since it's always off by 1.
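
These "slabs counted but counter=" messages come from SLUB's validation pass,
which counts the slabs it can actually walk on the node's partial and full
lists and compares that total against the node's running nr_slabs counter. A
toy, self-contained sketch of that comparison (an illustration only, not the
kernel source; the 12/35 split below is hypothetical, the totals mirror the
vm_area_struct line in the log above):

#include <stdio.h>

/*
 * Toy model of the consistency check behind the messages above: validation
 * walks the node's lists, counts the slabs it finds, and compares the total
 * with the node's bookkeeping counter maintained at alloc/free time.
 */
int main(void)
{
        long partial_slabs = 12;        /* slabs found on the partial list (hypothetical) */
        long full_slabs = 35;           /* slabs found on the full list (hypothetical) */
        long nr_slabs = 48;             /* node counter, updated as slabs are created/destroyed */
        long counted = partial_slabs + full_slabs;

        if (counted != nr_slabs)
                printf("SLUB: vm_area_struct %ld slabs counted but counter=%ld\n",
                       counted, nr_slabs);
        return 0;
}

A slab that is in transit during a free (already accounted in the counter but
not currently reachable on either list) would show up as exactly this kind of
off-by-one.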
On 2011.11.22 at 11:40 -0600, Christoph Lameter wrote:
> On Tue, 22 Nov 2011, Eric Dumazet wrote:
>
> > This seems better, but I still have some warnings:
>
> Trying to reproduce with a kernel configured for preemption. This is
> actually quite interesting since it's always off by 1.

BTW there are some obvious overflows in the "slabinfo -l" output on my machine:

Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg
:t-0000024 680 24 16.3K 0/0/4 170 0 0 99 *
:t-0000032 768 32 24.5K 0/0/6 128 0 0 100 *
:t-0000040 1632 40 65.5K 13/0/3 102 0 0 99 *
:t-0000048 4533 48 241.6K 18446744073709551583/9/92 85 0 15 90 *
:t-0000064 3392 64 217.0K 30/0/23 64 0 0 100 *
:t-0000072 4368 72 319.4K 73/0/5 56 0 0 98 *
:t-0000112 720 112 81.9K 0/0/20 36 0 0 98 *
:t-0000128 1684 128 233.4K 18446744073709551607/7/66 32 0 12 92 *
:t-0000136 120 136 16.3K 0/0/4 30 0 0 99 *
:t-0000144 8036 144 1.1M 283/0/4 28 0 0 98 *
:t-0000192 4452 192 872.4K 165/0/48 21 0 0 97 *
:t-0000216 90 216 20.4K 1/0/4 18 0 0 94 *
:t-0000256 539 256 147.4K 18446744073709551609/3/43 16 0 8 93 *
:t-0000320 2425 320 794.6K 93/0/4 25 1 0 97 *A
:t-0000400 60 400 24.5K 0/0/3 20 1 0 97 *
:t-0000512 640 512 327.6K 23/0/17 16 1 0 100 *
:t-0000704 483 704 344.0K 0/0/21 23 2 0 98 *A
:t-0001024 373 1024 475.1K 15/10/14 16 2 34 80 *
:t-0002048 288 2048 589.8K 10/0/8 16 3 0 100 *
:t-0004096 119 4096 524.2K 4/3/12 8 3 18 92 *
Acpi-State 204 80 16.3K 0/0/4 51 0 0 99
anon_vma 3528 64 258.0K 9/0/54 56 0 0 87
bdev_cache 84 728 65.5K 0/0/4 21 2 0 93 Aa
blkdev_queue 38 1664 65.5K 0/0/2 19 3 0 96
blkdev_requests 161 344 57.3K 0/0/7 23 1 0 96
buffer_head 1014 104 106.4K 5/0/21 39 0 0 99 a
cfq_queue 102 232 24.5K 1/0/5 17 0 0 96
dentry 15897 192 3.1M 723/0/34 21 0 0 98 a
idr_layer_cache 390 544 212.9K 8/0/5 30 2 0 99
inode_cache 6448 512 3.4M 199/0/9 31 2 0 96 a
ip_fib_trie 219 56 12.2K 0/0/3 73 0 0 99
kmalloc-16 3072 16 49.1K 7/0/5 256 0 0 100
kmalloc-8 4608 8 36.8K 4/0/5 512 0 0 100
kmalloc-8192 20 8192 163.8K 1/0/4 4 3 0 100
kmalloc-96 840 96 81.9K 7/0/13 42 0 0 98
kmem_cache 21 192 4.0K 0/0/1 21 0 0 98 *A
kmem_cache_node 192 64 12.2K 0/0/3 64 0 0 100 *A
mm_struct 171 832 147.4K 0/0/9 19 2 0 96 A
mqueue_inode_cache 19 800 16.3K 0/0/1 19 2 0 92 A
nf_conntrack_ffffffff8199e380 60 264 16.3K 0/0/2 30 1 0 96
proc_inode_cache 420 576 245.7K 2/0/13 28 2 0 98 a
radix_tree_node 3472 560 2.0M 120/0/4 28 2 0 95 a
RAW 42 712 32.7K 0/0/2 21 2 0 91 A
shmem_inode_cache 1036 576 606.2K 21/0/16 28 2 0 98
sighand_cache 210 2088 458.7K 3/0/11 15 3 0 95 A
signal_cache 268 920 360.4K 18446744073709551614/7/24 17 2 31 68 A
sigqueue 125 160 20.4K 0/0/5 25 0 0 97
skbuff_fclone_cache 72 420 32.7K 0/0/4 18 1 0 92 A
sock_inode_cache 196 560 114.6K 0/0/7 28 2 0 95 Aa
task_struct 215 1504 557.0K 9/9/8 21 3 52 58
TCP 21 1504 32.7K 0/0/1 21 3 0 96 A
UDP 63 736 49.1K 0/0/3 21 2 0 94 A
vm_area_struct 3278 168 602.1K 89/33/58 24 0 22 91
xfs_btree_cur 76 208 16.3K 0/0/4 19 0 0 96
xfs_buf_item 108 224 24.5K 0/0/6 18 0 0 98
xfs_da_state 64 488 32.7K 0/0/4 16 1 0 95
xfs_inode 7055 896 6.7M 411/0/4 17 2 0 92 Aa
xfs_log_ticket 80 200 16.3K 0/0/4 20 0 0 97
xfs_trans 116 280 32.7K 0/0/4 29 1 0 99
On Tue, 22 Nov 2011, Markus Trippelsdorf wrote:

> On 2011.11.22 at 11:40 -0600, Christoph Lameter wrote:
> > On Tue, 22 Nov 2011, Eric Dumazet wrote:
> >
> > > This seems better, but I still have some warnings:
> >
> > Trying to reproduce with a kernel configured for preemption. This is
> > actually quite interesting since it's always off by 1.
>
> BTW there are some obvious overflows in the "slabinfo -l" output on my machine:

Could you get me the value of the "slabs" field for the slabs showing the
weird values? I.e. do

cat /sys/kernel/slab/signal_cache/slabs

> signal_cache 268 920 360.4K 18446744073709551614/7/24 17 2 31 68 A
On 2011.11.22 at 13:20 -0600, Christoph Lameter wrote:
> On Tue, 22 Nov 2011, Markus Trippelsdorf wrote:
>
> > On 2011.11.22 at 11:40 -0600, Christoph Lameter wrote:
> > > On Tue, 22 Nov 2011, Eric Dumazet wrote:
> > >
> > > > This seems better, but I still have some warnings:
> > >
> > > Trying to reproduce with a kernel configured for preemption. This is
> > > actually quite interesting since it's always off by 1.
> >
> > BTW there are some obvious overflows in the "slabinfo -l" output on my machine:
>
> Could you get me the value of the "slabs" field for the slabs showing the
> weird values? I.e. do
>
> cat /sys/kernel/slab/signal_cache/slabs
>
> > signal_cache 268 920 360.4K 18446744073709551614/7/24 17 2 31 68 A

It's quite easy to explain. You're using unsigned longs in:

snprintf(dist_str, 40, "%lu/%lu/%d",
         s->slabs - s->cpu_slabs, s->partial, s->cpu_slabs);

and (s->slabs - s->cpu_slabs) can go negative, which wraps around as an
unsigned value. For example:

task_struct 269 1504 557.0K 18446744073709551601/5/32 21 3 29 72

Here s->slabs is 17 and s->cpu_slabs is 32. That gives:
17 - 32 = 18446744073709551601.
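
To see the wraparound in isolation, a standalone sketch (not the slabinfo.c
source; the values 17, 5 and 32 are taken from the task_struct line quoted
above, and the result assumes a 64-bit unsigned long):

#include <stdio.h>

/* Standalone illustration of the wraparound in the slabinfo output:
 * slabs, partial and cpu_slabs mirror the task_struct example (17, 5, 32). */
int main(void)
{
        unsigned long slabs = 17;       /* corresponds to s->slabs */
        unsigned long partial = 5;      /* corresponds to s->partial */
        int cpu_slabs = 32;             /* corresponds to s->cpu_slabs */
        char dist_str[40];

        /* slabs - cpu_slabs is evaluated in unsigned long arithmetic, so a
         * "negative" result wraps modulo 2^64 instead of going below zero. */
        snprintf(dist_str, sizeof(dist_str), "%lu/%lu/%d",
                 slabs - cpu_slabs, partial, cpu_slabs);
        printf("%s\n", dist_str);       /* prints 18446744073709551601/5/32 */
        return 0;
}

The fix is either to use signed arithmetic for the difference or to clamp it
at zero before formatting.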
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2011-11-22 10:42:19.000000000 -0600
+++ linux-2.6/mm/slub.c	2011-11-22 10:44:34.000000000 -0600
@@ -2391,8 +2391,15 @@ static void __slab_free(struct kmem_cach

         stat(s, FREE_SLOWPATH);

-        if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
-                return;
+        if (kmem_cache_debug(s)) {
+
+                /* Lock out any concurrent validate_slab calls */
+                n = get_node(s, page_to_nid(page));
+                spin_lock_irqsave(&n->list_lock, flags);
+
+                if (!free_debug_processing(s, page, x, addr))
+                        goto out;
+        }

         do {
                 prior = page->freelist;
@@ -2471,6 +2478,7 @@ static void __slab_free(struct kmem_cach
                         stat(s, FREE_ADD_PARTIAL);
                 }
         }
+out:
         spin_unlock_irqrestore(&n->list_lock, flags);
         return;
A bit heavy handed locking but this should do the trick.

Subject: slub: Lockout validation scans during freeing of object

Slab validation can run right now while the slab free paths prepare
the redzone fields etc around the objects in preparation of the
actual freeing of the object. This can lead to false positives.

Take the node lock unconditionally during free so that the validation
can examine objects without them being disturbed by freeing operations.

Signed-off-by: Christoph Lameter <cl@linux.com>

---
 mm/slub.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
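
The idea can be modelled in userspace: a "freer" thread rewrites an object's
poison bytes while a "validator" thread walks the objects, and both take the
same lock, mirroring how __slab_free() now takes n->list_lock before
free_debug_processing() so the validation pass never sees a half-updated
object. This is only a sketch under simplifying assumptions (a mutex stands in
for n->list_lock; the 0x6b/0xa5 fill values mimic SLUB's POISON_FREE and
POISON_END), not kernel code:

#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NOBJ  16
#define OBJSZ 32

static unsigned char objects[NOBJ][OBJSZ];              /* models one slab's objects */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;  /* stands in for n->list_lock */

static void *freer(void *arg)
{
        (void)arg;
        for (int i = 0; i < NOBJ; i++) {
                pthread_mutex_lock(&list_lock);         /* lock out validation */
                memset(objects[i], 0x6b, OBJSZ - 1);    /* POISON_FREE-style fill */
                objects[i][OBJSZ - 1] = 0xa5;           /* POISON_END-style marker */
                pthread_mutex_unlock(&list_lock);
        }
        return NULL;
}

static void *validator(void *arg)
{
        int bad = 0;

        (void)arg;
        pthread_mutex_lock(&list_lock);                 /* matches the validation side */
        for (int i = 0; i < NOBJ; i++) {
                /* An object must be either untouched or fully poisoned. */
                if (objects[i][0] == 0x6b && objects[i][OBJSZ - 1] != 0xa5)
                        bad++;
        }
        pthread_mutex_unlock(&list_lock);
        printf("inconsistent objects: %d\n", bad);
        return NULL;
}

int main(void)
{
        pthread_t f, v;

        pthread_create(&f, NULL, freer, NULL);
        pthread_create(&v, NULL, validator, NULL);
        pthread_join(f, NULL);
        pthread_join(v, NULL);
        return 0;
}

Compile with cc -pthread. With the shared lock held on both sides the
validator never reports an inconsistent object; dropping the lock/unlock pair
in either thread can reintroduce the window that produces the false positives
the patch description refers to.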