Message ID: 20200218172552.215077-1-brianvv@google.com
State: Accepted
Delegated to: BPF Maintainers
Series: [v2,bpf] bpf: Do not grab the bucket spinlock by default on htab batch ops
On 2/18/20 9:25 AM, Brian Vazquez wrote:
> Grabbing the spinlock for every bucket, even if it is empty, was causing
> a significant performance cost when traversing htab maps that have only
> a few entries. This patch addresses the issue by first checking
> bucket_cnt: if the bucket has entries, we grab the spinlock and proceed
> with the batching.
>
> Tested with a htab of size 50K and different numbers of populated entries.
>
> Before:
>   Benchmark               Time(ns)        CPU(ns)
>   ---------------------------------------------
>   BM_DumpHashMap/1         2759655        2752033
>   BM_DumpHashMap/10        2933722        2930825
>   BM_DumpHashMap/200       3171680        3170265
>   BM_DumpHashMap/500       3639607        3635511
>   BM_DumpHashMap/1000      4369008        4364981
>   BM_DumpHashMap/5k       11171919       11134028
>   BM_DumpHashMap/20k      69150080       69033496
>   BM_DumpHashMap/39k     190501036      190226162
>
> After:
>   Benchmark               Time(ns)        CPU(ns)
>   ---------------------------------------------
>   BM_DumpHashMap/1          202707         200109
>   BM_DumpHashMap/10         213441         210569
>   BM_DumpHashMap/200        478641         472350
>   BM_DumpHashMap/500        980061         967102
>   BM_DumpHashMap/1000      1863835        1839575
>   BM_DumpHashMap/5k        8961836        8902540
>   BM_DumpHashMap/20k      69761497       69322756
>   BM_DumpHashMap/39k     187437830      186551111
>
> Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
> Cc: Yonghong Song <yhs@fb.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>

Thanks for the fix.

Acked-by: Yonghong Song <yhs@fb.com>

> ---
> v1 -> v2: Skip hlist_nulls_for_each_entry_safe if lock is not held
>
>  kernel/bpf/hashtab.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
On Tue, Feb 18, 2020 at 1:23 PM Yonghong Song <yhs@fb.com> wrote:
>
> On 2/18/20 9:25 AM, Brian Vazquez wrote:
> > Grabbing the spinlock for every bucket, even if it is empty, was causing
> > a significant performance cost when traversing htab maps that have only
> > a few entries. This patch addresses the issue by first checking
> > bucket_cnt: if the bucket has entries, we grab the spinlock and proceed
> > with the batching.
> >
> > [...]
> >
> > Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
> > Cc: Yonghong Song <yhs@fb.com>
> > Signed-off-by: Brian Vazquez <brianvv@google.com>
>
> Thanks for the fix.
> Acked-by: Yonghong Song <yhs@fb.com>

Applied. Thanks
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 2d182c4ee9d99..ea3bf04a0a7b6 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1260,6 +1260,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	struct hlist_nulls_head *head;
 	struct hlist_nulls_node *n;
 	unsigned long flags;
+	bool locked = false;
 	struct htab_elem *l;
 	struct bucket *b;
 	int ret = 0;
@@ -1319,15 +1320,25 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 	dst_val = values;
 	b = &htab->buckets[batch];
 	head = &b->head;
-	raw_spin_lock_irqsave(&b->lock, flags);
+	/* do not grab the lock unless we need it (bucket_cnt > 0). */
+	if (locked)
+		raw_spin_lock_irqsave(&b->lock, flags);
 
 	bucket_cnt = 0;
 	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
 		bucket_cnt++;
 
+	if (bucket_cnt && !locked) {
+		locked = true;
+		goto again_nocopy;
+	}
+
 	if (bucket_cnt > (max_count - total)) {
 		if (total == 0)
 			ret = -ENOSPC;
+		/* Note that since bucket_cnt > 0 here, it is implicit
+		 * that the lock was grabbed, so release it.
+		 */
 		raw_spin_unlock_irqrestore(&b->lock, flags);
 		rcu_read_unlock();
 		this_cpu_dec(bpf_prog_active);
@@ -1337,6 +1348,9 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 
 	if (bucket_cnt > bucket_size) {
 		bucket_size = bucket_cnt;
+		/* Note that since bucket_cnt > 0 here, it is implicit
+		 * that the lock was grabbed, so release it.
+		 */
 		raw_spin_unlock_irqrestore(&b->lock, flags);
 		rcu_read_unlock();
 		this_cpu_dec(bpf_prog_active);
@@ -1346,6 +1360,10 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		goto alloc;
 	}
 
+	/* Next block is only safe to run if the lock was grabbed. */
+	if (!locked)
+		goto next_batch;
+
 	hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
 		memcpy(dst_key, l->key, key_size);
 
@@ -1380,6 +1398,8 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 
 	}
 	raw_spin_unlock_irqrestore(&b->lock, flags);
+	locked = false;
+next_batch:
 	/* If we are not copying data, we can go to next bucket and avoid
 	 * unlocking the rcu.
 	 */
Grabbing the spinlock for every bucket, even if it is empty, was causing
a significant performance cost when traversing htab maps that have only
a few entries. This patch addresses the issue by first checking
bucket_cnt: if the bucket has entries, we grab the spinlock and proceed
with the batching.

Tested with a htab of size 50K and different numbers of populated entries.

Before:
  Benchmark               Time(ns)        CPU(ns)
  ---------------------------------------------
  BM_DumpHashMap/1         2759655        2752033
  BM_DumpHashMap/10        2933722        2930825
  BM_DumpHashMap/200       3171680        3170265
  BM_DumpHashMap/500       3639607        3635511
  BM_DumpHashMap/1000      4369008        4364981
  BM_DumpHashMap/5k       11171919       11134028
  BM_DumpHashMap/20k      69150080       69033496
  BM_DumpHashMap/39k     190501036      190226162

After:
  Benchmark               Time(ns)        CPU(ns)
  ---------------------------------------------
  BM_DumpHashMap/1          202707         200109
  BM_DumpHashMap/10         213441         210569
  BM_DumpHashMap/200        478641         472350
  BM_DumpHashMap/500        980061         967102
  BM_DumpHashMap/1000      1863835        1839575
  BM_DumpHashMap/5k        8961836        8902540
  BM_DumpHashMap/20k      69761497       69322756
  BM_DumpHashMap/39k     187437830      186551111

Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
---
v1 -> v2: Skip hlist_nulls_for_each_entry_safe if lock is not held

 kernel/bpf/hashtab.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)