From patchwork Wed Sep 16 22:46:45 2020
X-Patchwork-Submitter: Yonghong Song
X-Patchwork-Id: 1366259
X-Patchwork-Delegate: bpf@iogearbox.net
From: Yonghong Song
To: ,
CC: Alexei Starovoitov, Daniel Borkmann, , Martin KaFai Lau, Song Liu
Subject: [PATCH bpf-next v4] bpf: using rcu_read_lock for bpf_sk_storage_map iterator
Date: Wed, 16 Sep 2020 15:46:45 -0700
Message-ID: <20200916224645.720172-1-yhs@fb.com>
If a bucket contains many sockets, then while bpf_iter is traversing that
bucket, concurrent userspace bpf_map_update_elem() calls and bpf program
bpf_sk_storage_{get,delete}() calls may experience undesirable delays, as
they compete with bpf_iter for the bucket lock.

Note that the number of buckets in a bpf_sk_storage_map is roughly the same
as the number of cpus, so if the system has a large number of sockets, each
bucket may hold a large number of them. Different real use cases may
experience different delays.

Here, using the selftest bpf_iter subtest bpf_sk_storage_map, I hacked the
kernel with ktime_get_mono_fast_ns() to record how long a bucket stayed
locked while the bpf_iter program traversed it. The maximum delay incurred,
as a function of the number of elements in the bucket, was:

  # elems in each bucket    delay(ns)
  64                        17000
  256                       72512
  2048                      875246

The potential delay grows further as a bucket holds even more elements.

Using rcu_read_lock() is a reasonable compromise here. It may lose some
precision, e.g., it may visit stale sockets, but it will not hurt the
performance of bpf programs or of user space applications that concurrently
get/delete or update map elements.

Cc: Martin KaFai Lau
Acked-by: Song Liu
Signed-off-by: Yonghong Song
---
 net/core/bpf_sk_storage.c | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

Changelog:
  v3 -> v4:
    - use rcu_dereference/hlist_next_rcu for hlist_entry_safe. (Martin)
  v2 -> v3:
    - fix a bug: hlist_for_each_entry() => hlist_for_each_entry_rcu(). (Martin)
    - use rcu_dereference() instead of rcu_dereference_raw() for lockdep
      checking. (Martin)
  v1 -> v2:
    - added some performance numbers. (Song)
    - tried to silence some sparse complaints, but a few remain, e.g.,
      context imbalance in "..." - different lock contexts for basic block,
      where the code is too hard for sparse to analyze. (Jakub)
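For reference, the measurement above was taken with ad-hoc instrumentation
roughly like the fragment below (not part of this patch): the pre-patch
iterator's bucket lock hold time is bracketed with ktime_get_mono_fast_ns()
and the worst case is recorded. The max_hold_ns variable and the reporting
are made up purely for illustration.

	/* illustration only: how long does the bucket stay locked while
	 * the iterator walks it?  max_hold_ns is a made-up variable.
	 */
	static u64 max_hold_ns;
	u64 t0, delta;

	raw_spin_lock_bh(&b->lock);
	t0 = ktime_get_mono_fast_ns();	/* NMI-safe, fine to call under the lock */

	hlist_for_each_entry(selem, &b->list, map_node) {
		/* ... existing per-element work ... */
	}

	delta = ktime_get_mono_fast_ns() - t0;
	if (delta > max_hold_ns)
		max_hold_ns = delta;	/* dumped later, e.g. via printk() */
	raw_spin_unlock_bh(&b->lock);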
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 4a86ea34f29e..6b6ba874061c 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -678,6 +678,7 @@ struct bpf_iter_seq_sk_storage_map_info {
 static struct bpf_local_storage_elem *
 bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
 				 struct bpf_local_storage_elem *prev_selem)
+	__acquires(RCU) __releases(RCU)
 {
 	struct bpf_local_storage *sk_storage;
 	struct bpf_local_storage_elem *selem;
@@ -696,16 +697,16 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
 	selem = prev_selem;
 	count = 0;
 	while (selem) {
-		selem = hlist_entry_safe(selem->map_node.next,
+		selem = hlist_entry_safe(rcu_dereference(hlist_next_rcu(&selem->map_node)),
 					 struct bpf_local_storage_elem, map_node);
 		if (!selem) {
 			/* not found, unlock and go to the next bucket */
 			b = &smap->buckets[bucket_id++];
-			raw_spin_unlock_bh(&b->lock);
+			rcu_read_unlock();
 			skip_elems = 0;
 			break;
 		}
-		sk_storage = rcu_dereference_raw(selem->local_storage);
+		sk_storage = rcu_dereference(selem->local_storage);
 		if (sk_storage) {
 			info->skip_elems = skip_elems + count;
 			return selem;
@@ -715,10 +716,10 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
 
 	for (i = bucket_id; i < (1U << smap->bucket_log); i++) {
 		b = &smap->buckets[i];
-		raw_spin_lock_bh(&b->lock);
+		rcu_read_lock();
 		count = 0;
-		hlist_for_each_entry(selem, &b->list, map_node) {
-			sk_storage = rcu_dereference_raw(selem->local_storage);
+		hlist_for_each_entry_rcu(selem, &b->list, map_node) {
+			sk_storage = rcu_dereference(selem->local_storage);
 			if (sk_storage && count >= skip_elems) {
 				info->bucket_id = i;
 				info->skip_elems = count;
@@ -726,7 +727,7 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
 			}
 			count++;
 		}
-		raw_spin_unlock_bh(&b->lock);
+		rcu_read_unlock();
 		skip_elems = 0;
 	}
 
@@ -785,7 +786,7 @@ static int __bpf_sk_storage_map_seq_show(struct seq_file *seq,
 	ctx.meta = &meta;
 	ctx.map = info->map;
 	if (selem) {
-		sk_storage = rcu_dereference_raw(selem->local_storage);
+		sk_storage = rcu_dereference(selem->local_storage);
 		ctx.sk = sk_storage->owner;
 		ctx.value = SDATA(selem)->data;
 	}
@@ -801,18 +802,12 @@ static int bpf_sk_storage_map_seq_show(struct seq_file *seq, void *v)
 }
 
 static void bpf_sk_storage_map_seq_stop(struct seq_file *seq, void *v)
+	__releases(RCU)
 {
-	struct bpf_iter_seq_sk_storage_map_info *info = seq->private;
-	struct bpf_local_storage_map *smap;
-	struct bpf_local_storage_map_bucket *b;
-
-	if (!v) {
+	if (!v)
 		(void)__bpf_sk_storage_map_seq_show(seq, v);
-	} else {
-		smap = (struct bpf_local_storage_map *)info->map;
-		b = &smap->buckets[info->bucket_id];
-		raw_spin_unlock_bh(&b->lock);
-	}
+	else
+		rcu_read_unlock();
 }
 
 static int bpf_iter_init_sk_storage_map(void *priv_data,
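For completeness, the bpf_iter selftest referenced above is driven by a small
BPF program attached to the sk_storage map. Below is a minimal sketch, modeled
on tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c; it relies
on the selftests-local "bpf_iter.h" header for the iterator context types, and
the map name and the two counters here are only illustrative.

/* Sketch of a bpf_iter program for a BPF_MAP_TYPE_SK_STORAGE map. */
#include "bpf_iter.h"
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, int);
} sk_stg_map SEC(".maps");

__u32 socks_seen = 0;	/* number of (socket, value) pairs visited */
__u32 val_sum = 0;	/* sum of all stored values */

SEC("iter/bpf_sk_storage_map")
int dump_bpf_sk_storage_map(struct bpf_iter__bpf_sk_storage_map *ctx)
{
	struct sock *sk = ctx->sk;
	int *val = ctx->value;

	/* sk and value are NULL on the final call of an iteration */
	if (!sk || !val)
		return 0;

	socks_seen++;
	val_sum += *val;
	return 0;
}

User space attaches such a program with bpf_program__attach_iter() (passing
the target map's fd in the attach options), obtains an iterator fd via
bpf_iter_create() on the resulting link, and then read()s that fd; each read
walks the map through the seq_file operations changed above, which with this
patch happens under rcu_read_lock() rather than with the bucket's raw
spinlock held.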