From patchwork Tue Dec 15 11:21:02 2015
Subject: [PATCH 4/6] bpf: hash: convert per-hashtable lock into per-bucket bit spinlock
From: Ming Lei <tom.leiming@gmail.com>
X-Patchwork-Id: 556927
X-Patchwork-Delegate: davem@davemloft.net
To: linux-kernel@vger.kernel.org, Alexei Starovoitov
Cc: "David S. Miller", netdev@vger.kernel.org, Ming Lei
Date: Tue, 15 Dec 2015 19:21:02 +0800
Message-Id: <1450178464-27721-5-git-send-email-tom.leiming@gmail.com>
In-Reply-To: <1450178464-27721-1-git-send-email-tom.leiming@gmail.com>
References: <1450178464-27721-1-git-send-email-tom.leiming@gmail.com>
List-ID: netdev@vger.kernel.org

Both htab_map_update_elem() and htab_map_delete_elem() can be called
from an eBPF program, and they may sit on a kernel hot path, so it is
inefficient to use a per-hashtable lock in these two helpers.

The per-hashtable spinlock only protects each bucket's hlist, so a
per-bucket lock is sufficient. This patch converts the per-hashtable
lock into a per-bucket bit spinlock, which reduces lock contention
considerably and consumes no extra memory for the locks.
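Note: the bucket-level helpers used below (HLIST_LOCK_BIT,
hlist_get_head_lock(), hlist_add_head_rcu_lock(), hlist_del_rcu_lock())
are not mainline APIs; they are presumably introduced by an earlier
patch in this series. As a rough sketch of the idea only (assuming the
lock lives in bit 0 of the bucket's head pointer, and that list nodes
are at least word-aligned so that bit is otherwise always zero):

	#include <linux/list.h>		/* hlist_head, hlist_node */
	#include <linux/bit_spinlock.h>	/* bit_spin_lock()/unlock() */

	/* Sketch, not the series' actual code: an hlist_head is a single
	 * pointer, and aligned nodes leave bit 0 of ->first free, so that
	 * bit can double as a spinlock.  Readers must mask the bit off
	 * before walking the list.
	 */
	#define HLIST_LOCK_BIT	0			/* assumed lock bit */
	#define HLIST_LOCK_MASK	(1UL << HLIST_LOCK_BIT)

	/* Copy an unlocked view of 'head' into 'new_head' so the normal
	 * hlist iterators can be used whether or not the bucket is locked.
	 */
	static inline struct hlist_head *
	hlist_get_head_lock(struct hlist_head *head, struct hlist_head *new_head)
	{
		unsigned long first = (unsigned long)head->first;

		new_head->first = (struct hlist_node *)(first & ~HLIST_LOCK_MASK);
		return new_head;
	}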
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 kernel/bpf/hashtab.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d857fcb..8543fea 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -17,7 +17,6 @@ struct bpf_htab {
 	struct bpf_map map;
 	struct hlist_head *buckets;
-	raw_spinlock_t lock;
 	atomic_t count;	/* number of elements in this hashtable */
 	u32 n_buckets;	/* number of hash buckets */
 	u32 elem_size;	/* size of each element in bytes */
@@ -105,7 +104,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	for (i = 0; i < htab->n_buckets; i++)
 		INIT_HLIST_HEAD(&htab->buckets[i]);
 
-	raw_spin_lock_init(&htab->lock);
 	atomic_set(&htab->count, 0);
 
 	return &htab->map;
@@ -142,6 +140,7 @@ static void *htab_map_lookup_elem(struct bpf_map *map, void *key)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 	struct hlist_head *head;
+	struct hlist_head h;
 	struct htab_elem *l;
 	u32 hash, key_size;
 
@@ -153,6 +152,7 @@ static void *htab_map_lookup_elem(struct bpf_map *map, void *key)
 	hash = htab_map_hash(key, key_size);
 
 	head = select_bucket(htab, hash);
+	head = hlist_get_head_lock(head, &h);
 
 	l = lookup_elem_raw(head, hash, key, key_size);
 
@@ -167,6 +167,7 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 	struct hlist_head *head;
+	struct hlist_head h;
 	struct htab_elem *l, *next_l;
 	u32 hash, key_size;
 	int i;
@@ -178,6 +179,7 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	hash = htab_map_hash(key, key_size);
 
 	head = select_bucket(htab, hash);
+	head = hlist_get_head_lock(head, &h);
 
 	/* lookup the key */
 	l = lookup_elem_raw(head, hash, key, key_size);
@@ -205,6 +207,7 @@ find_first_elem:
 	/* iterate over buckets */
 	for (; i < htab->n_buckets; i++) {
 		head = select_bucket(htab, i);
+		head = hlist_get_head_lock(head, &h);
 
 		/* pick first element in the bucket */
 		next_l = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(head)),
@@ -227,6 +230,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 	struct htab_elem *l_new, *l_old;
 	struct hlist_head *head;
+	struct hlist_head h;
 	unsigned long flags;
 	u32 key_size;
 	int ret;
@@ -251,9 +255,11 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	head = select_bucket(htab, l_new->hash);
 
 	/* bpf_map_update_elem() can be called in_irq() */
-	raw_spin_lock_irqsave(&htab->lock, flags);
+	raw_local_irq_save(flags);
+	bit_spin_lock(HLIST_LOCK_BIT, (unsigned long *)&head->first);
 
-	l_old = lookup_elem_raw(head, l_new->hash, key, key_size);
+	l_old = lookup_elem_raw(hlist_get_head_lock(head, &h), l_new->hash,
+				key, key_size);
 
 	if (!l_old && unlikely(atomic_read(&htab->count) >= map->max_entries)) {
 		/* if elem with this 'key' doesn't exist and we've reached
@@ -278,18 +284,20 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	/* add new element to the head of the list, so that concurrent
 	 * search will find it before old elem
 	 */
-	hlist_add_head_rcu(&l_new->hash_node, head);
+	hlist_add_head_rcu_lock(&l_new->hash_node, head);
 	if (l_old) {
-		hlist_del_rcu(&l_old->hash_node);
+		hlist_del_rcu_lock(&l_old->hash_node);
 		kfree_rcu(l_old, rcu);
 	} else {
 		atomic_inc(&htab->count);
 	}
-	raw_spin_unlock_irqrestore(&htab->lock, flags);
+	bit_spin_unlock(HLIST_LOCK_BIT, (unsigned long *)&head->first);
+	raw_local_irq_restore(flags);
 
 	return 0;
 err:
-	raw_spin_unlock_irqrestore(&htab->lock, flags);
+	bit_spin_unlock(HLIST_LOCK_BIT, (unsigned long *)&head->first);
+	raw_local_irq_restore(flags);
 	kfree(l_new);
 	return ret;
 }
@@ -299,6 +307,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 	struct hlist_head *head;
+	struct hlist_head h;
 	struct htab_elem *l;
 	unsigned long flags;
 	u32 hash, key_size;
@@ -311,18 +320,20 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
 	hash = htab_map_hash(key, key_size);
 	head = select_bucket(htab, hash);
 
-	raw_spin_lock_irqsave(&htab->lock, flags);
+	raw_local_irq_save(flags);
+	bit_spin_lock(HLIST_LOCK_BIT, (unsigned long *)&head->first);
 
-	l = lookup_elem_raw(head, hash, key, key_size);
+	l = lookup_elem_raw(hlist_get_head_lock(head, &h), hash, key, key_size);
 
 	if (l) {
-		hlist_del_rcu(&l->hash_node);
+		hlist_del_rcu_lock(&l->hash_node);
 		atomic_dec(&htab->count);
 		kfree_rcu(l, rcu);
 		ret = 0;
 	}
 
-	raw_spin_unlock_irqrestore(&htab->lock, flags);
+	bit_spin_unlock(HLIST_LOCK_BIT, (unsigned long *)&head->first);
+	raw_local_irq_restore(flags);
 	return ret;
 }
@@ -332,9 +343,12 @@ static void delete_all_elements(struct bpf_htab *htab)
 	for (i = 0; i < htab->n_buckets; i++) {
 		struct hlist_head *head = select_bucket(htab, i);
+		struct hlist_head h;
 		struct hlist_node *n;
 		struct htab_elem *l;
 
+		head = hlist_get_head_lock(head, &h);
+
 		hlist_for_each_entry_safe(l, n, head, hash_node) {
 			hlist_del_rcu(&l->hash_node);
 			atomic_dec(&htab->count);
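For context, bit_spin_lock()/bit_spin_unlock() have no _irqsave variants,
which is why the patch pairs them with explicit raw_local_irq_save()/
raw_local_irq_restore() calls, preserving the IRQ-disabling semantics of
the old raw_spin_lock_irqsave() on the per-hashtable lock.

A hedged sketch of what the add-side helper has to do (again, the real
hlist_add_head_rcu_lock() comes from an earlier patch in this series, and
HLIST_LOCK_MASK is an assumed name, not confirmed by this patch): with
the bucket's bit lock held, bit 0 of head->first is set, so the new head
must be published with that bit still set, while the node's own ->next
gets the unlocked old first element:

	#include <linux/rculist.h>	/* rcu_assign_pointer() */

	/* Sketch only: add 'n' at the head of a bucket whose bit lock
	 * (bit 0 of head->first) is currently held by the caller.
	 */
	static inline void hlist_add_head_rcu_lock(struct hlist_node *n,
						   struct hlist_head *head)
	{
		struct hlist_node *first = (struct hlist_node *)
			((unsigned long)head->first & ~HLIST_LOCK_MASK);

		n->next = first;
		n->pprev = &head->first;
		/* publish the new head with the lock bit kept set */
		rcu_assign_pointer(head->first,
			(struct hlist_node *)((unsigned long)n | HLIST_LOCK_MASK));
		if (first)
			first->pprev = &n->next;
	}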