From patchwork Wed Sep 20 16:22:47 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Gallek X-Patchwork-Id: 816347 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3xy4ly1KQHz9sPt for ; Thu, 21 Sep 2017 02:22:54 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751622AbdITQWv (ORCPT ); Wed, 20 Sep 2017 12:22:51 -0400 Received: from mail-qt0-f171.google.com ([209.85.216.171]:44163 "EHLO mail-qt0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751499AbdITQWu (ORCPT ); Wed, 20 Sep 2017 12:22:50 -0400 Received: by mail-qt0-f171.google.com with SMTP id o13so3350238qtf.1 for ; Wed, 20 Sep 2017 09:22:50 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=6Ez9gv50ScccmyxKVx8i5BIca4h3uBx1kGkrxIfBdvU=; b=oldWyGqak3v94YFYaEAdjHyHYhTGUOGz24ptaqKN6UDf5YhMR+/ffE2NqKm8cU5beG Zc3auFogS+YtEHyCodGGwnR5t8QDRHTv0/A58T1pSuNCiKaP0v4K6V0r60R2eM7nhsny CLyJBLFm8dH6UXG7yWVEzIKPi9JDJY+UC/g/Asch7eJNPte83L+jMKiFGqus7uJTfR+C A/4yhCzp8MWOck/TQvosSpHHXpEiiff7bofAkQaRDHQQuVROn0UclolSZnCSXme1ZGZG JpN5CLWNEI9c5wAyJscmsbGO+bxFQEqjE1dKHnNco9ixOUx8Oe5XLOI6NvLUaiHGY089 5ceA== X-Gm-Message-State: AHPjjUi9Exxab/IJW322nvnXav67VBM5HH++sW6+vgen8R5zIXVi90LM a45G6GdkYSUs/vn95q+aBZ+5zQ== X-Google-Smtp-Source: AOwi7QCgECV9xAcoyJkMcenseJhFZiKhEAFPHiSJIMS0I7RutyWVkulEWzmd85UGisS+rhPB8H2BhQ== X-Received: by 10.200.34.167 with SMTP id f36mr8267991qta.173.1505924569372; Wed, 20 Sep 2017 09:22:49 -0700 (PDT) Received: from monkey.nyc.corp.google.com ([100.101.213.10]) by smtp.gmail.com with ESMTPSA id y11sm1590917qkb.20.2017.09.20.09.22.48 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 20 Sep 2017 09:22:48 -0700 (PDT) From: Craig Gallek To: Daniel Mack , Alexei Starovoitov , Daniel Borkmann , "David S . Miller" Cc: netdev@vger.kernel.org Subject: [PATCH net-next] bpf: Optimize lpm trie delete Date: Wed, 20 Sep 2017 12:22:47 -0400 Message-Id: <20170920162247.63787-1-kraigatgoog@gmail.com> X-Mailer: git-send-email 2.14.1.821.g8fa685d3b7-goog Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Craig Gallek Before the delete operator was added, this datastructure maintained an invariant that intermediate nodes were only present when necessary to build the tree. This patch updates the delete operation to reinstate that invariant by removing unnecessary intermediate nodes after a node is removed and thus keeping the tree structure at a minimal size. Suggested-by: Daniel Mack Signed-off-by: Craig Gallek --- kernel/bpf/lpm_trie.c | 55 +++++++++++++++++++++++++++------------------------ 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index 9d58a576b2ae..b5a7d70ec8b5 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -397,7 +397,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) struct lpm_trie_node __rcu **trim; struct lpm_trie_node *node; unsigned long irq_flags; - unsigned int next_bit; + unsigned int next_bit = 0; size_t matchlen = 0; int ret = 0; @@ -408,14 +408,12 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) /* Walk the tree looking for an exact key/length match and keeping * track of where we could begin trimming the tree. The trim-point - * is the sub-tree along the walk consisting of only single-child - * intermediate nodes and ending at a leaf node that we want to - * remove. + * is the location of the pointer where we will remove a node from the + * tree. */ trim = &trie->root; - node = rcu_dereference_protected( - trie->root, lockdep_is_held(&trie->lock)); - while (node) { + while ((node = rcu_dereference_protected( + *trim, lockdep_is_held(&trie->lock)))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -423,15 +421,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) break; next_bit = extract_bit(key->data, node->prefixlen); - /* If we hit a node that has more than one child or is a valid - * prefix itself, do not remove it. Reset the root of the trim - * path to its descendant on our path. - */ - if (!(node->flags & LPM_TREE_NODE_FLAG_IM) || - (node->child[0] && node->child[1])) - trim = &node->child[next_bit]; - node = rcu_dereference_protected( - node->child[next_bit], lockdep_is_held(&trie->lock)); + trim = &node->child[next_bit]; } if (!node || node->prefixlen != key->prefixlen || @@ -442,25 +432,38 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) trie->n_entries--; - /* If the node we are removing is not a leaf node, simply mark it + /* If the node we are removing has two children, simply mark it * as intermediate and we are done. */ - if (rcu_access_pointer(node->child[0]) || + if (rcu_access_pointer(node->child[0]) && rcu_access_pointer(node->child[1])) { node->flags |= LPM_TREE_NODE_FLAG_IM; goto out; } - /* trim should now point to the slot holding the start of a path from - * zero or more intermediate nodes to our leaf node for deletion. - */ - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { + /* If the node has no children, it can be completely removed */ + if (!rcu_access_pointer(node->child[0]) && + !rcu_access_pointer(node->child[1])) { RCU_INIT_POINTER(*trim, NULL); - trim = rcu_access_pointer(node->child[0]) ? - &node->child[0] : - &node->child[1]; kfree_rcu(node, rcu); + goto out; + } + + /* If the node has one child, we may be able to collapse the tree + * while removing this node if the node's child is in the same + * 'next bit' slot as this node was in its parent or if the node + * itself is the root. + */ + if (trim == &trie->root) { + next_bit = node->child[0] ? 0 : 1; + rcu_assign_pointer(trie->root, node->child[next_bit]); + kfree_rcu(node, rcu); + } else if (rcu_access_pointer(node->child[next_bit])) { + rcu_assign_pointer(*trim, node->child[next_bit]); + kfree_rcu(node, rcu); + } else { + /* If we can't collapse, just mark this node as intermediate */ + node->flags |= LPM_TREE_NODE_FLAG_IM; } out: