Message ID | 20170920162247.63787-1-kraigatgoog@gmail.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Series | [net-next] bpf: Optimize lpm trie delete | expand |
Hi Craig, Thanks, this looks much cleaner already :) On 09/20/2017 06:22 PM, Craig Gallek wrote: > diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c > index 9d58a576b2ae..b5a7d70ec8b5 100644 > --- a/kernel/bpf/lpm_trie.c > +++ b/kernel/bpf/lpm_trie.c > @@ -397,7 +397,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) > struct lpm_trie_node __rcu **trim; > struct lpm_trie_node *node; > unsigned long irq_flags; > - unsigned int next_bit; > + unsigned int next_bit = 0; This default assignment seems wrong, and I guess you only added it to squelch a compiler warning? [...] > + /* If the node has one child, we may be able to collapse the tree > + * while removing this node if the node's child is in the same > + * 'next bit' slot as this node was in its parent or if the node > + * itself is the root. > + */ > + if (trim == &trie->root) { > + next_bit = node->child[0] ? 0 : 1; > + rcu_assign_pointer(trie->root, node->child[next_bit]); > + kfree_rcu(node, rcu); I don't think you should treat this 'root' case special. Instead, move the 'next_bit' assignment outside of the condition ... > + } else if (rcu_access_pointer(node->child[next_bit])) { > + rcu_assign_pointer(*trim, node->child[next_bit]); > + kfree_rcu(node, rcu); ... and then this branch would handle the case just fine. Correct? Otherwise, looks good to me! Thanks, Daniel
On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack <daniel@zonque.org> wrote: > Hi Craig, > > Thanks, this looks much cleaner already :) > > On 09/20/2017 06:22 PM, Craig Gallek wrote: >> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c >> index 9d58a576b2ae..b5a7d70ec8b5 100644 >> --- a/kernel/bpf/lpm_trie.c >> +++ b/kernel/bpf/lpm_trie.c >> @@ -397,7 +397,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) >> struct lpm_trie_node __rcu **trim; >> struct lpm_trie_node *node; >> unsigned long irq_flags; >> - unsigned int next_bit; >> + unsigned int next_bit = 0; > > This default assignment seems wrong, and I guess you only added it to > squelch a compiler warning? Yes, this variable is only initialized after the lookup iterations below (meaning it will never be initialized the the root-removal case). > [...] > >> + /* If the node has one child, we may be able to collapse the tree >> + * while removing this node if the node's child is in the same >> + * 'next bit' slot as this node was in its parent or if the node >> + * itself is the root. >> + */ >> + if (trim == &trie->root) { >> + next_bit = node->child[0] ? 0 : 1; >> + rcu_assign_pointer(trie->root, node->child[next_bit]); >> + kfree_rcu(node, rcu); > > I don't think you should treat this 'root' case special. > > Instead, move the 'next_bit' assignment outside of the condition ... I'm not quite sure I follow. Are you saying do something like this: if (trim == &trie->root) { next_bit = node->child[0] ? 0 : 1; } if (rcu_access_pointer(node->child[next_bit])) { ... This would save a couple lines of code, but I think the as-is implementation is slightly easier to understand. I don't have a strong opinion either way, though. Thanks for the pointers, Craig
On 09/20/2017 08:51 PM, Craig Gallek wrote: > On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack <daniel@zonque.org> wrote: >> Hi Craig, >> >> Thanks, this looks much cleaner already :) >> >> On 09/20/2017 06:22 PM, Craig Gallek wrote: >>> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c >>> index 9d58a576b2ae..b5a7d70ec8b5 100644 >>> --- a/kernel/bpf/lpm_trie.c >>> +++ b/kernel/bpf/lpm_trie.c >>> @@ -397,7 +397,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) >>> struct lpm_trie_node __rcu **trim; >>> struct lpm_trie_node *node; >>> unsigned long irq_flags; >>> - unsigned int next_bit; >>> + unsigned int next_bit = 0; >> >> This default assignment seems wrong, and I guess you only added it to >> squelch a compiler warning? > Yes, this variable is only initialized after the lookup iterations > below (meaning it will never be initialized the the root-removal > case). Right, and once set, it's only updated in case we don't have an exact match and try to drill down further. >> [...] >> >>> + /* If the node has one child, we may be able to collapse the tree >>> + * while removing this node if the node's child is in the same >>> + * 'next bit' slot as this node was in its parent or if the node >>> + * itself is the root. >>> + */ >>> + if (trim == &trie->root) { >>> + next_bit = node->child[0] ? 0 : 1; >>> + rcu_assign_pointer(trie->root, node->child[next_bit]); >>> + kfree_rcu(node, rcu); >> >> I don't think you should treat this 'root' case special. >> >> Instead, move the 'next_bit' assignment outside of the condition ... > I'm not quite sure I follow. Are you saying do something like this: > > if (trim == &trie->root) { > next_bit = node->child[0] ? 0 : 1; > } > if (rcu_access_pointer(node->child[next_bit])) { > ... > > This would save a couple lines of code, but I think the as-is > implementation is slightly easier to understand. I don't have a > strong opinion either way, though. Me neither :) My idea was to set next_bit = node->child[0] ? 0 : 1; unconditionally, because it should result in the same in both cases. It might be a bit of bike shedding, but I dislike this default assignment, and I believe that not relying on next_bit to be set as a side effect of the lookup loop makes the code a bit more readable. WDYT? Thanks, Daniel
On Wed, Sep 20, 2017 at 6:56 PM, Daniel Mack <daniel@zonque.org> wrote: > On 09/20/2017 08:51 PM, Craig Gallek wrote: >> On Wed, Sep 20, 2017 at 12:51 PM, Daniel Mack <daniel@zonque.org> wrote: >>> Hi Craig, >>> >>> Thanks, this looks much cleaner already :) >>> >>> On 09/20/2017 06:22 PM, Craig Gallek wrote: >>>> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c >>>> index 9d58a576b2ae..b5a7d70ec8b5 100644 >>>> --- a/kernel/bpf/lpm_trie.c >>>> +++ b/kernel/bpf/lpm_trie.c >>>> @@ -397,7 +397,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) >>>> struct lpm_trie_node __rcu **trim; >>>> struct lpm_trie_node *node; >>>> unsigned long irq_flags; >>>> - unsigned int next_bit; >>>> + unsigned int next_bit = 0; >>> >>> This default assignment seems wrong, and I guess you only added it to >>> squelch a compiler warning? >> Yes, this variable is only initialized after the lookup iterations >> below (meaning it will never be initialized the the root-removal >> case). > > Right, and once set, it's only updated in case we don't have an exact > match and try to drill down further. > >>> [...] >>> >>>> + /* If the node has one child, we may be able to collapse the tree >>>> + * while removing this node if the node's child is in the same >>>> + * 'next bit' slot as this node was in its parent or if the node >>>> + * itself is the root. >>>> + */ >>>> + if (trim == &trie->root) { >>>> + next_bit = node->child[0] ? 0 : 1; >>>> + rcu_assign_pointer(trie->root, node->child[next_bit]); >>>> + kfree_rcu(node, rcu); >>> >>> I don't think you should treat this 'root' case special. >>> >>> Instead, move the 'next_bit' assignment outside of the condition ... >> I'm not quite sure I follow. Are you saying do something like this: >> >> if (trim == &trie->root) { >> next_bit = node->child[0] ? 0 : 1; >> } >> if (rcu_access_pointer(node->child[next_bit])) { >> ... >> >> This would save a couple lines of code, but I think the as-is >> implementation is slightly easier to understand. I don't have a >> strong opinion either way, though. > > Me neither :) > > My idea was to set > > next_bit = node->child[0] ? 0 : 1; > > unconditionally, because it should result in the same in both cases. > > It might be a bit of bike shedding, but I dislike this default > assignment, and I believe that not relying on next_bit to be set as a > side effect of the lookup loop makes the code a bit more readable. > > WDYT? That sounds reasonable. I'll spin a v2 today if no one else has any comments. Thanks again, Craig
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index 9d58a576b2ae..b5a7d70ec8b5 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -397,7 +397,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) struct lpm_trie_node __rcu **trim; struct lpm_trie_node *node; unsigned long irq_flags; - unsigned int next_bit; + unsigned int next_bit = 0; size_t matchlen = 0; int ret = 0; @@ -408,14 +408,12 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) /* Walk the tree looking for an exact key/length match and keeping * track of where we could begin trimming the tree. The trim-point - * is the sub-tree along the walk consisting of only single-child - * intermediate nodes and ending at a leaf node that we want to - * remove. + * is the location of the pointer where we will remove a node from the + * tree. */ trim = &trie->root; - node = rcu_dereference_protected( - trie->root, lockdep_is_held(&trie->lock)); - while (node) { + while ((node = rcu_dereference_protected( + *trim, lockdep_is_held(&trie->lock)))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || @@ -423,15 +421,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) break; next_bit = extract_bit(key->data, node->prefixlen); - /* If we hit a node that has more than one child or is a valid - * prefix itself, do not remove it. Reset the root of the trim - * path to its descendant on our path. - */ - if (!(node->flags & LPM_TREE_NODE_FLAG_IM) || - (node->child[0] && node->child[1])) - trim = &node->child[next_bit]; - node = rcu_dereference_protected( - node->child[next_bit], lockdep_is_held(&trie->lock)); + trim = &node->child[next_bit]; } if (!node || node->prefixlen != key->prefixlen || @@ -442,25 +432,38 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) trie->n_entries--; - /* If the node we are removing is not a leaf node, simply mark it + /* If the node we are removing has two children, simply mark it * as intermediate and we are done. */ - if (rcu_access_pointer(node->child[0]) || + if (rcu_access_pointer(node->child[0]) && rcu_access_pointer(node->child[1])) { node->flags |= LPM_TREE_NODE_FLAG_IM; goto out; } - /* trim should now point to the slot holding the start of a path from - * zero or more intermediate nodes to our leaf node for deletion. - */ - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { + /* If the node has no children, it can be completely removed */ + if (!rcu_access_pointer(node->child[0]) && + !rcu_access_pointer(node->child[1])) { RCU_INIT_POINTER(*trim, NULL); - trim = rcu_access_pointer(node->child[0]) ? - &node->child[0] : - &node->child[1]; kfree_rcu(node, rcu); + goto out; + } + + /* If the node has one child, we may be able to collapse the tree + * while removing this node if the node's child is in the same + * 'next bit' slot as this node was in its parent or if the node + * itself is the root. + */ + if (trim == &trie->root) { + next_bit = node->child[0] ? 0 : 1; + rcu_assign_pointer(trie->root, node->child[next_bit]); + kfree_rcu(node, rcu); + } else if (rcu_access_pointer(node->child[next_bit])) { + rcu_assign_pointer(*trim, node->child[next_bit]); + kfree_rcu(node, rcu); + } else { + /* If we can't collapse, just mark this node as intermediate */ + node->flags |= LPM_TREE_NODE_FLAG_IM; } out: