From patchwork Fri Mar 10 19:34:52 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Stringer X-Patchwork-Id: 737553 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3vfyCb5Ptbz9s7m for ; Sat, 11 Mar 2017 06:35:23 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 5E64FB47; Fri, 10 Mar 2017 19:35:20 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id A46A4B35 for ; Fri, 10 Mar 2017 19:35:19 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net [217.70.183.195]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 6D4DE1A6 for ; Fri, 10 Mar 2017 19:35:18 +0000 (UTC) Received: from mfilter22-d.gandi.net (mfilter22-d.gandi.net [217.70.178.150]) by relay3-d.mail.gandi.net (Postfix) with ESMTP id 3CE41A80CB for ; Fri, 10 Mar 2017 20:35:17 +0100 (CET) X-Virus-Scanned: Debian amavisd-new at mfilter22-d.gandi.net Received: from relay3-d.mail.gandi.net ([IPv6:::ffff:217.70.183.195]) by mfilter22-d.gandi.net (mfilter22-d.gandi.net [::ffff:10.0.15.180]) (amavisd-new, port 10024) with ESMTP id mFBLgYnPhAOH for ; Fri, 10 Mar 2017 20:35:14 +0100 (CET) X-Originating-IP: 209.85.220.179 Received: from mail-qk0-f179.google.com (mail-qk0-f179.google.com [209.85.220.179]) (Authenticated sender: joe@ovn.org) by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 55288A80C8 for ; Fri, 10 Mar 2017 20:35:14 +0100 (CET) Received: by mail-qk0-f179.google.com with SMTP id 1so182813883qkl.3 for ; Fri, 10 Mar 2017 11:35:14 -0800 (PST) X-Gm-Message-State: AMke39lHhnKhUdvDOqeTltANnHk6ncJhXNXv7MfDSh8LNSqAdU31n+tsrTzZuqJSMCiHJZPxp0gFwUpNuo/3FQ== X-Received: by 10.55.183.133 with SMTP id h127mr21969423qkf.121.1489174512805; Fri, 10 Mar 2017 11:35:12 -0800 (PST) MIME-Version: 1.0 Received: by 10.12.163.229 with HTTP; Fri, 10 Mar 2017 11:34:52 -0800 (PST) In-Reply-To: References: <1488908341-42514-1-git-send-email-yihung.wei@gmail.com> <1488908341-42514-2-git-send-email-yihung.wei@gmail.com> From: Joe Stringer Date: Fri, 10 Mar 2017 11:34:52 -0800 X-Gmail-Original-Message-ID: Message-ID: To: Yi-Hung Wei X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW, RCVD_IN_SORBS_SPAM autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: ovs dev Subject: Re: [ovs-dev] [PATCH 2/2] ofproto: Add ref counting for variable length mf_fields. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org On 9 March 2017 at 10:22, Yi-Hung Wei wrote: > On Wed, Mar 8, 2017 at 11:40 AM, Joe Stringer wrote: >> > @@ -145,6 +150,7 @@ learn_execute(const struct ofpact_learn *learn, >> > const struct flow *flow, >> > spec->n_bits); >> > bitwise_one(ofpact_set_field_mask(sf), >> > spec->dst.field->n_bytes, >> > spec->dst.ofs, spec->n_bits); >> > + mf_vl_mff_set_tlv_bitmap(spec->dst.field, >> > &fm->ofpacts_tlv_bitmap); >> > break; >> > >> > case NX_LEARN_DST_OUTPUT: >> >> I think we should also set the tlv bitmap in this LEARN_DST_OUTPUT >> case? IIUC this uses an OXM field to determine the output port. I >> believe it's possible for someone to use a TLV to specify an output >> action using LEARN - so it would set the bitmap in the same way that >> the NX_LEARN_DST_LOAD case does. > > Yes, for output action in learn, we can specify a TLV to be the SRC field, > but the DST field will eventually be an immediate value that we derive from > the SRC field. So the ref count for a TLV in LEARN_OUTPUT is incresed when > we decode the learn flow indecode_NXAST_RAW_LEARN(), but not when we create > the "learned" flow in learn_execute(). If I follow, you're saying that the learn flow (original) takes a reference on the field, so when we create the "learned" flow in learn_execute, we don't take a reference on the field. But could the original learn flow be deleted and leave the learned flow behind without a reference? >> >> >> > diff --git a/lib/meta-flow.c b/lib/meta-flow.c >> > index 40704e628aaa..7807b3055f5b 100644 >> > --- a/lib/meta-flow.c >> > +++ b/lib/meta-flow.c >> > @@ -27,6 +27,8 @@ >> > #include "openvswitch/dynamic-string.h" >> > #include "nx-match.h" >> > #include "openvswitch/ofp-util.h" >> > +#include "ofproto/ofproto-provider.h" >> > +#include "ovs-atomic.h" >> > #include "ovs-rcu.h" >> > #include "ovs-thread.h" >> > #include "packets.h" >> > @@ -2646,6 +2648,7 @@ field_array_set(enum mf_field_id id, const union >> > mf_value *value, >> > * struct vl_mff_map.*/ >> > struct vl_mf_field { >> > struct mf_field mf; >> > + struct ovs_refcount ref_cnt; >> > struct cmap_node cmap_node; /* In ofproto->vl_mff_map->cmap. */ >> > }; >> > >> > @@ -2655,17 +2658,25 @@ mf_field_hash(uint32_t key) >> > return hash_int(key, 0); >> > } >> > >> > -void >> > -mf_vl_mff_map_clear(struct vl_mff_map *vl_mff_map) >> > +enum ofperr >> > +mf_vl_mff_map_clear(struct vl_mff_map *vl_mff_map, bool clear) >> > OVS_REQUIRES(vl_mff_map->mutex) >> > { >> > struct vl_mf_field *vmf; >> > >> > CMAP_FOR_EACH (vmf, cmap_node, &vl_mff_map->cmap) { >> > - cmap_remove(&vl_mff_map->cmap, &vmf->cmap_node, >> > - mf_field_hash(vmf->mf.id)); >> > - ovsrcu_postpone(free, vmf); >> > + if (clear) { >> > + cmap_remove(&vl_mff_map->cmap, &vmf->cmap_node, >> > + mf_field_hash(vmf->mf.id)); >> > + ovsrcu_postpone(free, vmf); >> > + } else { >> > + if (ovs_refcount_read(&vmf->ref_cnt) != 1) { >> > + return OFPERR_NXTTMFC_INVALID_TLV_DEL; >> > + } >> > + } >> > } >> > + >> > + return 0; >> > } >> >> I found this a bit confusing. mf_vl_mff_map_clear() has a boolean, >> also named 'clear' that determines whether it will actually clear or >> not. If you call a function like "clear(clear=false)", what would it >> intuitively do? >> >> I see two calling conventions - first, for TLV clear command from >> controller: >> error = mf_vl_mff_map_clear(vl_mff_map, false); >> if (error) >> return error; >> error = mf_vl_mff_map_clear(vl_mff_map, true); >> >> Second, from ofproto cleanup code: >> mf_vl_mff_map_clear(vl_mff_map, true); >> >> I suggest that this would be easier to understand if the function were >> structured with the the bool to be 'force': >> >> enum ofperr >> mf_vl_mff_map_clear(struct vl_mff_map *vl_mff_map, bool force) >> OVS_REQUIRES(vl_mff_map->mutex) >> { >> struct vl_mf_field *vmf; >> >> /* First, check whether any flows still refer to the current TLV >> map. */ >> if (!force) { >> CMAP_FOR_EACH (vmf, cmap_node, &vl_mff_map->cmap) { >> if (ovs_refcount_read(&vmf->ref_cnt) != 1) { >> return OFPERR_NXTTMFC_INVALID_TLV_DEL; >> } >> } >> } >> >> CMAP_FOR_EACH (vmf, cmap_node, &vl_mff_map->cmap) { >> cmap_remove(&vl_mff_map->cmap, &vmf->cmap_node, >> mf_field_hash(vmf->mf.id)); >> ovsrcu_postpone(free, vmf); >> } >> >> return 0; >> } > > Thanks for suggestion, it is much clear after modification. > >> >> >> You may also consider refactoring the ovsrcu_postpone(free, vmf) into >> a separate, vmf_delete() function which looks something like this: >> >> static void vmf_delete(struct vl_mf_field *vmf) >> { >> if (ovs_refcount_unref(&vmf->ref_cnt) == 1) { >> /* Postpone as this function is typically called immediately >> after removing from cmap. */ >> ovsrcu_postpone(free, vmf); >> } else { >> VLOG_WARN_RL(&rl, "Attempted to delete VMF %s but refcount is >> nonzero!", >> vmf->mf.name); >> } >> } >> >> This could be reused from the other delete locations as well. > > I do not factor out the ovsrcu_postpone(), becuase the only case that we > force to clear the vl_mff_map is when we destory ofproto. Otherwise, we are > not going to remove vmf from vl_mff_map if ref_count > 1. I guess my point was partly that both mf_vl_mff_map_del() and mf_vl_mff_map_clear() can potentially RCU-defer freeing of the VMF, and if we push that out to a separate function here that ensures that the refcount is what we expect it to be, then it's more defensively coded against potential misuse - attempting to free when it's still being used, for instance. >> > +static void >> > +mf_vl_mff_ref_cnt_mod(const struct vl_mff_map *map, uint64_t >> > tlv_bitmap, >> > + bool ref) >> > +{ >> > + struct vl_mf_field *vmf; >> > + int i; >> > + >> > + if (map) { >> > + ULLONG_FOR_EACH_1 (i, tlv_bitmap) { >> > + vmf = mf_get_vl_mff__(i + MFF_TUN_METADATA0, map); >> > + if (vmf) { >> > + if (ref) { >> > + ovs_refcount_ref(&vmf->ref_cnt); >> > + } else { >> > + ovs_refcount_unref(&vmf->ref_cnt); >> >> We should warn if we're trying to unref and this is the final ref, >> because then we're not freeing the object. > > It could be the case that no active flow is currently use this tlv, but the > controller is going to use this tlv entry for some new flows. In this case, > it is not necessary to free vmf since it will be useful. Typically when refcounts are used, the last one to decrement it -> 0 is responsible for freeing it. Also, if it goes to 0 and something is able to find the element, then it probably shouldn't increment the refcount because most likely the piece of code that decremented it to 0 also rcu-deferred the free. It's easier to reason about how we ensure that the memory a) valid to access and b) is eventually freed if we stick to this idiom. For most cases, this is hit from the main thread which handles OpenFlow messages; if it's decoding a message and acquiring references to fields, then by definition it can't also be removing the fields that we're trying to reference, so I don't think this should ever decrement the refcount -> 0; therefore, if we warn, we could easily see that we have some bad assumption in how we understand this code. I will note though, I also see some learn_execute() callers for this, which means that another thread is getting involved - perhaps an upcall handler or perhaps a revalidator. This makes me a little more nervous about ensuring that references to the fields are being correctly counted and whether we may be trying to retain access to a field which was recently freed. Most likely it's fine, since that other thread would have to find a flow which references these fields to be able to take another reference (eg for the "learned" flow); but warning something like this would be more explicit in this function: VLOG_WARN("Invalid TLV index %d.", i); Maybe it's over the top. But it would be nicer to know through a log message than just have some crazy bug show up later on and try to figure out that this is what is happening. >> > @@ -1164,6 +1169,8 @@ decode_NXAST_RAW_OUTPUT_REG2(const struct >> > nx_action_output_reg2 *naor, >> > if (error) { >> > return error; >> > } >> > + mf_vl_mff_set_tlv_bitmap(output_reg->src.field, tlv_bitmap); >> > + >> >> I wonder if it's cleaner to fold this into nx_pull_header(). >> >> It's not so obvious why the ofpacts decode functions in this file >> inconsistently need to call the mf_check_vl_mff() and/or >> mf_vl_mff_set_tlv_bitmap(), which makes it a little more difficult to >> verify (and easier to mess up if new actions are added). It'd be nice >> to make this more consistent. > > > Thanks, it is hard to follow as in v1. Actually, we use nx_pull_header(), > nx_pull_entry(), and mf_from_nxm_header() to get a mf_field in ofp-action.c > and those three functions are heavily use in other part of the code base > that may not need to update the bitmap. So instead to fold > mf_vl_mff_set_tlv_bitmap() directly into the three functions, three wrapper > functions mf_vl_mff_nx_pull_header(), mf_vl_mff_nx_pull_entry(), and > mf_vl_mff_mf_from_nxm_header() are created that handle mf_field operation > and set the tlv_bitmap, and it should be more easier to follow now. OK, fair enough, thanks. >> > @@ -1603,6 +1599,12 @@ ofproto_destroy__(struct ofproto *ofproto) >> > } >> > free(ofproto->tables); >> > >> > + ovs_mutex_lock(&ofproto->vl_mff_map.mutex); >> > + mf_vl_mff_map_clear(&ofproto->vl_mff_map, true); >> > + ovs_mutex_unlock(&ofproto->vl_mff_map.mutex); >> > + cmap_destroy(&ofproto->vl_mff_map.cmap); >> > + ovs_mutex_destroy(&ofproto->vl_mff_map.mutex); >> > + >> > ovs_assert(hindex_is_empty(&ofproto->cookies)); >> > hindex_destroy(&ofproto->cookies); >> > >> >> Is there a reason for the above two hunks to shift this logic later? >> Should this be in a separate patch to explain why, or could it be >> dropped? >> > Cause I feel more comfortable to remove the vl_mf_field after all the rules > are deleted. In practice, it is invoked when ofproto is destroyed, so it > probabaly does not matter? I think it should be another patch to move it, and describe this concern in the commit message. >> > @@ -7784,14 +7802,15 @@ handle_tlv_table_mod(struct ofconn *ofconn, >> > const struct ofp_header *oh) >> > old_tab = ovsrcu_get_protected(struct tun_table *, >> > &ofproto->metadata_tab); >> > error = tun_metadata_table_mod(&ttm, old_tab, &new_tab); >> > if (!error) { >> > - ovsrcu_set(&ofproto->metadata_tab, new_tab); >> > - tun_metadata_postpone_free(old_tab); >> > - >> > ovs_mutex_lock(&ofproto->vl_mff_map.mutex); >> > error = >> > mf_vl_mff_map_mod_from_tun_metadata(&ofproto->vl_mff_map, >> > &ttm); >> > ovs_mutex_unlock(&ofproto->vl_mff_map.mutex); >> > } >> > + if (!error) { >> > + ovsrcu_set(&ofproto->metadata_tab, new_tab); >> > + tun_metadata_postpone_free(old_tab); >> > + } >> >> Maybe I didn't quite dig deep enough, but what's the implications of >> moving this? Could it fix another bug, should it be a separate patch? > > > I'm not sure if it be a separate patch? What this change is that we can only > modify update the TLV mapping table if all the table modification operations > are valid. May it makes more sense like this > > if (!error) { > ovs_mutex_lock(&ofproto->vl_mff_map.mutex); > error = mf_vl_mff_map_mod_from_tun_metadata(&ofproto->vl_mff_map, > &ttm); > ovs_mutex_unlock(&ofproto->vl_mff_map.mutex); > if (!error) { > ovsrcu_set(&ofproto->metadata_tab, new_tab); > tun_metadata_postpone_free(old_tab); > } > } LGTM. diff --git a/lib/meta-flow.c b/lib/meta-flow.c index e844008f6294..bef5aad768a3 100644 --- a/lib/meta-flow.c +++ b/lib/meta-flow.c @@ -2836,9 +2836,13 @@ mf_vl_mff_ref_cnt_mod(const struct vl_mff_map *map, uint64_t tlv_bitmap, vmf = mf_get_vl_mff__(i + MFF_TUN_METADATA0, map); if (vmf) { if (ref) { - ovs_refcount_ref(&vmf->ref_cnt); + if (ovs_refcount_ref(&vmf->ref_cnt) == 0) { + VLOG_WARN("Taking reference on freed VMF %d", i); + } } else { - ovs_refcount_unref(&vmf->ref_cnt); + if (ovs_refcount_unref(&vmf->ref_cnt) == 1) { + VLOG_WARN("Dereferencing VMF %d without free", i); + } } } else {