From patchwork Fri Mar 10 19:34:52 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joe Stringer <joe@ovn.org>
X-Patchwork-Id: 737553
Return-Path: <ovs-dev-bounces@openvswitch.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
	[140.211.169.12])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 3vfyCb5Ptbz9s7m
	for <incoming@patchwork.ozlabs.org>;
	Sat, 11 Mar 2017 06:35:23 +1100 (AEDT)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
	by mail.linuxfoundation.org (Postfix) with ESMTP id 5E64FB47;
	Fri, 10 Mar 2017 19:35:20 +0000 (UTC)
X-Original-To: dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
	[172.17.192.35])
	by mail.linuxfoundation.org (Postfix) with ESMTPS id A46A4B35
	for <dev@openvswitch.org>; Fri, 10 Mar 2017 19:35:19 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net
	[217.70.183.195])
	by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 6D4DE1A6
	for <dev@openvswitch.org>; Fri, 10 Mar 2017 19:35:18 +0000 (UTC)
Received: from mfilter22-d.gandi.net (mfilter22-d.gandi.net [217.70.178.150])
	by relay3-d.mail.gandi.net (Postfix) with ESMTP id 3CE41A80CB
	for <dev@openvswitch.org>; Fri, 10 Mar 2017 20:35:17 +0100 (CET)
X-Virus-Scanned: Debian amavisd-new at mfilter22-d.gandi.net
Received: from relay3-d.mail.gandi.net ([IPv6:::ffff:217.70.183.195])
	by mfilter22-d.gandi.net (mfilter22-d.gandi.net [::ffff:10.0.15.180])
	(amavisd-new, port 10024)
	with ESMTP id mFBLgYnPhAOH for <dev@openvswitch.org>;
	Fri, 10 Mar 2017 20:35:14 +0100 (CET)
X-Originating-IP: 209.85.220.179
Received: from mail-qk0-f179.google.com (mail-qk0-f179.google.com
	[209.85.220.179]) (Authenticated sender: joe@ovn.org)
	by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 55288A80C8
	for <dev@openvswitch.org>; Fri, 10 Mar 2017 20:35:14 +0100 (CET)
Received: by mail-qk0-f179.google.com with SMTP id 1so182813883qkl.3
	for <dev@openvswitch.org>; Fri, 10 Mar 2017 11:35:14 -0800 (PST)
X-Gm-Message-State: 
 AMke39lHhnKhUdvDOqeTltANnHk6ncJhXNXv7MfDSh8LNSqAdU31n+tsrTzZuqJSMCiHJZPxp0gFwUpNuo/3FQ==
X-Received: by 10.55.183.133 with SMTP id
	h127mr21969423qkf.121.1489174512805;
	Fri, 10 Mar 2017 11:35:12 -0800 (PST)
MIME-Version: 1.0
Received: by 10.12.163.229 with HTTP; Fri, 10 Mar 2017 11:34:52 -0800 (PST)
In-Reply-To: 
 <CAG1aQhK6SVo_dckAXMca+4HKa4SbNu4CQ4TbYJvLP9bCF+FWsw@mail.gmail.com>
References: <1488908341-42514-1-git-send-email-yihung.wei@gmail.com>
	<1488908341-42514-2-git-send-email-yihung.wei@gmail.com>
	<CAPWQB7FJqKCLRza2e-q+NEkUsLOqxpmg7eQopH8PjOcoHdcoeQ@mail.gmail.com>
	<CAG1aQhK6SVo_dckAXMca+4HKa4SbNu4CQ4TbYJvLP9bCF+FWsw@mail.gmail.com>
From: Joe Stringer <joe@ovn.org>
Date: Fri, 10 Mar 2017 11:34:52 -0800
X-Gmail-Original-Message-ID: 
 <CAPWQB7G0kPWwU+-mx7T8F8h9xfPXiohwJD6uv-iH_PKqEeOk0g@mail.gmail.com>
Message-ID: 
 <CAPWQB7G0kPWwU+-mx7T8F8h9xfPXiohwJD6uv-iH_PKqEeOk0g@mail.gmail.com>
To: Yi-Hung Wei <yihung.wei@gmail.com>
X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW,
	RCVD_IN_SORBS_SPAM autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	smtp1.linux-foundation.org
Cc: ovs dev <dev@openvswitch.org>
Subject: Re: [ovs-dev] [PATCH 2/2] ofproto: Add ref counting for variable
	length mf_fields.
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: <ovs-dev.openvswitch.org>
List-Unsubscribe: <https://mail.openvswitch.org/mailman/options/ovs-dev>,
	<mailto:ovs-dev-request@openvswitch.org?subject=unsubscribe>
List-Archive: <http://mail.openvswitch.org/pipermail/ovs-dev/>
List-Post: <mailto:ovs-dev@openvswitch.org>
List-Help: <mailto:ovs-dev-request@openvswitch.org?subject=help>
List-Subscribe: <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>,
	<mailto:ovs-dev-request@openvswitch.org?subject=subscribe>
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org

On 9 March 2017 at 10:22, Yi-Hung Wei <yihung.wei@gmail.com> wrote:
> On Wed, Mar 8, 2017 at 11:40 AM, Joe Stringer <joe@ovn.org> wrote:
>> > @@ -145,6 +150,7 @@ learn_execute(const struct ofpact_learn *learn,
>> > const struct flow *flow,
>> >                           spec->n_bits);
>> >              bitwise_one(ofpact_set_field_mask(sf),
>> > spec->dst.field->n_bytes,
>> >                          spec->dst.ofs, spec->n_bits);
>> > +            mf_vl_mff_set_tlv_bitmap(spec->dst.field,
>> > &fm->ofpacts_tlv_bitmap);
>> >              break;
>> >
>> >          case NX_LEARN_DST_OUTPUT:
>>
>> I think we should also set the tlv bitmap in this LEARN_DST_OUTPUT
>> case? IIUC this uses an OXM field to determine the output port. I
>> believe it's possible for someone to use a TLV to specify an output
>> action using LEARN - so it would set the bitmap in the same way that
>> the NX_LEARN_DST_LOAD case does.
>
> Yes, for output action in learn, we can specify a TLV to be the SRC field,
> but the DST field will eventually be an immediate value that we derive from
> the SRC field. So the ref count for a TLV in LEARN_OUTPUT is incresed when
> we decode the learn flow indecode_NXAST_RAW_LEARN(), but not when we create
> the "learned" flow in learn_execute().

If I follow, you're saying that the learn flow (original) takes a
reference on the field, so when we create the "learned" flow in
learn_execute, we don't take a reference on the field. But could the
original learn flow be deleted and leave the learned flow behind
without a reference?

>>
>>
>> > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
>> > index 40704e628aaa..7807b3055f5b 100644
>> > --- a/lib/meta-flow.c
>> > +++ b/lib/meta-flow.c
>> > @@ -27,6 +27,8 @@
>> >  #include "openvswitch/dynamic-string.h"
>> >  #include "nx-match.h"
>> >  #include "openvswitch/ofp-util.h"
>> > +#include "ofproto/ofproto-provider.h"
>> > +#include "ovs-atomic.h"
>> >  #include "ovs-rcu.h"
>> >  #include "ovs-thread.h"
>> >  #include "packets.h"
>> > @@ -2646,6 +2648,7 @@ field_array_set(enum mf_field_id id, const union
>> > mf_value *value,
>> >   * struct vl_mff_map.*/
>> >  struct vl_mf_field {
>> >      struct mf_field mf;
>> > +    struct ovs_refcount ref_cnt;
>> >      struct cmap_node cmap_node; /* In ofproto->vl_mff_map->cmap. */
>> >  };
>> >
>> > @@ -2655,17 +2658,25 @@ mf_field_hash(uint32_t key)
>> >      return hash_int(key, 0);
>> >  }
>> >
>> > -void
>> > -mf_vl_mff_map_clear(struct vl_mff_map *vl_mff_map)
>> > +enum ofperr
>> > +mf_vl_mff_map_clear(struct vl_mff_map *vl_mff_map, bool clear)
>> >      OVS_REQUIRES(vl_mff_map->mutex)
>> >  {
>> >      struct vl_mf_field *vmf;
>> >
>> >      CMAP_FOR_EACH (vmf, cmap_node, &vl_mff_map->cmap) {
>> > -        cmap_remove(&vl_mff_map->cmap, &vmf->cmap_node,
>> > -                    mf_field_hash(vmf->mf.id));
>> > -        ovsrcu_postpone(free, vmf);
>> > +        if (clear) {
>> > +            cmap_remove(&vl_mff_map->cmap, &vmf->cmap_node,
>> > +                        mf_field_hash(vmf->mf.id));
>> > +            ovsrcu_postpone(free, vmf);
>> > +        } else {
>> > +            if (ovs_refcount_read(&vmf->ref_cnt) != 1) {
>> > +                return OFPERR_NXTTMFC_INVALID_TLV_DEL;
>> > +            }
>> > +        }
>> >      }
>> > +
>> > +    return 0;
>> >  }
>>
>> I found this a bit confusing. mf_vl_mff_map_clear() has a boolean,
>> also named 'clear' that determines whether it will actually clear or
>> not. If you call a function like "clear(clear=false)", what would it
>> intuitively do?
>>
>> I see two calling conventions - first, for TLV clear command from
>> controller:
>> error = mf_vl_mff_map_clear(vl_mff_map, false);
>> if (error)
>>     return error;
>> error = mf_vl_mff_map_clear(vl_mff_map, true);
>>
>> Second, from ofproto cleanup code:
>> mf_vl_mff_map_clear(vl_mff_map, true);
>>
>> I suggest that this would be easier to understand if the function were
>> structured with the the bool to be 'force':
>>
>> enum ofperr
>> mf_vl_mff_map_clear(struct vl_mff_map *vl_mff_map, bool force)
>>    OVS_REQUIRES(vl_mff_map->mutex)
>> {
>>    struct vl_mf_field *vmf;
>>
>>     /* First, check whether any flows still refer to the current TLV
>> map. */
>>    if (!force) {
>>        CMAP_FOR_EACH (vmf, cmap_node, &vl_mff_map->cmap) {
>>            if (ovs_refcount_read(&vmf->ref_cnt) != 1) {
>>                return OFPERR_NXTTMFC_INVALID_TLV_DEL;
>>            }
>>        }
>>    }
>>
>>    CMAP_FOR_EACH (vmf, cmap_node, &vl_mff_map->cmap) {
>>        cmap_remove(&vl_mff_map->cmap, &vmf->cmap_node,
>>                    mf_field_hash(vmf->mf.id));
>>        ovsrcu_postpone(free, vmf);
>>    }
>>
>>    return 0;
>> }
>
> Thanks for suggestion, it is much clear after modification.
>
>>
>>
>> You may also consider refactoring the ovsrcu_postpone(free, vmf) into
>> a separate, vmf_delete() function which looks something like this:
>>
>> static void vmf_delete(struct vl_mf_field *vmf)
>> {
>>    if (ovs_refcount_unref(&vmf->ref_cnt) == 1) {
>>         /* Postpone as this function is typically called immediately
>> after removing from cmap. */
>>        ovsrcu_postpone(free, vmf);
>>    } else {
>>        VLOG_WARN_RL(&rl, "Attempted to delete VMF %s but refcount is
>> nonzero!",
>>                  vmf->mf.name);
>>    }
>> }
>>
>> This could be reused from the other delete locations as well.
>
> I do not factor out the ovsrcu_postpone(), becuase the only case that we
> force to clear the vl_mff_map is when we destory ofproto. Otherwise, we are
> not going to remove vmf from vl_mff_map if ref_count > 1.

I guess my point was partly that both mf_vl_mff_map_del() and
mf_vl_mff_map_clear() can potentially RCU-defer freeing of the VMF,
and if we push that out to a separate function here that ensures that
the refcount is what we expect it to be, then it's more defensively
coded against potential misuse - attempting to free when it's still
being used, for instance.

>> > +static void
>> > +mf_vl_mff_ref_cnt_mod(const struct vl_mff_map *map, uint64_t
>> > tlv_bitmap,
>> > +                      bool ref)
>> > +{
>> > +    struct vl_mf_field *vmf;
>> > +    int i;
>> > +
>> > +    if (map) {
>> > +        ULLONG_FOR_EACH_1 (i, tlv_bitmap) {
>> > +            vmf = mf_get_vl_mff__(i + MFF_TUN_METADATA0, map);
>> > +            if (vmf) {
>> > +                if (ref) {
>> > +                    ovs_refcount_ref(&vmf->ref_cnt);
>> > +                } else {
>> > +                    ovs_refcount_unref(&vmf->ref_cnt);
>>
>> We should warn if we're trying to unref and this is the final ref,
>> because then we're not freeing the object.
>
> It could be the case that no active flow is currently use this tlv, but the
> controller is going to use this tlv entry for some new flows. In this case,
> it is not necessary to free vmf since it will be useful.

Typically when refcounts are used, the last one to decrement it -> 0
is responsible for freeing it. Also, if it goes to 0 and something is
able to find the element, then it probably shouldn't increment the
refcount because most likely the piece of code that decremented it to
0 also rcu-deferred the free. It's easier to reason about how we
ensure that the memory a) valid to access and b) is eventually freed
if we stick to this idiom.

For most cases, this is hit from the main thread which handles
OpenFlow messages; if it's decoding a message and acquiring references
to fields, then by definition it can't also be removing the fields
that we're trying to reference, so I don't think this should ever
decrement the refcount -> 0; therefore, if we warn, we could easily
see that we have some bad assumption in how we understand this code.

I will note though, I also see some learn_execute() callers for this,
which means that another thread is getting involved - perhaps an
upcall handler or perhaps a revalidator. This makes me a little more
nervous about ensuring that references to the fields are being
correctly counted and whether we may be trying to retain access to a
field which was recently freed. Most likely it's fine, since that
other thread would have to find a flow which references these fields
to be able to take another reference (eg for the "learned" flow); but
warning something like this would be more explicit in this function:

                VLOG_WARN("Invalid TLV index %d.", i);

Maybe it's over the top. But it would be nicer to know through a log
message than just have some crazy bug show up later on and try to
figure out that this is what is happening.

>> > @@ -1164,6 +1169,8 @@ decode_NXAST_RAW_OUTPUT_REG2(const struct
>> > nx_action_output_reg2 *naor,
>> >      if (error) {
>> >          return error;
>> >      }
>> > +    mf_vl_mff_set_tlv_bitmap(output_reg->src.field, tlv_bitmap);
>> > +
>>
>> I wonder if it's cleaner to fold this into nx_pull_header().
>>
>> It's not so obvious why the ofpacts decode functions in this file
>> inconsistently need to call the mf_check_vl_mff() and/or
>> mf_vl_mff_set_tlv_bitmap(), which makes it a little more difficult to
>> verify (and easier to mess up if new actions are added). It'd be nice
>> to make this more consistent.
>
>
> Thanks, it is hard to follow as in v1. Actually, we use nx_pull_header(),
> nx_pull_entry(), and mf_from_nxm_header() to get a mf_field in ofp-action.c
> and those three functions are heavily use in other part of the code base
> that may not need to update the bitmap. So instead to fold
> mf_vl_mff_set_tlv_bitmap() directly into the three functions, three wrapper
> functions mf_vl_mff_nx_pull_header(), mf_vl_mff_nx_pull_entry(), and
> mf_vl_mff_mf_from_nxm_header() are created that handle mf_field operation
> and set the tlv_bitmap, and it should be more easier to follow now.

OK, fair enough, thanks.

>> > @@ -1603,6 +1599,12 @@ ofproto_destroy__(struct ofproto *ofproto)
>> >      }
>> >      free(ofproto->tables);
>> >
>> > +    ovs_mutex_lock(&ofproto->vl_mff_map.mutex);
>> > +    mf_vl_mff_map_clear(&ofproto->vl_mff_map, true);
>> > +    ovs_mutex_unlock(&ofproto->vl_mff_map.mutex);
>> > +    cmap_destroy(&ofproto->vl_mff_map.cmap);
>> > +    ovs_mutex_destroy(&ofproto->vl_mff_map.mutex);
>> > +
>> >      ovs_assert(hindex_is_empty(&ofproto->cookies));
>> >      hindex_destroy(&ofproto->cookies);
>> >
>>
>> Is there a reason for the above two hunks to shift this logic later?
>> Should this be in a separate patch to explain why, or could it be
>> dropped?
>>
> Cause I feel more comfortable to remove the vl_mf_field after all the rules
> are deleted. In practice, it is invoked when ofproto is destroyed, so it
> probabaly does not matter?

I think it should be another patch to move it, and describe this
concern in the commit message.

>> > @@ -7784,14 +7802,15 @@ handle_tlv_table_mod(struct ofconn *ofconn,
>> > const struct ofp_header *oh)
>> >      old_tab = ovsrcu_get_protected(struct tun_table *,
>> > &ofproto->metadata_tab);
>> >      error = tun_metadata_table_mod(&ttm, old_tab, &new_tab);
>> >      if (!error) {
>> > -        ovsrcu_set(&ofproto->metadata_tab, new_tab);
>> > -        tun_metadata_postpone_free(old_tab);
>> > -
>> >          ovs_mutex_lock(&ofproto->vl_mff_map.mutex);
>> >          error =
>> > mf_vl_mff_map_mod_from_tun_metadata(&ofproto->vl_mff_map,
>> >                                                      &ttm);
>> >          ovs_mutex_unlock(&ofproto->vl_mff_map.mutex);
>> >      }
>> > +    if (!error) {
>> > +        ovsrcu_set(&ofproto->metadata_tab, new_tab);
>> > +        tun_metadata_postpone_free(old_tab);
>> > +    }
>>
>> Maybe I didn't quite dig deep enough, but what's the implications of
>> moving this? Could it fix another bug, should it be a separate patch?
>
>
> I'm not sure if it be a separate patch? What this change is that we can only
> modify update the TLV mapping table if all the table modification operations
> are valid. May it makes more sense like this
>
>     if (!error) {
>         ovs_mutex_lock(&ofproto->vl_mff_map.mutex);
>         error = mf_vl_mff_map_mod_from_tun_metadata(&ofproto->vl_mff_map,
>                                                     &ttm);
>         ovs_mutex_unlock(&ofproto->vl_mff_map.mutex);
>         if (!error) {
>             ovsrcu_set(&ofproto->metadata_tab, new_tab);
>             tun_metadata_postpone_free(old_tab);
>         }
>     }

LGTM.

diff --git a/lib/meta-flow.c b/lib/meta-flow.c
index e844008f6294..bef5aad768a3 100644
--- a/lib/meta-flow.c
+++ b/lib/meta-flow.c
@@ -2836,9 +2836,13 @@ mf_vl_mff_ref_cnt_mod(const struct vl_mff_map
*map, uint64_t tlv_bitmap,
            vmf = mf_get_vl_mff__(i + MFF_TUN_METADATA0, map);
            if (vmf) {
                if (ref) {
-                    ovs_refcount_ref(&vmf->ref_cnt);
+                    if (ovs_refcount_ref(&vmf->ref_cnt) == 0) {
+                        VLOG_WARN("Taking reference on freed VMF %d", i);
+                    }
                } else {
-                    ovs_refcount_unref(&vmf->ref_cnt);
+                    if (ovs_refcount_unref(&vmf->ref_cnt) == 1) {
+                        VLOG_WARN("Dereferencing VMF %d without free", i);
+                    }
                }
            } else {