From patchwork Tue Aug 21 03:25:51 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Pfaff X-Patchwork-Id: 960078 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=ovn.org Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41vbg33CC4z9rvt for ; Tue, 21 Aug 2018 13:26:06 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id D6E26CD8; Tue, 21 Aug 2018 03:26:02 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 75C068B1 for ; Tue, 21 Aug 2018 03:26:01 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from relay6-d.mail.gandi.net (relay6-d.mail.gandi.net [217.70.183.198]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 2FA14E6 for ; Tue, 21 Aug 2018 03:25:59 +0000 (UTC) X-Originating-IP: 173.228.112.177 Received: from sigabrt.gateway.sonic.net (173-228-112-177.dsl.dynamic.fusionbroadband.com [173.228.112.177]) (Authenticated sender: blp@ovn.org) by relay6-d.mail.gandi.net (Postfix) with ESMTPSA id 1D92DC0003; Tue, 21 Aug 2018 03:25:55 +0000 (UTC) From: Ben Pfaff To: dev@openvswitch.org Date: Mon, 20 Aug 2018 20:25:51 -0700 Message-Id: <20180821032551.32555-1-blp@ovn.org> X-Mailer: git-send-email 2.16.1 X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Ben Pfaff Subject: [ovs-dev] [PATCH] ofproto-dpif-xlate: Avoid deadlock on multicast snooping recursion. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org Until now, OVS did multicast snooping outputs holding the read-lock on the mcast_snooping object. This could recurse via a patch port to try to take the write-lock on the same object, which deadlocked. This patch fixes the problem, by releasing the read-lock before doing any outputs. It would probably be better to use RCU for mcast_snooping. That would be a bigger patch and less suitable for backporting. Reported-by: Sameh Elsharkawy Reported-at: https://github.com/openvswitch/ovs-issues/issues/153 Signed-off-by: Ben Pfaff Reviewed-by: Yifeng Sun --- ofproto/ofproto-dpif-xlate.c | 102 +++++++++++++++++++++++++++++++++++-------- 1 file changed, 84 insertions(+), 18 deletions(-) diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index e26f6c8f554a..7701b64a2451 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -534,6 +534,8 @@ static void do_xlate_actions(const struct ofpact *, size_t ofpacts_len, static void clone_xlate_actions(const struct ofpact *, size_t ofpacts_len, struct xlate_ctx *, bool, bool); static void xlate_normal(struct xlate_ctx *); +static void xlate_normal_flood(struct xlate_ctx *ct, + struct xbundle *in_xbundle, struct xvlan *); static void xlate_table_action(struct xlate_ctx *, ofp_port_t in_port, uint8_t table_id, bool may_packet_in, bool honor_table_miss, bool with_ct_orig, @@ -2690,6 +2692,53 @@ update_mcast_snooping_table(const struct xlate_ctx *ctx, } ovs_rwlock_unlock(&ms->rwlock); } + +/* A list of multicast output ports. + * + * We accumulate output ports and then do all the outputs afterward. It would + * be more natural to do the outputs one at a time as we discover the need for + * each one, but this can cause a deadlock because we need to take the + * mcast_snooping's rwlock for reading to iterate through the port lists and + * doing an output, if it goes to a patch port, can eventually come back to the + * same mcast_snooping and attempt to take the write lock (see + * https://github.com/openvswitch/ovs-issues/issues/153). */ +struct mcast_output { + /* Discrete ports. */ + struct xbundle **xbundles; + size_t n, allocated; + + /* If set, flood to all ports. */ + bool flood; +}; +#define MCAST_OUTPUT_INIT { NULL, 0, 0, false } + +/* Add 'mcast_bundle' to 'out'. */ +static void +mcast_output_add(struct mcast_output *out, struct xbundle *mcast_xbundle) +{ + if (out->n >= out->allocated) { + out->xbundles = x2nrealloc(out->xbundles, &out->allocated, + sizeof *out->xbundles); + } + out->xbundles[out->n++] = mcast_xbundle; +} + +/* Outputs the packet in 'ctx' to all of the output ports in 'out', given input + * bundle 'in_xbundle' and the current 'xvlan'. */ +static void +mcast_output_finish(struct xlate_ctx *ctx, struct mcast_output *out, + struct xbundle *in_xbundle, struct xvlan *xvlan) +{ + if (out->flood) { + xlate_normal_flood(ctx, in_xbundle, xvlan); + } else { + for (size_t i = 0; i < out->n; i++) { + output_normal(ctx, out->xbundles[i], xvlan); + } + } + + free(out->xbundles); +} /* send the packet to ports having the multicast group learned */ static void @@ -2697,7 +2746,7 @@ xlate_normal_mcast_send_group(struct xlate_ctx *ctx, struct mcast_snooping *ms OVS_UNUSED, struct mcast_group *grp, struct xbundle *in_xbundle, - const struct xvlan *xvlan) + struct mcast_output *out) OVS_REQ_RDLOCK(ms->rwlock) { struct mcast_group_bundle *b; @@ -2707,7 +2756,7 @@ xlate_normal_mcast_send_group(struct xlate_ctx *ctx, mcast_xbundle = xbundle_lookup(ctx->xcfg, b->port); if (mcast_xbundle && mcast_xbundle != in_xbundle) { xlate_report(ctx, OFT_DETAIL, "forwarding to mcast group port"); - output_normal(ctx, mcast_xbundle, xvlan); + mcast_output_add(out, mcast_xbundle); } else if (!mcast_xbundle) { xlate_report(ctx, OFT_WARN, "mcast group port is unknown, dropping"); @@ -2723,7 +2772,8 @@ static void xlate_normal_mcast_send_mrouters(struct xlate_ctx *ctx, struct mcast_snooping *ms, struct xbundle *in_xbundle, - const struct xvlan *xvlan) + const struct xvlan *xvlan, + struct mcast_output *out) OVS_REQ_RDLOCK(ms->rwlock) { struct mcast_mrouter_bundle *mrouter; @@ -2734,7 +2784,7 @@ xlate_normal_mcast_send_mrouters(struct xlate_ctx *ctx, if (mcast_xbundle && mcast_xbundle != in_xbundle && mrouter->vlan == xvlan->v[0].vid) { xlate_report(ctx, OFT_DETAIL, "forwarding to mcast router port"); - output_normal(ctx, mcast_xbundle, xvlan); + mcast_output_add(out, mcast_xbundle); } else if (!mcast_xbundle) { xlate_report(ctx, OFT_WARN, "mcast router port is unknown, dropping"); @@ -2753,7 +2803,7 @@ static void xlate_normal_mcast_send_fports(struct xlate_ctx *ctx, struct mcast_snooping *ms, struct xbundle *in_xbundle, - const struct xvlan *xvlan) + struct mcast_output *out) OVS_REQ_RDLOCK(ms->rwlock) { struct mcast_port_bundle *fport; @@ -2763,7 +2813,7 @@ xlate_normal_mcast_send_fports(struct xlate_ctx *ctx, mcast_xbundle = xbundle_lookup(ctx->xcfg, fport->port); if (mcast_xbundle && mcast_xbundle != in_xbundle) { xlate_report(ctx, OFT_DETAIL, "forwarding to mcast flood port"); - output_normal(ctx, mcast_xbundle, xvlan); + mcast_output_add(out, mcast_xbundle); } else if (!mcast_xbundle) { xlate_report(ctx, OFT_WARN, "mcast flood port is unknown, dropping"); @@ -2779,7 +2829,7 @@ static void xlate_normal_mcast_send_rports(struct xlate_ctx *ctx, struct mcast_snooping *ms, struct xbundle *in_xbundle, - const struct xvlan *xvlan) + struct mcast_output *out) OVS_REQ_RDLOCK(ms->rwlock) { struct mcast_port_bundle *rport; @@ -2792,7 +2842,7 @@ xlate_normal_mcast_send_rports(struct xlate_ctx *ctx, && mcast_xbundle->ofbundle != in_xbundle->ofbundle) { xlate_report(ctx, OFT_DETAIL, "forwarding report to mcast flagged port"); - output_normal(ctx, mcast_xbundle, xvlan); + mcast_output_add(out, mcast_xbundle); } else if (!mcast_xbundle) { xlate_report(ctx, OFT_WARN, "mcast port is unknown, dropping the report"); @@ -2944,8 +2994,11 @@ xlate_normal(struct xlate_ctx *ctx) } if (mcast_snooping_is_membership(flow->tp_src)) { + struct mcast_output out = MCAST_OUTPUT_INIT; + ovs_rwlock_rdlock(&ms->rwlock); - xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, &xvlan); + xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, &xvlan, + &out); /* RFC4541: section 2.1.1, item 1: A snooping switch should * forward IGMP Membership Reports only to those ports where * multicast routers are attached. Alternatively stated: a @@ -2954,8 +3007,10 @@ xlate_normal(struct xlate_ctx *ctx) * An administrative control may be provided to override this * restriction, allowing the report messages to be flooded to * other ports. */ - xlate_normal_mcast_send_rports(ctx, ms, in_xbundle, &xvlan); + xlate_normal_mcast_send_rports(ctx, ms, in_xbundle, &out); ovs_rwlock_unlock(&ms->rwlock); + + mcast_output_finish(ctx, &out, in_xbundle, &xvlan); } else { xlate_report(ctx, OFT_DETAIL, "multicast traffic, flooding"); xlate_normal_flood(ctx, in_xbundle, &xvlan); @@ -2968,10 +3023,15 @@ xlate_normal(struct xlate_ctx *ctx) in_xbundle, ctx->xin->packet); } if (is_mld_report(flow, wc)) { + struct mcast_output out = MCAST_OUTPUT_INIT; + ovs_rwlock_rdlock(&ms->rwlock); - xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, &xvlan); - xlate_normal_mcast_send_rports(ctx, ms, in_xbundle, &xvlan); + xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, &xvlan, + &out); + xlate_normal_mcast_send_rports(ctx, ms, in_xbundle, &out); ovs_rwlock_unlock(&ms->rwlock); + + mcast_output_finish(ctx, &out, in_xbundle, &xvlan); } else { xlate_report(ctx, OFT_DETAIL, "MLD query, flooding"); xlate_normal_flood(ctx, in_xbundle, &xvlan); @@ -2989,6 +3049,8 @@ xlate_normal(struct xlate_ctx *ctx) } /* forwarding to group base ports */ + struct mcast_output out = MCAST_OUTPUT_INIT; + ovs_rwlock_rdlock(&ms->rwlock); if (flow->dl_type == htons(ETH_TYPE_IP)) { grp = mcast_snooping_lookup4(ms, flow->nw_dst, vlan); @@ -2996,20 +3058,24 @@ xlate_normal(struct xlate_ctx *ctx) grp = mcast_snooping_lookup(ms, &flow->ipv6_dst, vlan); } if (grp) { - xlate_normal_mcast_send_group(ctx, ms, grp, in_xbundle, &xvlan); - xlate_normal_mcast_send_fports(ctx, ms, in_xbundle, &xvlan); - xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, &xvlan); + xlate_normal_mcast_send_group(ctx, ms, grp, in_xbundle, &out); + xlate_normal_mcast_send_fports(ctx, ms, in_xbundle, &out); + xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, &xvlan, + &out); } else { if (mcast_snooping_flood_unreg(ms)) { xlate_report(ctx, OFT_DETAIL, "unregistered multicast, flooding"); - xlate_normal_flood(ctx, in_xbundle, &xvlan); + out.flood = true; } else { - xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, &xvlan); - xlate_normal_mcast_send_fports(ctx, ms, in_xbundle, &xvlan); + xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, &xvlan, + &out); + xlate_normal_mcast_send_fports(ctx, ms, in_xbundle, &out); } } ovs_rwlock_unlock(&ms->rwlock); + + mcast_output_finish(ctx, &out, in_xbundle, &xvlan); } else { ovs_rwlock_rdlock(&ctx->xbridge->ml->rwlock); mac = mac_learning_lookup(ctx->xbridge->ml, flow->dl_dst, vlan);