From patchwork Mon May 15 08:04:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frode Nordahl X-Patchwork-Id: 1781152 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=hiV0A1zx; dkim-atps=neutral Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QKX3Q61LKz20db for ; Mon, 15 May 2023 18:05:06 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 7517381FDC; Mon, 15 May 2023 08:05:04 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 7517381FDC Authentication-Results: smtp1.osuosl.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=hiV0A1zx X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MMf4YuuynRtW; Mon, 15 May 2023 08:05:02 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp1.osuosl.org (Postfix) with ESMTPS id 5D85F81FC7; Mon, 15 May 2023 08:05:01 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 5D85F81FC7 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 2999CC0037; Mon, 15 May 2023 08:05:01 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id D763AC002A for ; Mon, 15 May 2023 08:04:59 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 9BB0940C25 for ; Mon, 15 May 2023 08:04:58 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 9BB0940C25 Authentication-Results: smtp2.osuosl.org; dkim=pass (2048-bit key) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=hiV0A1zx X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id z2Tn-9hflY1o for ; Mon, 15 May 2023 08:04:56 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org DF670408B0 Received: from smtp-relay-canonical-1.canonical.com (smtp-relay-canonical-1.canonical.com [185.125.188.121]) by smtp2.osuosl.org (Postfix) with ESMTPS id DF670408B0 for ; Mon, 15 May 2023 08:04:55 +0000 (UTC) Received: from frode-threadripper.. (2.general.frode.uk.vpn [10.172.193.251]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-1.canonical.com (Postfix) with ESMTPSA id 213A7412CD; Mon, 15 May 2023 08:04:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1684137890; bh=vHZOSOIuiwsBwY1oTo6/AjMjCckAwlZcj+iuRuqAuik=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=hiV0A1zxCnA2uozL9WmBdTWkZwN83lI4E5J6NZb3HNMHMSEU/NWmkhFTJue40nlXn 5jqiCiCkHLw0ncHWjOIvimqTpl/MAHoJhWj4TY/HUbxcbRbqUXv9TWrsGLLIjVf3BW 2LA7jpOaNcfFc2LmwtzjatebnVDXKBoWchCG+feGFgs2VkvjcmUll6LhR1wcOU+4zP tkw7f+siuc+bNlBWqabolmZ80q2Nk3MquQIW5Kh+gQ7GWbpDnojNCJg0i0drx03bif Xjx2iJCHBh48gkSyhtCTYE1j9mJ6ehW9QBe3GtTccUlEobuFr5G6kfQynoftb+YzSo 68EKTYMDhmZvA== From: Frode Nordahl To: dev@openvswitch.org Date: Mon, 15 May 2023 10:04:48 +0200 Message-Id: <20230515080448.954879-1-frode.nordahl@canonical.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Cc: Simon Horman , Paul Blakey , Flavio Leitner Subject: [ovs-dev] [PATCH v3] tc: fix crash on EAGAIN return from recvmsg on netlink socket. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" The tc module combines the use of the `tc_transact` helper function for communication with the in-kernel tc infrastructure with assertions on the reply data by `ofpbuf_at_assert` on the received data prior to further processing. `tc_transact` in turn calls `nl_transact`, which via `nl_transact_multiple__` ultimately calls and handles return value from `recvmsg`. On error a check for EAGAIN is performed and a consequence of this condition is effectively to provide a non-error (0) result and an empty reply buffer. Before this change, the `tc_transact` and, consumers of it, were unaware of this condition. The use of assertions on the reply buffer can as such lead to a fatal crash of OVS. To be fair, the behavior of `nl_transact` when handling an EAGAIN return is currently not documented, so this change also adds that documentation. While fixing the problem, it led me to find potential problems with the one-time initialization functions in the netdev-offload-tc module. Make use of the information now available about an EAGAIN condition to retry one-time initialization, and resort to logging a warning if probing of tc features fails due to temporary situations such as resource depletion. For the record, the symptom of the crash is this in the log: EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert() And an excerpt of the backtrace looks like this: 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194 tc_replace_flower (id=, flower=) at ../lib/tc.c:3223 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=, actions=, actions_len=, ufid=, info=, stats=) at ../lib/netdev-offload-tc.c:2096 0x0000561dac117541 in netdev_flow_put (stats=, info=0x7fb65b7ba780, ufid=, act_len=, actions=, match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257 parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2297 try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2384 Reported-At: https://launchpad.net/bugs/2018500 Fixes f98e418fbdb6 (tc: Add tc flower functions) Fixes 407556ac6c90 (netlink-socket: Avoid forcing a reply for final message in a transaction.) Signed-off-by: Frode Nordahl Acked-by: Roi Dayan Reviewed-by: Simon Horman --- lib/dpif-netlink.c | 4 +- lib/dpif.c | 2 +- lib/netdev-linux.c | 34 +++++++++------- lib/netdev-offload-tc.c | 86 +++++++++++++++++++++++++++++++++++------ lib/netlink-socket.c | 5 +++ lib/tc.c | 18 +++++++++ 6 files changed, 122 insertions(+), 27 deletions(-) diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index 60bd39643..75537b2f8 100644 --- a/lib/dpif-netlink.c +++ b/lib/dpif-netlink.c @@ -2340,7 +2340,9 @@ parse_flow_put(struct dpif_netlink *dpif, struct dpif_flow_put *put) } netdev_set_hw_info(oor_netdev, HW_INFO_TYPE_OOR, true); } - level = (err == ENOSPC || err == EOPNOTSUPP) ? VLL_DBG : VLL_ERR; + level = (err == EAGAIN || + err == ENOSPC || + err == EOPNOTSUPP) ? VLL_DBG : VLL_ERR; VLOG_RL(&rl, level, "failed to offload flow: %s: %s", ovs_strerror(err), (oor_netdev ? oor_netdev->name : dev->name)); diff --git a/lib/dpif.c b/lib/dpif.c index 3305401fe..2bbb0aac4 100644 --- a/lib/dpif.c +++ b/lib/dpif.c @@ -1751,7 +1751,7 @@ flow_message_log_level(int error) * Kernels that support flow wildcarding will reject these flows as * duplicates (EEXIST), so lower the log level to debug for these * types of messages. */ - return (error && error != EEXIST) ? VLL_WARN : VLL_DBG; + return (error && error != EEXIST && error != EAGAIN) ? VLL_WARN : VLL_DBG; } static bool diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 36620199e..a0294ca00 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -5817,8 +5817,9 @@ tc_get_policer_action(uint32_t index, struct ofputil_meter_stats *stats) error = tc_transact(&request, &replyp); if (error) { - VLOG_ERR_RL(&rl, "Failed to dump police action (index: %u), err=%d", - index, error); + VLOG_RL(&rl, error == EAGAIN ? VLL_DBG : VLL_ERR, + "Failed to dump police action (index: %u), err=%d", + index, error); return error; } @@ -5849,8 +5850,9 @@ tc_del_policer_action(uint32_t index, struct ofputil_meter_stats *stats) error = tc_transact(&request, &replyp); if (error) { - VLOG_ERR_RL(&rl, "Failed to delete police action (index: %u), err=%d", - index, error); + VLOG_RL(&rl, error == EAGAIN ? VLL_DBG : VLL_ERR, + "Failed to delete police action (index: %u), err=%d", + index, error); return error; } @@ -6114,11 +6116,12 @@ tc_query_class(const struct netdev *netdev, error = tc_transact(&request, replyp); if (error) { - VLOG_WARN_RL(&rl, "query %s class %u:%u (parent %u:%u) failed (%s)", - netdev_get_name(netdev), - tc_get_major(handle), tc_get_minor(handle), - tc_get_major(parent), tc_get_minor(parent), - ovs_strerror(error)); + VLOG_RL(&rl, error == EAGAIN ? VLL_DBG : VLL_WARN, + "query %s class %u:%u (parent %u:%u) failed (%s)", + netdev_get_name(netdev), + tc_get_major(handle), tc_get_minor(handle), + tc_get_major(parent), tc_get_minor(parent), + ovs_strerror(error)); } return error; } @@ -6140,10 +6143,11 @@ tc_delete_class(const struct netdev *netdev, unsigned int handle) error = tc_transact(&request, NULL); if (error) { - VLOG_WARN_RL(&rl, "delete %s class %u:%u failed (%s)", - netdev_get_name(netdev), - tc_get_major(handle), tc_get_minor(handle), - ovs_strerror(error)); + VLOG_RL(&rl, error == EAGAIN ? VLL_DBG : VLL_WARN, + "delete %s class %u:%u failed (%s)", + netdev_get_name(netdev), + tc_get_major(handle), tc_get_minor(handle), + ovs_strerror(error)); } return error; } @@ -6262,7 +6266,9 @@ tc_query_qdisc(const struct netdev *netdev_) ops = &tc_ops_other; } } - } else if ((!error && !qdisc->size) || error == ENOENT) { + } else if ((!error && !qdisc->size) || + error == ENOENT || error == EAGAIN) + { /* Either it's a built-in qdisc, or (on Linux pre-2.6.35) it's a qdisc * set up by some other entity that doesn't have a handle 1:0. We will * assume that it's the system default qdisc. */ diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index 4f26dd8cc..4fb9dc239 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -2505,10 +2505,10 @@ netdev_tc_flow_get(struct netdev *netdev, err = tc_get_flower(&id, &flower); if (err) { - VLOG_ERR_RL(&error_rl, - "flow get failed (dev %s prio %d handle %d): %s", - netdev_get_name(netdev), id.prio, id.handle, - ovs_strerror(err)); + VLOG_RL(&error_rl, err == EAGAIN ? VLL_DBG : VLL_ERR, + "flow get failed (dev %s prio %d handle %d): %s", + netdev_get_name(netdev), id.prio, id.handle, + ovs_strerror(err)); return err; } @@ -2571,11 +2571,36 @@ netdev_tc_get_n_flows(struct netdev *netdev, uint64_t *n_flows) return 0; } +/* This macro is for use by one-time initialization functions, where we have + * one shot per thread/process to perform a pertinent initialization task that + * may return a temporary error (EAGAIN). + * + * With min/max values of 1/64 we would retry 7 times, spending at the + * most 127 * 1E6 nsec (0.127s) sleeping. + */ +#define NETDEV_OFFLOAD_TC_BACKOFF_MIN 1 +#define NETDEV_OFFLOAD_TC_BACKOFF_MAX 64 +#define NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(BACKOFF, ERROR, CONDITION, \ + FUNCTION, ...) \ + BACKOFF = NETDEV_OFFLOAD_TC_BACKOFF_MIN; \ + do { \ + ERROR = (FUNCTION)(__VA_ARGS__); \ + if (CONDITION) { \ + xnanosleep(BACKOFF * 1E6); \ + if (BACKOFF < NETDEV_OFFLOAD_TC_BACKOFF_MAX) { \ + BACKOFF <<= 1; \ + } else { \ + break; \ + } \ + } \ + } while (CONDITION); + static void probe_multi_mask_per_prio(int ifindex) { struct tc_flower flower; struct tcf_id id1, id2; + uint64_t backoff; int block_id = 0; int prio = 1; int error; @@ -2594,8 +2619,13 @@ probe_multi_mask_per_prio(int ifindex) memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); id1 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); - error = tc_replace_flower(&id1, &flower); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(backoff, error, error == EAGAIN, + tc_replace_flower, &id1, &flower); if (error) { + if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe for multiple mask " + "support: %s", ovs_strerror(error)); + } goto out; } @@ -2603,10 +2633,15 @@ probe_multi_mask_per_prio(int ifindex) memset(&flower.mask.src_mac, 0xff, sizeof flower.mask.src_mac); id2 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); - error = tc_replace_flower(&id2, &flower); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(backoff, error, error == EAGAIN, + tc_replace_flower, &id2, &flower); tc_del_flower_filter(&id1); if (error) { + if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe for multiple mask " + "support: %s", ovs_strerror(error)); + } goto out; } @@ -2642,6 +2677,7 @@ probe_ct_state_support(int ifindex) { struct tc_flower flower; uint16_t ct_state; + uint64_t backoff; struct tcf_id id; int error; @@ -2657,8 +2693,13 @@ probe_ct_state_support(int ifindex) goto out; } - error = tc_get_flower(&id, &flower); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(backoff, error, error == EAGAIN, + tc_get_flower, &id, &flower); if (error || flower.mask.ct_state != ct_state) { + if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe ct_state support: %s", + ovs_strerror(error)); + } goto out_del; } @@ -2670,10 +2711,16 @@ probe_ct_state_support(int ifindex) /* Test for reject, ct_state >= MAX */ ct_state = ~0; - error = probe_insert_ct_state_rule(ifindex, ct_state, &id); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(backoff, error, error == EAGAIN, + probe_insert_ct_state_rule, ifindex, + ct_state, &id); if (!error) { /* No reject, can't continue probing other flags */ goto out_del; + } else if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe ct_state support: %s", + ovs_strerror(error)); + goto out_del; } tc_del_flower_filter(&id); @@ -2682,8 +2729,14 @@ probe_ct_state_support(int ifindex) memset(&flower, 0, sizeof flower); ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED | TCA_FLOWER_KEY_CT_FLAGS_INVALID; - error = probe_insert_ct_state_rule(ifindex, ct_state, &id); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(backoff, error, error == EAGAIN, + probe_insert_ct_state_rule, ifindex, + ct_state, &id); if (error) { + if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe ct_state support: %s", + ovs_strerror(error)); + } goto out; } @@ -2695,8 +2748,14 @@ probe_ct_state_support(int ifindex) ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED | TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED | TCA_FLOWER_KEY_CT_FLAGS_REPLY; - error = probe_insert_ct_state_rule(ifindex, ct_state, &id); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(backoff, error, error == EAGAIN, + probe_insert_ct_state_rule, ifindex, + ct_state, &id); if (error) { + if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe ct_state support: %s", + ovs_strerror(error)); + } goto out; } @@ -2714,6 +2773,7 @@ probe_tc_block_support(int ifindex) { struct tc_flower flower; uint32_t block_id = 1; + uint64_t backoff; struct tcf_id id; int prio = 0; int error; @@ -2732,13 +2792,17 @@ probe_tc_block_support(int ifindex) memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); id = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); - error = tc_replace_flower(&id, &flower); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(backoff, error, error == EAGAIN, + tc_replace_flower, &id, &flower); tc_add_del_qdisc(ifindex, false, block_id, TC_INGRESS); if (!error) { block_support = true; VLOG_INFO("probe tc: block offload is supported."); + } else if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe block offload support: %s", + ovs_strerror(error)); } } diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index 80da20d9f..dea060fc3 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -1798,6 +1798,11 @@ nl_pool_release(struct nl_sock *sock) * * 2. Resending the request causes it to be re-executed, so the request * needs to be idempotent. + * + * 3. In the event that the kernel is too busy to handle the request to + * receive the response (i.e. EAGAIN), this function will still return + * 0. The caller can check for this condition by checking for a zero + * size of the 'replyp' ofpbuf buffer. */ int nl_transact(int protocol, const struct ofpbuf *request, diff --git a/lib/tc.c b/lib/tc.c index 5c32c6f97..7df6a10c5 100644 --- a/lib/tc.c +++ b/lib/tc.c @@ -237,11 +237,29 @@ static void request_from_tcf_id(struct tcf_id *id, uint16_t eth_type, } } +/* The `tc_transact` function is a wrapper around `nl_transact` with the + * addition of: + * + * 1. the 'request' ofpbuf buffer is freed after `nl_transact` returns, + * regardless of success or failure. + * + * 2. When a 'replyp' pointer is provided; in the event of the kernel + * being too busy to process the request for the response, a positive + * error return will be provided with the value of EAGAIN. + * + * Please acquaint yourself with the documentation of the `nl_transact` + * function in the netlink-sock module before making use of this function. + */ int tc_transact(struct ofpbuf *request, struct ofpbuf **replyp) { int error = nl_transact(NETLINK_ROUTE, request, replyp); ofpbuf_uninit(request); + + if (!error && replyp && !(*replyp)->size) { + return EAGAIN; + } + return error; }