From patchwork Tue May 23 10:39:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Frode Nordahl X-Patchwork-Id: 1785080 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=Nvz4Ijfb; dkim-atps=neutral Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QQW626v85z20Pb for ; Tue, 23 May 2023 20:39:37 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 4EC0C821C3; Tue, 23 May 2023 10:39:35 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 4EC0C821C3 Authentication-Results: smtp1.osuosl.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=Nvz4Ijfb X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7Zk3OcM-s2K3; Tue, 23 May 2023 10:39:33 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp1.osuosl.org (Postfix) with ESMTPS id CA93F821AB; Tue, 23 May 2023 10:39:32 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org CA93F821AB Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 8C2FCC0035; Tue, 23 May 2023 10:39:32 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id BBE1AC002A for ; Tue, 23 May 2023 10:39:30 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 8937F60F0C for ; Tue, 23 May 2023 10:39:30 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 8937F60F0C Authentication-Results: smtp3.osuosl.org; dkim=pass (2048-bit key) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=Nvz4Ijfb X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id l1JHW9XZaZ7n for ; Tue, 23 May 2023 10:39:28 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 88B6860E8A Received: from smtp-relay-canonical-0.canonical.com (smtp-relay-canonical-0.canonical.com [185.125.188.120]) by smtp3.osuosl.org (Postfix) with ESMTPS id 88B6860E8A for ; Tue, 23 May 2023 10:39:28 +0000 (UTC) Received: from frode-threadripper.. (ti0189a430-1823.bb.online.no [85.167.118.38]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-canonical-0.canonical.com (Postfix) with ESMTPSA id 1A1A13F26C; Tue, 23 May 2023 10:39:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1684838364; bh=zp+Q9/FYZhmj4gnzaPgISQvZhNEA5aAHke1pGOOBpiY=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Nvz4Ijfba0CiQ3BmWJC9J5Zd/mhv0ovKHUdMPDFX04VJPjEyhTU0dmbXWsZkir/9g YJK3+iVIHSkoTkjn7PbjN9YPWymM8eb11oxv1Uhanlw6G9PGOfGJAmzqB80Bk8GtsV NEDowF51TovbeZaCDh/i5Eb8LvIT27eSBic5iMEjKIKgpZ5kJ2sfi/x/hcOzh731cc Dc5ZgyAJa7GL9KSXjE0tsCB+zXlNiFR9yOkE5GQWFQ7pfX1I/UwkThsfdXzRas+E2O Yn6I8p6j9WVHXSWGqty/PU0fw0X7BkBr8gVXOj1uBl6KWDvsH7EeO5H7+tX9AoYb0m v7hozSowirHPw== From: Frode Nordahl To: dev@openvswitch.org Date: Tue, 23 May 2023 12:39:21 +0200 Message-Id: <20230523103921.3505877-1-frode.nordahl@canonical.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Cc: Simon Horman Subject: [ovs-dev] [PATCH v4] tc: fix crash on EAGAIN return from recvmsg on netlink socket. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" The tc module combines the use of the `tc_transact` helper function for communication with the in-kernel tc infrastructure with assertions on the reply data by `ofpbuf_at_assert` on the received data prior to further processing. `tc_transact` in turn calls `nl_transact`, which via `nl_transact_multiple__` ultimately calls and handles return value from `recvmsg`. On error a check for EAGAIN is performed and a consequence of this condition is effectively to provide a non-error (0) result and an empty reply buffer. Before this change, the `tc_transact` and, consumers of it, were unaware of this condition. The use of assertions on the reply buffer can as such lead to a fatal crash of OVS. To be fair, the behavior of `nl_transact` when handling an EAGAIN return is currently not documented, so this change also adds that documentation. While fixing the problem, it led me to find potential problems with the one-time initialization functions in the netdev-offload-tc module. Make use of the information now available about an EAGAIN condition to retry one-time initialization, and resort to logging a warning if probing of tc features fails due to temporary situations such as resource depletion. For the record, the symptom of the crash is this in the log: EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert() And an excerpt of the backtrace looks like this: 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194 tc_replace_flower (id=, flower=) at ../lib/tc.c:3223 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=, actions=, actions_len=, ufid=, info=, stats=) at ../lib/netdev-offload-tc.c:2096 0x0000561dac117541 in netdev_flow_put (stats=, info=0x7fb65b7ba780, ufid=, act_len=, actions=, match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257 parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2297 try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2384 Reported-At: https://launchpad.net/bugs/2018500 Fixes: f98e418fbdb6 ("tc: Add tc flower functions") Fixes: 407556ac6c90 ("netlink-socket: Avoid forcing a reply for final message in a transaction.") Acked-by: Roi Dayan Reviewed-by: Simon Horman Signed-off-by: Frode Nordahl Acked-by: Eelco Chaudron --- lib/dpif.c | 12 +++++-- lib/netdev-linux.c | 7 ++-- lib/netdev-offload-tc.c | 73 +++++++++++++++++++++++++++++++++++++---- lib/netlink-socket.c | 5 +++ lib/tc.c | 26 +++++++++++++++ 5 files changed, 109 insertions(+), 14 deletions(-) diff --git a/lib/dpif.c b/lib/dpif.c index 3305401fe..f97a57e7b 100644 --- a/lib/dpif.c +++ b/lib/dpif.c @@ -1749,9 +1749,15 @@ flow_message_log_level(int error) /* If flows arrive in a batch, userspace may push down multiple * unique flow definitions that overlap when wildcards are applied. * Kernels that support flow wildcarding will reject these flows as - * duplicates (EEXIST), so lower the log level to debug for these - * types of messages. */ - return (error && error != EEXIST) ? VLL_WARN : VLL_DBG; + * duplicates (EEXIST). + * + * Some subsystems expose temporary error conditions such as EAGAIN return + * for operations on non-blocking sockets. This is done to make the right + * decissions during processing. + * + * If they bubble up here we ought to not log those as a warning, so lower + * the log level to debug for these types of messages. */ + return (error && error != EEXIST && error != EAGAIN) ? VLL_WARN : VLL_DBG; } static bool diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 36620199e..f5c241474 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -6235,8 +6235,7 @@ tc_query_qdisc(const struct netdev *netdev_) * On Linux 2.6.35+ we use the straightforward method because it allows us * to handle non-builtin qdiscs without handle 1:0 (e.g. codel). However, * in such a case we get no response at all from the kernel (!) if a - * builtin qdisc is in use (which is later caught by "!error && - * !qdisc->size"). */ + * builtin qdisc is in use (which is later caught by "error == EAGAIN" */ tcmsg = netdev_linux_tc_make_request(netdev_, RTM_GETQDISC, NLM_F_ECHO, &request); if (!tcmsg) { @@ -6247,7 +6246,7 @@ tc_query_qdisc(const struct netdev *netdev_) /* Figure out what tc class to instantiate. */ error = tc_transact(&request, &qdisc); - if (!error && qdisc->size) { + if (!error) { const char *kind; error = tc_parse_qdisc(qdisc, &kind, NULL); @@ -6262,7 +6261,7 @@ tc_query_qdisc(const struct netdev *netdev_) ops = &tc_ops_other; } } - } else if ((!error && !qdisc->size) || error == ENOENT) { + } else if (error == ENOENT || error == EAGAIN) { /* Either it's a built-in qdisc, or (on Linux pre-2.6.35) it's a qdisc * set up by some other entity that doesn't have a handle 1:0. We will * assume that it's the system default qdisc. */ diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index 4f26dd8cc..c8ca280a7 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -2571,6 +2571,28 @@ netdev_tc_get_n_flows(struct netdev *netdev, uint64_t *n_flows) return 0; } +/* This macro is for use by one-time initialization functions, where we have + * one shot per thread/process to perform a pertinent initialization task that + * may return a temporary error (EAGAIN). + * + * With min/max values of 1/64 we would retry 7 times, spending at the + * most 127 * 1E6 nsec (0.127s) sleeping. + */ +#define NETDEV_OFFLOAD_TC_BACKOFF_MIN 1 +#define NETDEV_OFFLOAD_TC_BACKOFF_MAX 64 +#define NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(ERROR, CONDITION, FUNCTION, ...) \ + for (uint64_t backoff = NETDEV_OFFLOAD_TC_BACKOFF_MIN; \ + backoff <= NETDEV_OFFLOAD_TC_BACKOFF_MAX; \ + backoff <<= 1) \ + { \ + ERROR = (FUNCTION)(__VA_ARGS__); \ + if (CONDITION) { \ + xnanosleep(backoff * 1E6); \ + continue; \ + } \ + break; \ + } + static void probe_multi_mask_per_prio(int ifindex) { @@ -2594,8 +2616,13 @@ probe_multi_mask_per_prio(int ifindex) memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); id1 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); - error = tc_replace_flower(&id1, &flower); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(error, error == EAGAIN, + tc_replace_flower, &id1, &flower); if (error) { + if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe for multiple mask " + "support: %s", ovs_strerror(error)); + } goto out; } @@ -2603,10 +2630,15 @@ probe_multi_mask_per_prio(int ifindex) memset(&flower.mask.src_mac, 0xff, sizeof flower.mask.src_mac); id2 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); - error = tc_replace_flower(&id2, &flower); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(error, error == EAGAIN, + tc_replace_flower, &id2, &flower); tc_del_flower_filter(&id1); if (error) { + if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe for multiple mask " + "support: %s", ovs_strerror(error)); + } goto out; } @@ -2657,8 +2689,13 @@ probe_ct_state_support(int ifindex) goto out; } - error = tc_get_flower(&id, &flower); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(error, error == EAGAIN, + tc_get_flower, &id, &flower); if (error || flower.mask.ct_state != ct_state) { + if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe ct_state support: %s", + ovs_strerror(error)); + } goto out_del; } @@ -2670,10 +2707,16 @@ probe_ct_state_support(int ifindex) /* Test for reject, ct_state >= MAX */ ct_state = ~0; - error = probe_insert_ct_state_rule(ifindex, ct_state, &id); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(error, error == EAGAIN, + probe_insert_ct_state_rule, ifindex, + ct_state, &id); if (!error) { /* No reject, can't continue probing other flags */ goto out_del; + } else if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe ct_state support: %s", + ovs_strerror(error)); + goto out_del; } tc_del_flower_filter(&id); @@ -2682,8 +2725,14 @@ probe_ct_state_support(int ifindex) memset(&flower, 0, sizeof flower); ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED | TCA_FLOWER_KEY_CT_FLAGS_INVALID; - error = probe_insert_ct_state_rule(ifindex, ct_state, &id); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(error, error == EAGAIN, + probe_insert_ct_state_rule, ifindex, + ct_state, &id); if (error) { + if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe ct_state support: %s", + ovs_strerror(error)); + } goto out; } @@ -2695,8 +2744,14 @@ probe_ct_state_support(int ifindex) ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED | TCA_FLOWER_KEY_CT_FLAGS_ESTABLISHED | TCA_FLOWER_KEY_CT_FLAGS_REPLY; - error = probe_insert_ct_state_rule(ifindex, ct_state, &id); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(error, error == EAGAIN, + probe_insert_ct_state_rule, ifindex, + ct_state, &id); if (error) { + if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe ct_state support: %s", + ovs_strerror(error)); + } goto out; } @@ -2732,13 +2787,17 @@ probe_tc_block_support(int ifindex) memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); id = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); - error = tc_replace_flower(&id, &flower); + NETDEV_OFFLOAD_TC_RETRY_WITH_BACKOFF(error, error == EAGAIN, + tc_replace_flower, &id, &flower); tc_add_del_qdisc(ifindex, false, block_id, TC_INGRESS); if (!error) { block_support = true; VLOG_INFO("probe tc: block offload is supported."); + } else if (error == EAGAIN) { + VLOG_WARN("probe tc: unable to probe block offload support: %s", + ovs_strerror(error)); } } diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index 80da20d9f..dea060fc3 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -1798,6 +1798,11 @@ nl_pool_release(struct nl_sock *sock) * * 2. Resending the request causes it to be re-executed, so the request * needs to be idempotent. + * + * 3. In the event that the kernel is too busy to handle the request to + * receive the response (i.e. EAGAIN), this function will still return + * 0. The caller can check for this condition by checking for a zero + * size of the 'replyp' ofpbuf buffer. */ int nl_transact(int protocol, const struct ofpbuf *request, diff --git a/lib/tc.c b/lib/tc.c index 5c32c6f97..0396570ac 100644 --- a/lib/tc.c +++ b/lib/tc.c @@ -36,6 +36,7 @@ #include #include "byte-order.h" +#include "coverage.h" #include "netlink-socket.h" #include "netlink.h" #include "openvswitch/ofpbuf.h" @@ -67,6 +68,8 @@ VLOG_DEFINE_THIS_MODULE(tc); +COVERAGE_DEFINE(tc_transact_eagain); + static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(60, 5); static enum tc_offload_policy tc_policy = TC_POLICY_NONE; @@ -237,11 +240,34 @@ static void request_from_tcf_id(struct tcf_id *id, uint16_t eth_type, } } +/* The `tc_transact` function is a wrapper around `nl_transact` with the + * addition of: + * + * 1. the 'request' ofpbuf buffer is freed after `nl_transact` returns, + * regardless of success or failure. + * + * 2. When a 'replyp' pointer is provided; in the event of the kernel + * being too busy to process the request for the response, a positive + * error return will be provided with the value of EAGAIN. + * + * Please acquaint yourself with the documentation of the `nl_transact` + * function in the netlink-socket module before making use of this function. + */ int tc_transact(struct ofpbuf *request, struct ofpbuf **replyp) { int error = nl_transact(NETLINK_ROUTE, request, replyp); ofpbuf_uninit(request); + + if (!error && replyp && !(*replyp)->size) { + COVERAGE_INC(tc_transact_eagain); + /* We replicate the behavior of `nl_transact` for error conditions and + * free any allocations before setting the 'replyp' buffer to NULL. */ + ofpbuf_delete(*replyp); + *replyp = NULL; + return EAGAIN; + } + return error; }