Message ID | 20230511100350.3898835-1-frode.nordahl@canonical.com |
---|---|
State | Superseded |
Delegated to: | Simon Horman |
Headers | show |
Series | [ovs-dev,v2] tc: fix crash on EAGAIN return from recvmsg on netlink socket. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
ovsrobot/intel-ovs-compilation | success | test: success |
On 11/05/2023 13:03, Frode Nordahl wrote: > The tc module combines the use of the `tc_transact` helper > function for communication with the in-kernel tc infrastructure > with assertions on the reply data by `ofpbuf_at_assert` on the > received data prior to further processing. > > `tc_transact` in turn calls `nl_transact`, which via > `nl_transact_multiple__` ultimately calls and handles return > value from `recvmsg`. On error a check for EAGAIN is performed > and a consequence of this condition is effectively to provide a > non-error (0) result and an empty reply buffer. > > Before this change, the `tc_transact` and, consumers of it, were > unaware of this condition. The use of assertions on the reply > buffer can as such lead to a fatal crash of OVS. > > To be fair, the behavior of `nl_transact` when handling an EAGAIN > return is currently not documented, so this change also adds that > documentation. > > While fixing the problem, it led me to find potential problems > with several of the one-time initialization functions in > the netdev-offload-tc module. Before this patch they would > happily accept temporary failure as permanent one, with > consequences for the remaining lifetime of the process. Good catch on EAGAIN returning empty buffer. I have a question about the change to loop EAGAIN in netdev-offload-tc calls. Why not to keep treat EAGAIN as an error? the revalidator will call again if needed. In current situation what will happen if kernel will keep giving EAGAIN? why to keep ovs in this loop and not resume normal operation until next revalidation? This is so OVS could keep do other tasks as needed. > > For the record, the symptom of the crash is this in the log: > EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert() > > And an excerpt of the backtrace looks like this: > 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194 > tc_replace_flower (id=<optimized out>, flower=<optimized out>) at ../lib/tc.c:3223 > 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, > ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at ../lib/netdev-offload-tc.c:2096 > 0x0000561dac117541 in netdev_flow_put (stats=<optimized out>, info=0x7fb65b7ba780, ufid=<optimized out>, act_len=<optimized out>, actions=<optimized out>, > match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257 > parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2297 > try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2384 > > Reported-At: https://launchpad.net/bugs/2018500 > Fixes f98e418fbdb6 (tc: Add tc flower functions) > Fixes 407556ac6c90 (netlink-socket: Avoid forcing a reply for final message in a transaction.) > Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> > --- > lib/dpif-netlink.c | 4 +++- > lib/dpif.c | 2 +- > lib/netdev-linux.c | 40 ++++++++++++++++++++++++++-------------- > lib/netdev-offload-tc.c | 31 ++++++++++++++++++++++--------- > lib/netlink-socket.c | 5 +++++ > lib/tc.c | 18 ++++++++++++++++++ > 6 files changed, 75 insertions(+), 25 deletions(-) > > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c > index 60bd39643..75537b2f8 100644 > --- a/lib/dpif-netlink.c > +++ b/lib/dpif-netlink.c > @@ -2340,7 +2340,9 @@ parse_flow_put(struct dpif_netlink *dpif, struct dpif_flow_put *put) > } > netdev_set_hw_info(oor_netdev, HW_INFO_TYPE_OOR, true); > } > - level = (err == ENOSPC || err == EOPNOTSUPP) ? VLL_DBG : VLL_ERR; > + level = (err == EAGAIN || > + err == ENOSPC || > + err == EOPNOTSUPP) ? VLL_DBG : VLL_ERR; > VLOG_RL(&rl, level, "failed to offload flow: %s: %s", > ovs_strerror(err), > (oor_netdev ? oor_netdev->name : dev->name)); > diff --git a/lib/dpif.c b/lib/dpif.c > index 3305401fe..2bbb0aac4 100644 > --- a/lib/dpif.c > +++ b/lib/dpif.c > @@ -1751,7 +1751,7 @@ flow_message_log_level(int error) > * Kernels that support flow wildcarding will reject these flows as > * duplicates (EEXIST), so lower the log level to debug for these > * types of messages. */ > - return (error && error != EEXIST) ? VLL_WARN : VLL_DBG; > + return (error && error != EEXIST && error != EAGAIN) ? VLL_WARN : VLL_DBG; > } > > static bool > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > index 36620199e..a6065bf69 100644 > --- a/lib/netdev-linux.c > +++ b/lib/netdev-linux.c > @@ -5817,8 +5817,11 @@ tc_get_policer_action(uint32_t index, struct ofputil_meter_stats *stats) > > error = tc_transact(&request, &replyp); > if (error) { > - VLOG_ERR_RL(&rl, "Failed to dump police action (index: %u), err=%d", > - index, error); > + VLOG_RL(&rl, > + (error == EAGAIN) ? VLL_DBG : VLL_ERR, > + "Failed to dump police action (index: %u), err=%d", > + index, > + error); > return error; > } > > @@ -5849,8 +5852,11 @@ tc_del_policer_action(uint32_t index, struct ofputil_meter_stats *stats) > > error = tc_transact(&request, &replyp); > if (error) { > - VLOG_ERR_RL(&rl, "Failed to delete police action (index: %u), err=%d", > - index, error); > + VLOG_RL(&rl, > + (error == EAGAIN) ? VLL_DBG : VLL_ERR, > + "Failed to delete police action (index: %u), err=%d", > + index, > + error); > return error; > } > > @@ -6114,11 +6120,13 @@ tc_query_class(const struct netdev *netdev, > > error = tc_transact(&request, replyp); > if (error) { > - VLOG_WARN_RL(&rl, "query %s class %u:%u (parent %u:%u) failed (%s)", > - netdev_get_name(netdev), > - tc_get_major(handle), tc_get_minor(handle), > - tc_get_major(parent), tc_get_minor(parent), > - ovs_strerror(error)); > + VLOG_RL(&rl, > + (error == EAGAIN) ? VLL_DBG : VLL_WARN, > + "query %s class %u:%u (parent %u:%u) failed (%s)", > + netdev_get_name(netdev), > + tc_get_major(handle), tc_get_minor(handle), > + tc_get_major(parent), tc_get_minor(parent), > + ovs_strerror(error)); > } > return error; > } > @@ -6140,10 +6148,12 @@ tc_delete_class(const struct netdev *netdev, unsigned int handle) > > error = tc_transact(&request, NULL); > if (error) { > - VLOG_WARN_RL(&rl, "delete %s class %u:%u failed (%s)", > - netdev_get_name(netdev), > - tc_get_major(handle), tc_get_minor(handle), > - ovs_strerror(error)); > + VLOG_RL(&rl, > + (error == EAGAIN) ? VLL_DBG : VLL_WARN, > + "delete %s class %u:%u failed (%s)", > + netdev_get_name(netdev), > + tc_get_major(handle), tc_get_minor(handle), > + ovs_strerror(error)); > } > return error; > } > @@ -6262,7 +6272,9 @@ tc_query_qdisc(const struct netdev *netdev_) > ops = &tc_ops_other; > } > } > - } else if ((!error && !qdisc->size) || error == ENOENT) { > + } else if ((!error && !qdisc->size) || > + error == ENOENT || error == EAGAIN) > + { > /* Either it's a built-in qdisc, or (on Linux pre-2.6.35) it's a qdisc > * set up by some other entity that doesn't have a handle 1:0. We will > * assume that it's the system default qdisc. */ > diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c > index 4f26dd8cc..4c15989d6 100644 > --- a/lib/netdev-offload-tc.c > +++ b/lib/netdev-offload-tc.c > @@ -2505,10 +2505,11 @@ netdev_tc_flow_get(struct netdev *netdev, > > err = tc_get_flower(&id, &flower); > if (err) { > - VLOG_ERR_RL(&error_rl, > - "flow get failed (dev %s prio %d handle %d): %s", > - netdev_get_name(netdev), id.prio, id.handle, > - ovs_strerror(err)); > + VLOG_RL(&error_rl, > + (err == EAGAIN) ? VLL_DBG : VLL_ERR, > + "flow get failed (dev %s prio %d handle %d): %s", > + netdev_get_name(netdev), id.prio, id.handle, > + ovs_strerror(err)); > return err; > } > > @@ -2594,7 +2595,9 @@ probe_multi_mask_per_prio(int ifindex) > memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); > > id1 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); > - error = tc_replace_flower(&id1, &flower); > + do { > + error = tc_replace_flower(&id1, &flower); > + } while (error == EAGAIN); > if (error) { > goto out; > } > @@ -2603,7 +2606,9 @@ probe_multi_mask_per_prio(int ifindex) > memset(&flower.mask.src_mac, 0xff, sizeof flower.mask.src_mac); > > id2 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); > - error = tc_replace_flower(&id2, &flower); > + do { > + error = tc_replace_flower(&id2, &flower); > + } while (error == EAGAIN); > tc_del_flower_filter(&id1); > > if (error) { > @@ -2625,6 +2630,7 @@ probe_insert_ct_state_rule(int ifindex, uint16_t ct_state, struct tcf_id *id) > { > int prio = TC_RESERVED_PRIORITY_MAX + 1; > struct tc_flower flower; > + int error; > > memset(&flower, 0, sizeof flower); > flower.key.ct_state = ct_state; > @@ -2634,7 +2640,10 @@ probe_insert_ct_state_rule(int ifindex, uint16_t ct_state, struct tcf_id *id) > flower.mask.eth_type = OVS_BE16_MAX; > > *id = tc_make_tcf_id(ifindex, 0, prio, TC_INGRESS); > - return tc_replace_flower(id, &flower); > + do { > + error = tc_replace_flower(id, &flower); > + } while (error == EAGAIN); > + return error; > } > > static void > @@ -2657,7 +2666,9 @@ probe_ct_state_support(int ifindex) > goto out; > } > > - error = tc_get_flower(&id, &flower); > + do { > + error = tc_get_flower(&id, &flower); > + } while (error == EAGAIN); > if (error || flower.mask.ct_state != ct_state) { > goto out_del; > } > @@ -2732,7 +2743,9 @@ probe_tc_block_support(int ifindex) > memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); > > id = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); > - error = tc_replace_flower(&id, &flower); > + do { > + error = tc_replace_flower(&id, &flower); > + } while (error == EAGAIN); > > tc_add_del_qdisc(ifindex, false, block_id, TC_INGRESS); > > diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c > index 80da20d9f..dea060fc3 100644 > --- a/lib/netlink-socket.c > +++ b/lib/netlink-socket.c > @@ -1798,6 +1798,11 @@ nl_pool_release(struct nl_sock *sock) > * > * 2. Resending the request causes it to be re-executed, so the request > * needs to be idempotent. > + * > + * 3. In the event that the kernel is too busy to handle the request to > + * receive the response (i.e. EAGAIN), this function will still return > + * 0. The caller can check for this condition by checking for a zero > + * size of the 'replyp' ofpbuf buffer. > */ > int > nl_transact(int protocol, const struct ofpbuf *request, > diff --git a/lib/tc.c b/lib/tc.c > index 5c32c6f97..7df6a10c5 100644 > --- a/lib/tc.c > +++ b/lib/tc.c > @@ -237,11 +237,29 @@ static void request_from_tcf_id(struct tcf_id *id, uint16_t eth_type, > } > } > > +/* The `tc_transact` function is a wrapper around `nl_transact` with the > + * addition of: > + * > + * 1. the 'request' ofpbuf buffer is freed after `nl_transact` returns, > + * regardless of success or failure. > + * > + * 2. When a 'replyp' pointer is provided; in the event of the kernel > + * being too busy to process the request for the response, a positive > + * error return will be provided with the value of EAGAIN. > + * > + * Please acquaint yourself with the documentation of the `nl_transact` > + * function in the netlink-sock module before making use of this function. > + */ > int > tc_transact(struct ofpbuf *request, struct ofpbuf **replyp) > { > int error = nl_transact(NETLINK_ROUTE, request, replyp); > ofpbuf_uninit(request); > + > + if (!error && replyp && !(*replyp)->size) { > + return EAGAIN; > + } > + > return error; > } >
Hello, Roi, Thank you for your prompt reply, see response in-line below. On Thu, May 11, 2023 at 12:51 PM Roi Dayan <roid@nvidia.com> wrote: > On 11/05/2023 13:03, Frode Nordahl wrote: > > The tc module combines the use of the `tc_transact` helper > > function for communication with the in-kernel tc infrastructure > > with assertions on the reply data by `ofpbuf_at_assert` on the > > received data prior to further processing. > > > > `tc_transact` in turn calls `nl_transact`, which via > > `nl_transact_multiple__` ultimately calls and handles return > > value from `recvmsg`. On error a check for EAGAIN is performed > > and a consequence of this condition is effectively to provide a > > non-error (0) result and an empty reply buffer. > > > > Before this change, the `tc_transact` and, consumers of it, were > > unaware of this condition. The use of assertions on the reply > > buffer can as such lead to a fatal crash of OVS. > > > > To be fair, the behavior of `nl_transact` when handling an EAGAIN > > return is currently not documented, so this change also adds that > > documentation. > > > > While fixing the problem, it led me to find potential problems > > with several of the one-time initialization functions in > > the netdev-offload-tc module. Before this patch they would > > happily accept temporary failure as permanent one, with > > consequences for the remaining lifetime of the process. > > Good catch on EAGAIN returning empty buffer. > I have a question about the change to loop EAGAIN in netdev-offload-tc calls. > Why not to keep treat EAGAIN as an error? the revalidator will call > again if needed. > In current situation what will happen if kernel will keep giving EAGAIN? > why to keep ovs in this loop and not resume normal operation until next revalidation? > This is so OVS could keep do other tasks as needed. The change to loop was only made in one-time initialization functions (i.e. `probe_multi_mask_per_prio`, `probe_insert_ct_state_rule`, `probe_ct_state_support`, `probe_tc_block_support`). AFAICT, these are all ultimately called from within a one time initialization block, i.e: if (ovsthread_once_start(&once)) {} Which means they are only attempted once per ovs-vswitchd process invocation. Or am I missing something?
On Thu, May 11, 2023 at 12:03:50PM +0200, Frode Nordahl wrote: > The tc module combines the use of the `tc_transact` helper > function for communication with the in-kernel tc infrastructure > with assertions on the reply data by `ofpbuf_at_assert` on the > received data prior to further processing. > > `tc_transact` in turn calls `nl_transact`, which via > `nl_transact_multiple__` ultimately calls and handles return > value from `recvmsg`. On error a check for EAGAIN is performed > and a consequence of this condition is effectively to provide a > non-error (0) result and an empty reply buffer. > > Before this change, the `tc_transact` and, consumers of it, were > unaware of this condition. The use of assertions on the reply > buffer can as such lead to a fatal crash of OVS. > > To be fair, the behavior of `nl_transact` when handling an EAGAIN > return is currently not documented, so this change also adds that > documentation. > > While fixing the problem, it led me to find potential problems > with several of the one-time initialization functions in > the netdev-offload-tc module. Before this patch they would > happily accept temporary failure as permanent one, with > consequences for the remaining lifetime of the process. > > For the record, the symptom of the crash is this in the log: > EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert() > > And an excerpt of the backtrace looks like this: > 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194 > tc_replace_flower (id=<optimized out>, flower=<optimized out>) at ../lib/tc.c:3223 > 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, > ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at ../lib/netdev-offload-tc.c:2096 > 0x0000561dac117541 in netdev_flow_put (stats=<optimized out>, info=0x7fb65b7ba780, ufid=<optimized out>, act_len=<optimized out>, actions=<optimized out>, > match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257 > parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2297 > try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2384 > > Reported-At: https://launchpad.net/bugs/2018500 > Fixes f98e418fbdb6 (tc: Add tc flower functions) > Fixes 407556ac6c90 (netlink-socket: Avoid forcing a reply for final message in a transaction.) > Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> Hi Frode, thanks for the patch, a nettlesome problem to be sure. ... > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > index 36620199e..a6065bf69 100644 > --- a/lib/netdev-linux.c > +++ b/lib/netdev-linux.c > @@ -5817,8 +5817,11 @@ tc_get_policer_action(uint32_t index, struct ofputil_meter_stats *stats) > > error = tc_transact(&request, &replyp); > if (error) { > - VLOG_ERR_RL(&rl, "Failed to dump police action (index: %u), err=%d", > - index, error); > + VLOG_RL(&rl, > + (error == EAGAIN) ? VLL_DBG : VLL_ERR, > + "Failed to dump police action (index: %u), err=%d", > + index, > + error); nit: could we economise the number of lines? VLOG_RL(&rl, (error == EAGAIN) ? VLL_DBG : VLL_ERR, "Failed to dump police action (index: %u), err=%d", index, error); > return error; > } > ... > @@ -2594,7 +2595,9 @@ probe_multi_mask_per_prio(int ifindex) > memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); > > id1 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); > - error = tc_replace_flower(&id1, &flower); > + do { > + error = tc_replace_flower(&id1, &flower); Perhaps there is a reason why this can't arise, but I'm concerned that this loop could spin forever. > + } while (error == EAGAIN); > if (error) { > goto out; > } ...
On Fri, May 12, 2023 at 10:35 AM Simon Horman <simon.horman@corigine.com> wrote: > > On Thu, May 11, 2023 at 12:03:50PM +0200, Frode Nordahl wrote: > > The tc module combines the use of the `tc_transact` helper > > function for communication with the in-kernel tc infrastructure > > with assertions on the reply data by `ofpbuf_at_assert` on the > > received data prior to further processing. > > > > `tc_transact` in turn calls `nl_transact`, which via > > `nl_transact_multiple__` ultimately calls and handles return > > value from `recvmsg`. On error a check for EAGAIN is performed > > and a consequence of this condition is effectively to provide a > > non-error (0) result and an empty reply buffer. > > > > Before this change, the `tc_transact` and, consumers of it, were > > unaware of this condition. The use of assertions on the reply > > buffer can as such lead to a fatal crash of OVS. > > > > To be fair, the behavior of `nl_transact` when handling an EAGAIN > > return is currently not documented, so this change also adds that > > documentation. > > > > While fixing the problem, it led me to find potential problems > > with several of the one-time initialization functions in > > the netdev-offload-tc module. Before this patch they would > > happily accept temporary failure as permanent one, with > > consequences for the remaining lifetime of the process. > > > > For the record, the symptom of the crash is this in the log: > > EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert() > > > > And an excerpt of the backtrace looks like this: > > 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194 > > tc_replace_flower (id=<optimized out>, flower=<optimized out>) at ../lib/tc.c:3223 > > 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, > > ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at ../lib/netdev-offload-tc.c:2096 > > 0x0000561dac117541 in netdev_flow_put (stats=<optimized out>, info=0x7fb65b7ba780, ufid=<optimized out>, act_len=<optimized out>, actions=<optimized out>, > > match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257 > > parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2297 > > try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2384 > > > > Reported-At: https://launchpad.net/bugs/2018500 > > Fixes f98e418fbdb6 (tc: Add tc flower functions) > > Fixes 407556ac6c90 (netlink-socket: Avoid forcing a reply for final message in a transaction.) > > Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> > > Hi Frode, > > thanks for the patch, a nettlesome problem to be sure. Thank you for taking the time to review! > ... > > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > > index 36620199e..a6065bf69 100644 > > --- a/lib/netdev-linux.c > > +++ b/lib/netdev-linux.c > > @@ -5817,8 +5817,11 @@ tc_get_policer_action(uint32_t index, struct ofputil_meter_stats *stats) > > > > error = tc_transact(&request, &replyp); > > if (error) { > > - VLOG_ERR_RL(&rl, "Failed to dump police action (index: %u), err=%d", > > - index, error); > > + VLOG_RL(&rl, > > + (error == EAGAIN) ? VLL_DBG : VLL_ERR, > > + "Failed to dump police action (index: %u), err=%d", > > + index, > > + error); > > nit: could we economise the number of lines? Sure, I will compact these. > VLOG_RL(&rl, (error == EAGAIN) ? VLL_DBG : VLL_ERR, > "Failed to dump police action (index: %u), err=%d", > index, error); > > > return error; > > } > > > > ... > > > @@ -2594,7 +2595,9 @@ probe_multi_mask_per_prio(int ifindex) > > memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); > > > > id1 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); > > - error = tc_replace_flower(&id1, &flower); > > + do { > > + error = tc_replace_flower(&id1, &flower); > > Perhaps there is a reason why this can't arise, > but I'm concerned that this loop could spin forever. I understand the concern, my rationale is that if we make the wrong decision here because of a temporary error, it will have consequences for the rest of the execution time of the process/thread, which may be a source of inconsistent behavior. But we absolutely do not want to hang forever either. I'll revisit this and see if there is a way to retry the initialization steps for a finite amount of attempts/time and move on with a warning/error logged after that. Does that sound reasonable?
On Fri, May 12, 2023 at 11:25:02AM +0200, Frode Nordahl wrote: > On Fri, May 12, 2023 at 10:35 AM Simon Horman <simon.horman@corigine.com> wrote: > > > > On Thu, May 11, 2023 at 12:03:50PM +0200, Frode Nordahl wrote: > > > The tc module combines the use of the `tc_transact` helper > > > function for communication with the in-kernel tc infrastructure > > > with assertions on the reply data by `ofpbuf_at_assert` on the > > > received data prior to further processing. > > > > > > `tc_transact` in turn calls `nl_transact`, which via > > > `nl_transact_multiple__` ultimately calls and handles return > > > value from `recvmsg`. On error a check for EAGAIN is performed > > > and a consequence of this condition is effectively to provide a > > > non-error (0) result and an empty reply buffer. > > > > > > Before this change, the `tc_transact` and, consumers of it, were > > > unaware of this condition. The use of assertions on the reply > > > buffer can as such lead to a fatal crash of OVS. > > > > > > To be fair, the behavior of `nl_transact` when handling an EAGAIN > > > return is currently not documented, so this change also adds that > > > documentation. > > > > > > While fixing the problem, it led me to find potential problems > > > with several of the one-time initialization functions in > > > the netdev-offload-tc module. Before this patch they would > > > happily accept temporary failure as permanent one, with > > > consequences for the remaining lifetime of the process. > > > > > > For the record, the symptom of the crash is this in the log: > > > EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert() > > > > > > And an excerpt of the backtrace looks like this: > > > 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194 > > > tc_replace_flower (id=<optimized out>, flower=<optimized out>) at ../lib/tc.c:3223 > > > 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, > > > ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at ../lib/netdev-offload-tc.c:2096 > > > 0x0000561dac117541 in netdev_flow_put (stats=<optimized out>, info=0x7fb65b7ba780, ufid=<optimized out>, act_len=<optimized out>, actions=<optimized out>, > > > match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257 > > > parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2297 > > > try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2384 > > > > > > Reported-At: https://launchpad.net/bugs/2018500 > > > Fixes f98e418fbdb6 (tc: Add tc flower functions) > > > Fixes 407556ac6c90 (netlink-socket: Avoid forcing a reply for final message in a transaction.) > > > Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> > > > > Hi Frode, > > > > thanks for the patch, a nettlesome problem to be sure. > > Thank you for taking the time to review! > > > ... > > > > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > > > index 36620199e..a6065bf69 100644 > > > --- a/lib/netdev-linux.c > > > +++ b/lib/netdev-linux.c > > > @@ -5817,8 +5817,11 @@ tc_get_policer_action(uint32_t index, struct ofputil_meter_stats *stats) > > > > > > error = tc_transact(&request, &replyp); > > > if (error) { > > > - VLOG_ERR_RL(&rl, "Failed to dump police action (index: %u), err=%d", > > > - index, error); > > > + VLOG_RL(&rl, > > > + (error == EAGAIN) ? VLL_DBG : VLL_ERR, > > > + "Failed to dump police action (index: %u), err=%d", > > > + index, > > > + error); > > > > nit: could we economise the number of lines? > > Sure, I will compact these. > > > VLOG_RL(&rl, (error == EAGAIN) ? VLL_DBG : VLL_ERR, > > "Failed to dump police action (index: %u), err=%d", > > index, error); > > > > > return error; > > > } > > > > > > > ... > > > > > @@ -2594,7 +2595,9 @@ probe_multi_mask_per_prio(int ifindex) > > > memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); > > > > > > id1 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); > > > - error = tc_replace_flower(&id1, &flower); > > > + do { > > > + error = tc_replace_flower(&id1, &flower); > > > > Perhaps there is a reason why this can't arise, > > but I'm concerned that this loop could spin forever. > > I understand the concern, my rationale is that if we make the wrong decision > here because of a temporary error, it will have consequences for the rest of > the execution time of the process/thread, which may be a source of > inconsistent behavior. But we absolutely do not want to hang forever either. > > I'll revisit this and see if there is a way to retry the > initialization steps for a > finite amount of attempts/time and move on with a warning/error logged > after that. > > Does that sound reasonable? Yes, that would be what I would do too.
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index 60bd39643..75537b2f8 100644 --- a/lib/dpif-netlink.c +++ b/lib/dpif-netlink.c @@ -2340,7 +2340,9 @@ parse_flow_put(struct dpif_netlink *dpif, struct dpif_flow_put *put) } netdev_set_hw_info(oor_netdev, HW_INFO_TYPE_OOR, true); } - level = (err == ENOSPC || err == EOPNOTSUPP) ? VLL_DBG : VLL_ERR; + level = (err == EAGAIN || + err == ENOSPC || + err == EOPNOTSUPP) ? VLL_DBG : VLL_ERR; VLOG_RL(&rl, level, "failed to offload flow: %s: %s", ovs_strerror(err), (oor_netdev ? oor_netdev->name : dev->name)); diff --git a/lib/dpif.c b/lib/dpif.c index 3305401fe..2bbb0aac4 100644 --- a/lib/dpif.c +++ b/lib/dpif.c @@ -1751,7 +1751,7 @@ flow_message_log_level(int error) * Kernels that support flow wildcarding will reject these flows as * duplicates (EEXIST), so lower the log level to debug for these * types of messages. */ - return (error && error != EEXIST) ? VLL_WARN : VLL_DBG; + return (error && error != EEXIST && error != EAGAIN) ? VLL_WARN : VLL_DBG; } static bool diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 36620199e..a6065bf69 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -5817,8 +5817,11 @@ tc_get_policer_action(uint32_t index, struct ofputil_meter_stats *stats) error = tc_transact(&request, &replyp); if (error) { - VLOG_ERR_RL(&rl, "Failed to dump police action (index: %u), err=%d", - index, error); + VLOG_RL(&rl, + (error == EAGAIN) ? VLL_DBG : VLL_ERR, + "Failed to dump police action (index: %u), err=%d", + index, + error); return error; } @@ -5849,8 +5852,11 @@ tc_del_policer_action(uint32_t index, struct ofputil_meter_stats *stats) error = tc_transact(&request, &replyp); if (error) { - VLOG_ERR_RL(&rl, "Failed to delete police action (index: %u), err=%d", - index, error); + VLOG_RL(&rl, + (error == EAGAIN) ? VLL_DBG : VLL_ERR, + "Failed to delete police action (index: %u), err=%d", + index, + error); return error; } @@ -6114,11 +6120,13 @@ tc_query_class(const struct netdev *netdev, error = tc_transact(&request, replyp); if (error) { - VLOG_WARN_RL(&rl, "query %s class %u:%u (parent %u:%u) failed (%s)", - netdev_get_name(netdev), - tc_get_major(handle), tc_get_minor(handle), - tc_get_major(parent), tc_get_minor(parent), - ovs_strerror(error)); + VLOG_RL(&rl, + (error == EAGAIN) ? VLL_DBG : VLL_WARN, + "query %s class %u:%u (parent %u:%u) failed (%s)", + netdev_get_name(netdev), + tc_get_major(handle), tc_get_minor(handle), + tc_get_major(parent), tc_get_minor(parent), + ovs_strerror(error)); } return error; } @@ -6140,10 +6148,12 @@ tc_delete_class(const struct netdev *netdev, unsigned int handle) error = tc_transact(&request, NULL); if (error) { - VLOG_WARN_RL(&rl, "delete %s class %u:%u failed (%s)", - netdev_get_name(netdev), - tc_get_major(handle), tc_get_minor(handle), - ovs_strerror(error)); + VLOG_RL(&rl, + (error == EAGAIN) ? VLL_DBG : VLL_WARN, + "delete %s class %u:%u failed (%s)", + netdev_get_name(netdev), + tc_get_major(handle), tc_get_minor(handle), + ovs_strerror(error)); } return error; } @@ -6262,7 +6272,9 @@ tc_query_qdisc(const struct netdev *netdev_) ops = &tc_ops_other; } } - } else if ((!error && !qdisc->size) || error == ENOENT) { + } else if ((!error && !qdisc->size) || + error == ENOENT || error == EAGAIN) + { /* Either it's a built-in qdisc, or (on Linux pre-2.6.35) it's a qdisc * set up by some other entity that doesn't have a handle 1:0. We will * assume that it's the system default qdisc. */ diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c index 4f26dd8cc..4c15989d6 100644 --- a/lib/netdev-offload-tc.c +++ b/lib/netdev-offload-tc.c @@ -2505,10 +2505,11 @@ netdev_tc_flow_get(struct netdev *netdev, err = tc_get_flower(&id, &flower); if (err) { - VLOG_ERR_RL(&error_rl, - "flow get failed (dev %s prio %d handle %d): %s", - netdev_get_name(netdev), id.prio, id.handle, - ovs_strerror(err)); + VLOG_RL(&error_rl, + (err == EAGAIN) ? VLL_DBG : VLL_ERR, + "flow get failed (dev %s prio %d handle %d): %s", + netdev_get_name(netdev), id.prio, id.handle, + ovs_strerror(err)); return err; } @@ -2594,7 +2595,9 @@ probe_multi_mask_per_prio(int ifindex) memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); id1 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); - error = tc_replace_flower(&id1, &flower); + do { + error = tc_replace_flower(&id1, &flower); + } while (error == EAGAIN); if (error) { goto out; } @@ -2603,7 +2606,9 @@ probe_multi_mask_per_prio(int ifindex) memset(&flower.mask.src_mac, 0xff, sizeof flower.mask.src_mac); id2 = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); - error = tc_replace_flower(&id2, &flower); + do { + error = tc_replace_flower(&id2, &flower); + } while (error == EAGAIN); tc_del_flower_filter(&id1); if (error) { @@ -2625,6 +2630,7 @@ probe_insert_ct_state_rule(int ifindex, uint16_t ct_state, struct tcf_id *id) { int prio = TC_RESERVED_PRIORITY_MAX + 1; struct tc_flower flower; + int error; memset(&flower, 0, sizeof flower); flower.key.ct_state = ct_state; @@ -2634,7 +2640,10 @@ probe_insert_ct_state_rule(int ifindex, uint16_t ct_state, struct tcf_id *id) flower.mask.eth_type = OVS_BE16_MAX; *id = tc_make_tcf_id(ifindex, 0, prio, TC_INGRESS); - return tc_replace_flower(id, &flower); + do { + error = tc_replace_flower(id, &flower); + } while (error == EAGAIN); + return error; } static void @@ -2657,7 +2666,9 @@ probe_ct_state_support(int ifindex) goto out; } - error = tc_get_flower(&id, &flower); + do { + error = tc_get_flower(&id, &flower); + } while (error == EAGAIN); if (error || flower.mask.ct_state != ct_state) { goto out_del; } @@ -2732,7 +2743,9 @@ probe_tc_block_support(int ifindex) memset(&flower.mask.dst_mac, 0xff, sizeof flower.mask.dst_mac); id = tc_make_tcf_id(ifindex, block_id, prio, TC_INGRESS); - error = tc_replace_flower(&id, &flower); + do { + error = tc_replace_flower(&id, &flower); + } while (error == EAGAIN); tc_add_del_qdisc(ifindex, false, block_id, TC_INGRESS); diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index 80da20d9f..dea060fc3 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -1798,6 +1798,11 @@ nl_pool_release(struct nl_sock *sock) * * 2. Resending the request causes it to be re-executed, so the request * needs to be idempotent. + * + * 3. In the event that the kernel is too busy to handle the request to + * receive the response (i.e. EAGAIN), this function will still return + * 0. The caller can check for this condition by checking for a zero + * size of the 'replyp' ofpbuf buffer. */ int nl_transact(int protocol, const struct ofpbuf *request, diff --git a/lib/tc.c b/lib/tc.c index 5c32c6f97..7df6a10c5 100644 --- a/lib/tc.c +++ b/lib/tc.c @@ -237,11 +237,29 @@ static void request_from_tcf_id(struct tcf_id *id, uint16_t eth_type, } } +/* The `tc_transact` function is a wrapper around `nl_transact` with the + * addition of: + * + * 1. the 'request' ofpbuf buffer is freed after `nl_transact` returns, + * regardless of success or failure. + * + * 2. When a 'replyp' pointer is provided; in the event of the kernel + * being too busy to process the request for the response, a positive + * error return will be provided with the value of EAGAIN. + * + * Please acquaint yourself with the documentation of the `nl_transact` + * function in the netlink-sock module before making use of this function. + */ int tc_transact(struct ofpbuf *request, struct ofpbuf **replyp) { int error = nl_transact(NETLINK_ROUTE, request, replyp); ofpbuf_uninit(request); + + if (!error && replyp && !(*replyp)->size) { + return EAGAIN; + } + return error; }
The tc module combines the use of the `tc_transact` helper function for communication with the in-kernel tc infrastructure with assertions on the reply data by `ofpbuf_at_assert` on the received data prior to further processing. `tc_transact` in turn calls `nl_transact`, which via `nl_transact_multiple__` ultimately calls and handles return value from `recvmsg`. On error a check for EAGAIN is performed and a consequence of this condition is effectively to provide a non-error (0) result and an empty reply buffer. Before this change, the `tc_transact` and, consumers of it, were unaware of this condition. The use of assertions on the reply buffer can as such lead to a fatal crash of OVS. To be fair, the behavior of `nl_transact` when handling an EAGAIN return is currently not documented, so this change also adds that documentation. While fixing the problem, it led me to find potential problems with several of the one-time initialization functions in the netdev-offload-tc module. Before this patch they would happily accept temporary failure as permanent one, with consequences for the remaining lifetime of the process. For the record, the symptom of the crash is this in the log: EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size failed in ofpbuf_at_assert() And an excerpt of the backtrace looks like this: 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194 tc_replace_flower (id=<optimized out>, flower=<optimized out>) at ../lib/tc.c:3223 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at ../lib/netdev-offload-tc.c:2096 0x0000561dac117541 in netdev_flow_put (stats=<optimized out>, info=0x7fb65b7ba780, ufid=<optimized out>, act_len=<optimized out>, actions=<optimized out>, match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257 parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2297 try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at ../lib/dpif-netlink.c:2384 Reported-At: https://launchpad.net/bugs/2018500 Fixes f98e418fbdb6 (tc: Add tc flower functions) Fixes 407556ac6c90 (netlink-socket: Avoid forcing a reply for final message in a transaction.) Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com> --- lib/dpif-netlink.c | 4 +++- lib/dpif.c | 2 +- lib/netdev-linux.c | 40 ++++++++++++++++++++++++++-------------- lib/netdev-offload-tc.c | 31 ++++++++++++++++++++++--------- lib/netlink-socket.c | 5 +++++ lib/tc.c | 18 ++++++++++++++++++ 6 files changed, 75 insertions(+), 25 deletions(-)