Message ID | 20230517150507.20452-1-bnemeth@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev,v9] ofproto-dpif-upcall: Don't set statistics to 0 when they jump back | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On 17 May 2023, at 17:05, Balazs Nemeth wrote: > The only way that stats->{n_packets,n_bytes} would decrease is due to an > overflow, or if there are bugs in how statistics are handled. In the > past, there were multiple issues that caused a jump backward. A > workaround was in place to set the statistics to 0 in that case. When > this happened while the revalidator was under heavy load, the workaround > had an unintended side effect where should_revalidate returned false > causing the flow to be removed because the metric it calculated was > based on a bogus value. Since many of those bugs have now been > identified and resolved, there is no need to set the statistics to 0. In > addition, the (unlikely) overflow still needs to be handled > appropriately. If an unexpected jump does happen, just log it as a > warning. One small addition, the rest looks good to me. //Eelco > Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> > --- It is common practice for OVS to add a version history here. So reviewers have an idea what changes without manually doing the diffs. > ofproto/ofproto-dpif-upcall.c | 24 ++++++++++++++++-------- > 1 file changed, 16 insertions(+), 8 deletions(-) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index cd57fdbd9..139eb6c77 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -2368,24 +2368,32 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, > enum reval_result result = UKEY_DELETE; > struct dpif_flow_stats push; > > - ofpbuf_clear(odp_actions); > - > push.used = stats->used; > push.tcp_flags = stats->tcp_flags; > - push.n_packets = (stats->n_packets > ukey->stats.n_packets > - ? stats->n_packets - ukey->stats.n_packets > - : 0); > - push.n_bytes = (stats->n_bytes > ukey->stats.n_bytes > - ? stats->n_bytes - ukey->stats.n_bytes > - : 0); > + push.n_packets = stats->n_packets - ukey->stats.n_packets; > + push.n_bytes = stats->n_bytes - ukey->stats.n_bytes; > > if (stats->n_packets < ukey->stats.n_packets && > ukey->stats.n_packets < UINT64_THREE_QUARTERS) { > + static struct vlog_rate_limit rll = VLOG_RATE_LIMIT_INIT(1, 5); > + struct ds ds = DS_EMPTY_INITIALIZER; > + > /* Report cases where the packet counter is lower than the previous > * instance, but exclude the potential wrapping of an uint64_t. */ > COVERAGE_INC(ukey_invalid_stat_reset); > + + odp_format_ufid(&ukey->ufid, &ds); + ds_put_cstr(&ds, " "); This is so we also have the ufid of the flow. > + odp_flow_key_format(ukey->key, ukey->key_len, &ds); > + ds_put_cstr(&ds, ", actions:"); > + format_odp_actions(&ds, odp_actions->data, odp_actions->size, NULL); > + > + VLOG_WARN_RL(&rll, "Unexpected jump in packet stats from %"PRIu64 > + " to %"PRIu64" when handling ukey %s", > + stats->n_packets, ukey->stats.n_packets, ds_cstr(&ds)); > + ds_destroy(&ds); > } > > + ofpbuf_clear(odp_actions); > + > if (need_revalidate) { > if (should_revalidate(udpif, ukey, push.n_packets)) { > if (!ukey->xcache) { > -- > 2.40.1
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index cd57fdbd9..139eb6c77 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -2368,24 +2368,32 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, enum reval_result result = UKEY_DELETE; struct dpif_flow_stats push; - ofpbuf_clear(odp_actions); - push.used = stats->used; push.tcp_flags = stats->tcp_flags; - push.n_packets = (stats->n_packets > ukey->stats.n_packets - ? stats->n_packets - ukey->stats.n_packets - : 0); - push.n_bytes = (stats->n_bytes > ukey->stats.n_bytes - ? stats->n_bytes - ukey->stats.n_bytes - : 0); + push.n_packets = stats->n_packets - ukey->stats.n_packets; + push.n_bytes = stats->n_bytes - ukey->stats.n_bytes; if (stats->n_packets < ukey->stats.n_packets && ukey->stats.n_packets < UINT64_THREE_QUARTERS) { + static struct vlog_rate_limit rll = VLOG_RATE_LIMIT_INIT(1, 5); + struct ds ds = DS_EMPTY_INITIALIZER; + /* Report cases where the packet counter is lower than the previous * instance, but exclude the potential wrapping of an uint64_t. */ COVERAGE_INC(ukey_invalid_stat_reset); + + odp_flow_key_format(ukey->key, ukey->key_len, &ds); + ds_put_cstr(&ds, ", actions:"); + format_odp_actions(&ds, odp_actions->data, odp_actions->size, NULL); + + VLOG_WARN_RL(&rll, "Unexpected jump in packet stats from %"PRIu64 + " to %"PRIu64" when handling ukey %s", + stats->n_packets, ukey->stats.n_packets, ds_cstr(&ds)); + ds_destroy(&ds); } + ofpbuf_clear(odp_actions); + if (need_revalidate) { if (should_revalidate(udpif, ukey, push.n_packets)) { if (!ukey->xcache) {
The only way that stats->{n_packets,n_bytes} would decrease is due to an overflow, or if there are bugs in how statistics are handled. In the past, there were multiple issues that caused a jump backward. A workaround was in place to set the statistics to 0 in that case. When this happened while the revalidator was under heavy load, the workaround had an unintended side effect where should_revalidate returned false causing the flow to be removed because the metric it calculated was based on a bogus value. Since many of those bugs have now been identified and resolved, there is no need to set the statistics to 0. In addition, the (unlikely) overflow still needs to be handled appropriately. If an unexpected jump does happen, just log it as a warning. Signed-off-by: Balazs Nemeth <bnemeth@redhat.com> --- ofproto/ofproto-dpif-upcall.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-)