Message ID | 20240404120938.2207251-1-i.maximets@ovn.org |
---|---|
State | Accepted |
Commit | ed379a810ac81bfa7d86543d7b1914b17d2c0063 |
Headers | show |
Series | [ovs-dev] ofproto-dpif-upcall: Fix ukey installation failure logs and counters. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On 4 Apr 2024, at 14:09, Ilya Maximets wrote: > ukey_install() returns boolean signaling if the ukey was installed > or not. Installation may fail for a few reasons: > > 1. Conflicting ukey. > 2. Mutex contention while trying to replace existing ukey. > 3. The same ukey already exists and active. > > Only the first case here signals an actual problem. Third one is > a little odd for userspace datapath, but harmless. Second is the > most common one that can easily happen during normal operation > since other threads like revalidators may be currently working on > this ukey preventing an immediate access. > > Since only the first case is actually worth logging and it already > has its own log message, removing the 'upcall installation fails' > warning from the upcall_cb(). This should fix most of the random > failures of userspace system tests in CI. > > While at it, also fixing coverage counters. Mutex contention was > mistakenly counted as a duplicate upcall. ukey contention for > revalidators was counted only in one of two places. > > New counter added for the ukey contention on replace. We should > not re-use existing upcall_ukey_contention counter for this, since > it may lead to double counting. > > Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.") > Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.") > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- Thanks for looking into this, and the patch looks good to me. Maybe we should have another patch fixing some of the namings? upcall_ukey_replace -> ukey_replace handler_duplicate_upcall -> duplicate_upcall upcall_ukey_contention -> revalidate_ukey_contention -> ukey_contention Cheers, Eelco Acked-by: Eelco Chaudron <echaudro@redhat.com>
On 4/5/24 14:33, Eelco Chaudron wrote: > > > On 4 Apr 2024, at 14:09, Ilya Maximets wrote: > >> ukey_install() returns boolean signaling if the ukey was installed >> or not. Installation may fail for a few reasons: >> >> 1. Conflicting ukey. >> 2. Mutex contention while trying to replace existing ukey. >> 3. The same ukey already exists and active. >> >> Only the first case here signals an actual problem. Third one is >> a little odd for userspace datapath, but harmless. Second is the >> most common one that can easily happen during normal operation >> since other threads like revalidators may be currently working on >> this ukey preventing an immediate access. >> >> Since only the first case is actually worth logging and it already >> has its own log message, removing the 'upcall installation fails' >> warning from the upcall_cb(). This should fix most of the random >> failures of userspace system tests in CI. >> >> While at it, also fixing coverage counters. Mutex contention was >> mistakenly counted as a duplicate upcall. ukey contention for >> revalidators was counted only in one of two places. >> >> New counter added for the ukey contention on replace. We should >> not re-use existing upcall_ukey_contention counter for this, since >> it may lead to double counting. >> >> Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.") >> Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.") >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >> --- > > Thanks for looking into this, and the patch looks good to me. > > Maybe we should have another patch fixing some of the namings? > > upcall_ukey_replace -> ukey_replace > handler_duplicate_upcall -> duplicate_upcall > upcall_ukey_contention -> revalidate_ukey_contention -> ukey_contention I had something similar in mind, but I didn't make any radical renaming since I plan to backport this change. One more thing is that we would likely want to distinguish contention in handlers/pmds from contention in revalidators, so these will need separate counters, I think. > > Cheers, > > Eelco > > Acked-by: Eelco Chaudron <echaudro@redhat.com> >
On 5 Apr 2024, at 15:04, Ilya Maximets wrote: > On 4/5/24 14:33, Eelco Chaudron wrote: >> >> >> On 4 Apr 2024, at 14:09, Ilya Maximets wrote: >> >>> ukey_install() returns boolean signaling if the ukey was installed >>> or not. Installation may fail for a few reasons: >>> >>> 1. Conflicting ukey. >>> 2. Mutex contention while trying to replace existing ukey. >>> 3. The same ukey already exists and active. >>> >>> Only the first case here signals an actual problem. Third one is >>> a little odd for userspace datapath, but harmless. Second is the >>> most common one that can easily happen during normal operation >>> since other threads like revalidators may be currently working on >>> this ukey preventing an immediate access. >>> >>> Since only the first case is actually worth logging and it already >>> has its own log message, removing the 'upcall installation fails' >>> warning from the upcall_cb(). This should fix most of the random >>> failures of userspace system tests in CI. >>> >>> While at it, also fixing coverage counters. Mutex contention was >>> mistakenly counted as a duplicate upcall. ukey contention for >>> revalidators was counted only in one of two places. >>> >>> New counter added for the ukey contention on replace. We should >>> not re-use existing upcall_ukey_contention counter for this, since >>> it may lead to double counting. >>> >>> Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.") >>> Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.") >>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >>> --- >> >> Thanks for looking into this, and the patch looks good to me. >> >> Maybe we should have another patch fixing some of the namings? >> >> upcall_ukey_replace -> ukey_replace >> handler_duplicate_upcall -> duplicate_upcall >> upcall_ukey_contention -> revalidate_ukey_contention -> ukey_contention > > I had something similar in mind, but I didn't make any radical > renaming since I plan to backport this change. > > One more thing is that we would likely want to distinguish > contention in handlers/pmds from contention in revalidators, > so these will need separate counters, I think. Ack but all can call the update functions ;) We can follow this up after the backport… >> >> Cheers, >> >> Eelco >> >> Acked-by: Eelco Chaudron <echaudro@redhat.com> >>
On 4/5/24 15:46, Eelco Chaudron wrote: > > > On 5 Apr 2024, at 15:04, Ilya Maximets wrote: > >> On 4/5/24 14:33, Eelco Chaudron wrote: >>> >>> >>> On 4 Apr 2024, at 14:09, Ilya Maximets wrote: >>> >>>> ukey_install() returns boolean signaling if the ukey was installed >>>> or not. Installation may fail for a few reasons: >>>> >>>> 1. Conflicting ukey. >>>> 2. Mutex contention while trying to replace existing ukey. >>>> 3. The same ukey already exists and active. >>>> >>>> Only the first case here signals an actual problem. Third one is >>>> a little odd for userspace datapath, but harmless. Second is the >>>> most common one that can easily happen during normal operation >>>> since other threads like revalidators may be currently working on >>>> this ukey preventing an immediate access. >>>> >>>> Since only the first case is actually worth logging and it already >>>> has its own log message, removing the 'upcall installation fails' >>>> warning from the upcall_cb(). This should fix most of the random >>>> failures of userspace system tests in CI. >>>> >>>> While at it, also fixing coverage counters. Mutex contention was >>>> mistakenly counted as a duplicate upcall. ukey contention for >>>> revalidators was counted only in one of two places. >>>> >>>> New counter added for the ukey contention on replace. We should >>>> not re-use existing upcall_ukey_contention counter for this, since >>>> it may lead to double counting. >>>> >>>> Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.") >>>> Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.") >>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >>>> --- >>> >>> Thanks for looking into this, and the patch looks good to me. >>> >>> Maybe we should have another patch fixing some of the namings? >>> >>> upcall_ukey_replace -> ukey_replace >>> handler_duplicate_upcall -> duplicate_upcall >>> upcall_ukey_contention -> revalidate_ukey_contention -> ukey_contention >> >> I had something similar in mind, but I didn't make any radical >> renaming since I plan to backport this change. >> >> One more thing is that we would likely want to distinguish >> contention in handlers/pmds from contention in revalidators, >> so these will need separate counters, I think. > > Ack but all can call the update functions ;) We can follow this up after the backport… Sure. > >>> >>> Cheers, >>> >>> Eelco >>> >>> Acked-by: Eelco Chaudron <echaudro@redhat.com> >>> > Thanks! Applied and backported down to 2.17. Best regards, Ilya Maximets.
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index d88195636..73901b651 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -59,6 +59,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall); COVERAGE_DEFINE(revalidate_missed_dp_flow); COVERAGE_DEFINE(ukey_dp_change); COVERAGE_DEFINE(ukey_invalid_stat_reset); +COVERAGE_DEFINE(ukey_replace_contention); COVERAGE_DEFINE(upcall_flow_limit_grew); COVERAGE_DEFINE(upcall_flow_limit_hit); COVERAGE_DEFINE(upcall_flow_limit_kill); @@ -1449,8 +1450,6 @@ upcall_cb(const struct dp_packet *packet, const struct flow *flow, ovs_u128 *ufi } if (upcall.ukey && !ukey_install(udpif, upcall.ukey)) { - static struct vlog_rate_limit rll = VLOG_RATE_LIMIT_INIT(1, 1); - VLOG_WARN_RL(&rll, "upcall_cb failure: ukey installation fails"); error = ENOSPC; } out: @@ -1948,15 +1947,15 @@ try_ukey_replace(struct umap *umap, struct udpif_key *old_ukey, transition_ukey(old_ukey, UKEY_DELETED); transition_ukey(new_ukey, UKEY_VISIBLE); replaced = true; + COVERAGE_INC(upcall_ukey_replace); + } else { + COVERAGE_INC(handler_duplicate_upcall); } ovs_mutex_unlock(&old_ukey->mutex); - } - - if (replaced) { - COVERAGE_INC(upcall_ukey_replace); } else { - COVERAGE_INC(handler_duplicate_upcall); + COVERAGE_INC(ukey_replace_contention); } + return replaced; } @@ -3009,6 +3008,7 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge) /* Handler threads could be holding a ukey lock while it installs a * new flow, so don't hang around waiting for access to it. */ if (ovs_mutex_trylock(&ukey->mutex)) { + COVERAGE_INC(upcall_ukey_contention); continue; } ukey_state = ukey->state;
ukey_install() returns boolean signaling if the ukey was installed or not. Installation may fail for a few reasons: 1. Conflicting ukey. 2. Mutex contention while trying to replace existing ukey. 3. The same ukey already exists and active. Only the first case here signals an actual problem. Third one is a little odd for userspace datapath, but harmless. Second is the most common one that can easily happen during normal operation since other threads like revalidators may be currently working on this ukey preventing an immediate access. Since only the first case is actually worth logging and it already has its own log message, removing the 'upcall installation fails' warning from the upcall_cb(). This should fix most of the random failures of userspace system tests in CI. While at it, also fixing coverage counters. Mutex contention was mistakenly counted as a duplicate upcall. ukey contention for revalidators was counted only in one of two places. New counter added for the ukey contention on replace. We should not re-use existing upcall_ukey_contention counter for this, since it may lead to double counting. Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.") Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> --- ofproto/ofproto-dpif-upcall.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)