diff mbox series

[ovs-dev] dpctl: Netdev leak causes stale VHU & reaching limit.

Message ID PAWPR07MB9877C192D7E7712C72A403EE90A52@PAWPR07MB9877.eurprd07.prod.outlook.com
State Superseded
Delegated to: Simon Horman
Headers show
Series [ovs-dev] dpctl: Netdev leak causes stale VHU & reaching limit. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Vipul Ashri July 11, 2024, 8:20 p.m. UTC
While running a test with a continous VM creation/deletion using an
orchestration script with-in cloud environment. In parallel we have
some monitoring script calling ovs-appctl dpctl/show stats commands
every minute.

During VHU port delete, one of netdev references were not reduced to
0 as show_dpif call has not given-up the reference back or doing bad
cleanup. This pending deference preventing VHU deletion sequence, this
is found to be one of corner case inside dpctl code which results in
leaking up netdev which ultimately results in stale VHU entry. After
fixing this problematic cleanup, issue is not seen.

Fixes: fceef2095222 ("dpctl: add ovs-appctl dpctl/* commands to talk to dpif-netdev")
Signed-off-by: Vipul Ashri <vipul.ashri@ericsson.com>
---
 lib/dpctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Simon Horman July 15, 2024, 1:55 p.m. UTC | #1
On Thu, Jul 11, 2024 at 08:20:28PM +0000, Vipul Ashri via dev wrote:
> 
> While running a test with a continous VM creation/deletion using an
> orchestration script with-in cloud environment. In parallel we have
> some monitoring script calling ovs-appctl dpctl/show stats commands
> every minute.
> 
> During VHU port delete, one of netdev references were not reduced to
> 0 as show_dpif call has not given-up the reference back or doing bad
> cleanup. This pending deference preventing VHU deletion sequence, this
> is found to be one of corner case inside dpctl code which results in
> leaking up netdev which ultimately results in stale VHU entry. After
> fixing this problematic cleanup, issue is not seen.
> 
> Fixes: fceef2095222 ("dpctl: add ovs-appctl dpctl/* commands to talk to dpif-netdev")
> Signed-off-by: Vipul Ashri <vipul.ashri@ericsson.com>

Acked-by: Simon Horman <horms@ovn.org>

> ---
>  lib/dpctl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/dpctl.c b/lib/dpctl.c
> index a70df5342..81e9ca0e9 100644
> --- a/lib/dpctl.c
> +++ b/lib/dpctl.c
> @@ -739,7 +739,6 @@ show_dpif(struct dpif *dpif, struct dpctl_params *dpctl_p)
>              }
>              error = netdev_get_stats(netdev, &s);

FWIIW, i think netdev_close() could go here.

>              if (!error) {
> -                netdev_close(netdev);
>                  print_stat(dpctl_p, "    RX packets:", s.rx_packets);
>                  print_stat(dpctl_p, " errors:", s.rx_errors);
>                  print_stat(dpctl_p, " dropped:", s.rx_dropped);
> @@ -770,6 +769,7 @@ show_dpif(struct dpif *dpif, struct dpctl_params *dpctl_p)
>                  dpctl_print(dpctl_p, ", could not retrieve stats (%s)",
>                              ovs_strerror(error));
>              }
> +            netdev_close(netdev);
>          }
>          dpif_port_destroy(&dpif_port);
>      }

...
Simon Horman July 15, 2024, 2:03 p.m. UTC | #2
On Mon, Jul 15, 2024 at 02:55:29PM +0100, Simon Horman wrote:
> On Thu, Jul 11, 2024 at 08:20:28PM +0000, Vipul Ashri via dev wrote:
> > 
> > While running a test with a continous VM creation/deletion using an
> > orchestration script with-in cloud environment. In parallel we have
> > some monitoring script calling ovs-appctl dpctl/show stats commands
> > every minute.
> > 
> > During VHU port delete, one of netdev references were not reduced to
> > 0 as show_dpif call has not given-up the reference back or doing bad
> > cleanup. This pending deference preventing VHU deletion sequence, this
> > is found to be one of corner case inside dpctl code which results in
> > leaking up netdev which ultimately results in stale VHU entry. After
> > fixing this problematic cleanup, issue is not seen.
> > 
> > Fixes: fceef2095222 ("dpctl: add ovs-appctl dpctl/* commands to talk to dpif-netdev")
> > Signed-off-by: Vipul Ashri <vipul.ashri@ericsson.com>
> 
> Acked-by: Simon Horman <horms@ovn.org>

I now see this is a duplicate of an earlier posting [1] (why?).

[1] https://patchwork.ozlabs.org/project/openvswitch/patch/PAWPR07MB9877859E07FBAD793EEFD01D90A52@PAWPR07MB9877.eurprd07.prod.outlook.com/

In that thread David Marchard (CCed) said:

    "The issue seems generic as any netdev for which a call to
     netdev_get_stats failed would trigger the same leak.
     So I would update the commit title with something more generic like:
     dpctl: Fix netdev reference leak in "show" command.

    "This can probably be changed when applying.
     In any case, the fix lgtm.

    "Reviewed-by: David Marchand <david.marchand@redhat.com>

Which all seems reasonable to me.
Vipul Ashri July 16, 2024, 8:22 a.m. UTC | #3
> On Mon, Jul 15, 2024 at 02:55:29PM +0100, Simon Horman wrote:
> > On Thu, Jul 11, 2024 at 08:20:28PM +0000, Vipul Ashri via dev wrote:
> > >
> > > While running a test with a continous VM creation/deletion using an
> > > orchestration script with-in cloud environment. In parallel we have
> > > some monitoring script calling ovs-appctl dpctl/show stats commands
> > > every minute.
> > >
> > > During VHU port delete, one of netdev references were not reduced to
> > > 0 as show_dpif call has not given-up the reference back or doing bad
> > > cleanup. This pending deference preventing VHU deletion sequence, this
> > > is found to be one of corner case inside dpctl code which results in
> > > leaking up netdev which ultimately results in stale VHU entry. After
> > > fixing this problematic cleanup, issue is not seen.
> > >
> > > Fixes: fceef2095222 ("dpctl: add ovs-appctl dpctl/* commands to talk to dpif-netdev")
> > > Signed-off-by: Vipul Ashri <vipul.ashri@ericsson.com>
> >
> > Acked-by: Simon Horman <horms@ovn.org>
>
> I now see this is a duplicate of an earlier posting [1] (why?).

Sending twice was mistake from my end, You can ignore this time.

>
> [1] https://patchwork.ozlabs.org/project/openvswitch/patch/PAWPR07MB9877859E07FBAD793EEFD01D90A52@PAWPR07MB9877.eurprd07.prod.outlook.com/
>
> In that thread David Marchard (CCed) said:
>
>     "The issue seems generic as any netdev for which a call to
>      netdev_get_stats failed would trigger the same leak.
>      So I would update the commit title with something more generic like:
>      dpctl: Fix netdev reference leak in "show" command.

Let me send a v2 patch addressing this @David Marchand comment and one more
comment suggested by @Simon Horman about changing netdev_close() place.

>
>     "This can probably be changed when applying.
>      In any case, the fix lgtm.
>
>     "Reviewed-by: David Marchand <david.marchand@redhat.com>
>
> Which all seems reasonable to me.
diff mbox series

Patch

diff --git a/lib/dpctl.c b/lib/dpctl.c
index a70df5342..81e9ca0e9 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -739,7 +739,6 @@  show_dpif(struct dpif *dpif, struct dpctl_params *dpctl_p)
             }
             error = netdev_get_stats(netdev, &s);
             if (!error) {
-                netdev_close(netdev);
                 print_stat(dpctl_p, "    RX packets:", s.rx_packets);
                 print_stat(dpctl_p, " errors:", s.rx_errors);
                 print_stat(dpctl_p, " dropped:", s.rx_dropped);
@@ -770,6 +769,7 @@  show_dpif(struct dpif *dpif, struct dpctl_params *dpctl_p)
                 dpctl_print(dpctl_p, ", could not retrieve stats (%s)",
                             ovs_strerror(error));
             }
+            netdev_close(netdev);
         }
         dpif_port_destroy(&dpif_port);
     }