diff mbox

[ovs-dev,V2,4/4] tests: Fix fail of OVS_APP_EXIT_AND_WAIT on Windows

Message ID CAPWQB7H0Xp+jLahQ9zpZ4vKapsVLC0oOjpajeEOOu61qYUNwTA@mail.gmail.com
State Not Applicable
Headers show

Commit Message

Joe Stringer June 3, 2016, 1:41 a.m. UTC
On 2 June 2016 at 01:28, Paul Boca <pboca@cloudbasesolutions.com> wrote:
> Hi!
>
> Thanks for review!
> Please see my comments inline.

Thanks for looking into the windows testsuite failures :-)

More below.

>> -----Original Message-----
>> From: Joe Stringer [mailto:joe@ovn.org]
>> Sent: Thursday, June 2, 2016 4:53 AM
>> To: Paul Boca
>> Cc: dev@openvswitch.org
>> Subject: Re: [ovs-dev] [PATCH V2 4/4] tests: Fix fail of
>> OVS_APP_EXIT_AND_WAIT on Windows
>>
>> On 1 June 2016 at 04:46, Paul Boca <pboca@cloudbasesolutions.com> wrote:
>> > The problem is that on some cases it gets called with the socket
>> > name instead of the service name.
>> >
>> > Signed-off-by: Paul-Daniel Boca <pboca@cloudbasesolutions.com>
>>
>> Looking at commit 0c473314294930a47a18d380e0bbcdf7b02a16f2 which
>> introduced this macro, it seems like simply skipping these pieces
>> could cause some tests to intermittently fail, because "ovs-appctl ...
>> exit" does not wait until the process has terminated; it simply tells
>> the process to terminate itself, then returns.
> [Paul Boca] The problem with OVS_APP_EXIT_AND_WAIT is that it gets called with "`pwd`"/unixctl,
> the macro tries to get the pid from the wrong file "$1.pid", this expands to "`pwd`"/unixctl.pid
> which doesn't exist. Also the macro tries to wait for the pid inside "`pwd`"/unixctl.pid.
> I'm looking at commit d9c8c57c48a22b145cf1b5e78839ac0a16e039e9 which replaces
> "ovs-appctl -t `pwd`/unixctl exit" with OVS_APP_EXIT_AND_WAIT([`pwd`/unixctl]) which it
> doesn't do the right thing in this case, for other situations this works well.
>
> Here you can see one of the unixctl used:
> https://github.com/openvswitch/ovs/commit/d9c8c57c48a22b145cf1b5e78839ac0a16e039e9
> #diff-b878790402a4b86a273e4b8c75b9bea6R86
>
>>
>> What needs to happen differently on windows between validating that a
>> process has exited vs. a service has exited?
> [Paul Boca] The macro does the right thing except the case above.
> If a pid file name is sent to the macro, with the previous version, it works well.

Bear with me, I'm a little confused - firstly, about why the TMPPID
stuff is supposed to work at all, and secondly why the behaviour
differs between platforms.

One one hand, if the user of OVS_APP_EXIT_AND_WAIT() passes
`pwd`/unixctl, wouldn't the path then become
$OVS_RUNDIR/`pwd`/unixctl.pid ? That doesn't seem like a valid path to
read to acquire a PID. In fact, when I removed the "2>/dev/null" from
the TMPPID line, I found some tests that would quietly log that the
file doesn't exist - and rightly so. This suggests that the
$OVS_RUNDIR part of the cat command should be removed.

On the other hand, if that command returned effectively an empty
string, then I'd expect the kill command would become something like
"kill -0", which should print the usage and return a non-zero exit
code. Because the exit code is nonzero, the OVS_WAIT_WHILE should exit
immediately, so the test should continue. But this doesn't seem to be
the behaviour on windows? Maybe one of my assumptions here is a bit
off.

Lance, it seems like the "$OVS_RUNDIR" part of these commands was
introduced by some of the changes you made around gcov support,
specifically commits d9c8c57c48a2 ("tests: consistently use
OVS_APP_EXIT_AND_WAIT() for daemon termination") and f9b11f2a09b4
("tests: Make OVS_APP_EXIT_AND_WAIT() wait for process termination").
As far as I can tell, they're broken for a bunch of the tests. I
started investigating by applying the diff below, and I can see that
there's a lot of failures locating the pidfiles. Would you be able to
take a look?

+   AT_CHECK([test -e $pidfile])
+   TMPPID=$(cat "$pidfile")
   AT_CHECK([ovs-appctl --target=$OVS_RUNDIR/ovsdb-server.$TMPPID.ctl exit])
   OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])])

Comments

Lance Richardson June 3, 2016, 2:07 a.m. UTC | #1
----- Original Message -----
> From: "Joe Stringer" <joe@ovn.org>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: dev@openvswitch.org, "Paul Boca" <pboca@cloudbasesolutions.com>
> Sent: Thursday, June 2, 2016 9:41:23 PM
> Subject: Re: [ovs-dev] [PATCH V2 4/4] tests: Fix fail of OVS_APP_EXIT_AND_WAIT on Windows
> 
> On 2 June 2016 at 01:28, Paul Boca <pboca@cloudbasesolutions.com> wrote:
> > Hi!
> >
> > Thanks for review!
> > Please see my comments inline.
> 
> Thanks for looking into the windows testsuite failures :-)
> 
> More below.
> 
> >> -----Original Message-----
> >> From: Joe Stringer [mailto:joe@ovn.org]
> >> Sent: Thursday, June 2, 2016 4:53 AM
> >> To: Paul Boca
> >> Cc: dev@openvswitch.org
> >> Subject: Re: [ovs-dev] [PATCH V2 4/4] tests: Fix fail of
> >> OVS_APP_EXIT_AND_WAIT on Windows
> >>
> >> On 1 June 2016 at 04:46, Paul Boca <pboca@cloudbasesolutions.com> wrote:
> >> > The problem is that on some cases it gets called with the socket
> >> > name instead of the service name.
> >> >
> >> > Signed-off-by: Paul-Daniel Boca <pboca@cloudbasesolutions.com>
> >>
> >> Looking at commit 0c473314294930a47a18d380e0bbcdf7b02a16f2 which
> >> introduced this macro, it seems like simply skipping these pieces
> >> could cause some tests to intermittently fail, because "ovs-appctl ...
> >> exit" does not wait until the process has terminated; it simply tells
> >> the process to terminate itself, then returns.
> > [Paul Boca] The problem with OVS_APP_EXIT_AND_WAIT is that it gets called
> > with "`pwd`"/unixctl,
> > the macro tries to get the pid from the wrong file "$1.pid", this expands
> > to "`pwd`"/unixctl.pid
> > which doesn't exist. Also the macro tries to wait for the pid inside
> > "`pwd`"/unixctl.pid.
> > I'm looking at commit d9c8c57c48a22b145cf1b5e78839ac0a16e039e9 which
> > replaces
> > "ovs-appctl -t `pwd`/unixctl exit" with
> > OVS_APP_EXIT_AND_WAIT([`pwd`/unixctl]) which it
> > doesn't do the right thing in this case, for other situations this works
> > well.
> >
> > Here you can see one of the unixctl used:
> > https://github.com/openvswitch/ovs/commit/d9c8c57c48a22b145cf1b5e78839ac0a16e039e9
> > #diff-b878790402a4b86a273e4b8c75b9bea6R86
> >
> >>
> >> What needs to happen differently on windows between validating that a
> >> process has exited vs. a service has exited?
> > [Paul Boca] The macro does the right thing except the case above.
> > If a pid file name is sent to the macro, with the previous version, it
> > works well.
> 
> Bear with me, I'm a little confused - firstly, about why the TMPPID
> stuff is supposed to work at all, and secondly why the behaviour
> differs between platforms.
> 
> One one hand, if the user of OVS_APP_EXIT_AND_WAIT() passes
> `pwd`/unixctl, wouldn't the path then become
> $OVS_RUNDIR/`pwd`/unixctl.pid ? That doesn't seem like a valid path to
> read to acquire a PID. In fact, when I removed the "2>/dev/null" from
> the TMPPID line, I found some tests that would quietly log that the
> file doesn't exist - and rightly so. This suggests that the
> $OVS_RUNDIR part of the cat command should be removed.
> 
> On the other hand, if that command returned effectively an empty
> string, then I'd expect the kill command would become something like
> "kill -0", which should print the usage and return a non-zero exit
> code. Because the exit code is nonzero, the OVS_WAIT_WHILE should exit
> immediately, so the test should continue. But this doesn't seem to be
> the behaviour on windows? Maybe one of my assumptions here is a bit
> off.
> 
> Lance, it seems like the "$OVS_RUNDIR" part of these commands was
> introduced by some of the changes you made around gcov support,
> specifically commits d9c8c57c48a2 ("tests: consistently use
> OVS_APP_EXIT_AND_WAIT() for daemon termination") and f9b11f2a09b4
> ("tests: Make OVS_APP_EXIT_AND_WAIT() wait for process termination").
> As far as I can tell, they're broken for a bunch of the tests. I
> started investigating by applying the diff below, and I can see that
> there's a lot of failures locating the pidfiles. Would you be able to
> take a look?
>

Sure, I'll take a look in the morning and report back.

   Lance

> diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
> index e5710a07c192..2411fab2bb3f 100644
> --- a/tests/ovs-macros.at
> +++ b/tests/ovs-macros.at
> @@ -131,11 +131,18 @@ m4_define([OVS_WAIT_WHILE],
>   [OVS_WAIT([if $1; then return 1; else return 0; fi], [$2])])
> 
> dnl OVS_APP_EXIT_AND_WAIT(DAEMON)
> dnl
> dnl Ask the daemon named DAEMON to exit, via ovs-appctl, and then waits for
> it
> dnl to exit.
> m4_define([OVS_APP_EXIT_AND_WAIT],
> -  [TMPPID=$(cat "$OVS_RUNDIR"/$1.pid 2>/dev/null)
> +  [pidfile="$1.pid"
> +   AT_CHECK([test -e $pidfile])
> +   TMPPID=$(cat $pidfile)
>    AT_CHECK([ovs-appctl -t $1 exit])
>    OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])])
> 
> @@ -144,7 +151,9 @@ dnl
> dnl Ask the daemon named DAEMON to exit, via ovs-appctl (using the target
> dnl argument), and then waits for it to exit.
> m4_define([OVS_APP_EXIT_AND_WAIT_BY_TARGET],
> -  [TMPPID=`cat "$OVS_RUNDIR"/$1.pid 2>/dev/null`
> +  [pidfile="$OVS_RUNDIR"/$1.pid
> +   AT_CHECK([test -e $pidfile])
> +   TMPPID=$(cat "$pidfile")
>    AT_CHECK([ovs-appctl --target=$OVS_RUNDIR/ovsdb-server.$TMPPID.ctl exit])
>    OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])])
>
Ben Pfaff June 3, 2016, 4:37 a.m. UTC | #2
On Thu, Jun 02, 2016 at 06:41:23PM -0700, Joe Stringer wrote:
> On the other hand, if that command returned effectively an empty
> string, then I'd expect the kill command would become something like
> "kill -0", which should print the usage and return a non-zero exit
> code.

That's what happens on a POSIX system, but for Windows, in
tests/ovs-macros.at, we have a "kill"-emulating shell function that
fails to faithfully reproduce that behavior:

    kill () {
        signal=
        retval=0
        for arg; do
            case $arg in
            -*) signal=$arg ;;
            [1-9][0-9]*)
                # tasklist always returns 0.
                # If pid does exist, there will be a line with the pid.
                if tasklist //fi "PID eq $arg" | grep $arg >/dev/null; then
                    if test "X$signal" != "X-0"; then
                        tskill $arg
                    fi
                else
                    retval=1
                fi
                ;;
            esac
        done
        return $retval

Perhaps we should add something like:

    if test $# = 0; then
        echo >&2 "kill: usage: kill [-signum] pid..."
        return 1
    fi

> Because the exit code is nonzero, the OVS_WAIT_WHILE should exit
> immediately, so the test should continue. But this doesn't seem to be
> the behaviour on windows? Maybe one of my assumptions here is a bit
> off.
Ben Pfaff June 3, 2016, 4:43 a.m. UTC | #3
On Thu, Jun 02, 2016 at 09:37:01PM -0700, Ben Pfaff wrote:
> On Thu, Jun 02, 2016 at 06:41:23PM -0700, Joe Stringer wrote:
> > On the other hand, if that command returned effectively an empty
> > string, then I'd expect the kill command would become something like
> > "kill -0", which should print the usage and return a non-zero exit
> > code.
> 
> That's what happens on a POSIX system, but for Windows, in
> tests/ovs-macros.at, we have a "kill"-emulating shell function that
> fails to faithfully reproduce that behavior:
> 
>     kill () {
>         signal=
>         retval=0
>         for arg; do
>             case $arg in
>             -*) signal=$arg ;;
>             [1-9][0-9]*)
>                 # tasklist always returns 0.
>                 # If pid does exist, there will be a line with the pid.
>                 if tasklist //fi "PID eq $arg" | grep $arg >/dev/null; then
>                     if test "X$signal" != "X-0"; then
>                         tskill $arg
>                     fi
>                 else
>                     retval=1
>                 fi
>                 ;;
>             esac
>         done
>         return $retval
> 
> Perhaps we should add something like:
> 
>     if test $# = 0; then
>         echo >&2 "kill: usage: kill [-signum] pid..."
>         return 1
>     fi

Actually that's still incomplete, of course, because "test $# = 0" would
ignore "kill -0".
Lance Richardson June 3, 2016, 4 p.m. UTC | #4
----- Original Message -----
> From: "Joe Stringer" <joe@ovn.org>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: dev@openvswitch.org, "Paul Boca" <pboca@cloudbasesolutions.com>
> Sent: Thursday, June 2, 2016 9:41:23 PM
> Subject: Re: [ovs-dev] [PATCH V2 4/4] tests: Fix fail of OVS_APP_EXIT_AND_WAIT on Windows
> 

> Bear with me, I'm a little confused - firstly, about why the TMPPID
> stuff is supposed to work at all, and secondly why the behaviour
> differs between platforms.
> 
> One one hand, if the user of OVS_APP_EXIT_AND_WAIT() passes
> `pwd`/unixctl, wouldn't the path then become
> $OVS_RUNDIR/`pwd`/unixctl.pid ? That doesn't seem like a valid path to
> read to acquire a PID. In fact, when I removed the "2>/dev/null" from
> the TMPPID line, I found some tests that would quietly log that the
> file doesn't exist - and rightly so. This suggests that the
> $OVS_RUNDIR part of the cat command should be removed.
> 
> On the other hand, if that command returned effectively an empty
> string, then I'd expect the kill command would become something like
> "kill -0", which should print the usage and return a non-zero exit
> code. Because the exit code is nonzero, the OVS_WAIT_WHILE should exit
> immediately, so the test should continue. But this doesn't seem to be
> the behaviour on windows? Maybe one of my assumptions here is a bit
> off.
> 
> Lance, it seems like the "$OVS_RUNDIR" part of these commands was
> introduced by some of the changes you made around gcov support,
> specifically commits d9c8c57c48a2 ("tests: consistently use
> OVS_APP_EXIT_AND_WAIT() for daemon termination") and f9b11f2a09b4
> ("tests: Make OVS_APP_EXIT_AND_WAIT() wait for process termination").
> As far as I can tell, they're broken for a bunch of the tests. I
> started investigating by applying the diff below, and I can see that
> there's a lot of failures locating the pidfiles. Would you be able to
> take a look?
> 

OK, it appears that I was probably a little too focused on the ~250
locations where OVS_APP_EXIT_AND_WAIT() is invoked with a daemon
name (where the TMPPID stuff works fine) and not enough to the ~8
locations where it's invoked with the path of a unixctl socket.

Ben's suggestion to have the Windows-specfic kill() function emulate
the kill command's handling of "kill -0" should make this macro
behave as it does on other systems, but I'm wondering if it might
also be good to create a separate macro for the unixctl case. Ideally
there would be a way to derive the pid from the unixctl socket path,
but I believe we can't assume the existence of things like fuser or
lsof even on Linux or BSD systems. Or maybe this variant could
take both the unixctl socket path and pid file path as parameters.

Maybe something like:

dnl OVS_APP_EXIT_AND_WAIT_UNIXCTL(SOCKET PIDFILE)
dnl
dnl Ask the daemon associated with SOCKET to exit,
dnl via ovs-appctl, and then wait for it to exit.
m4_define([OVS_APP_EXIT_AND_WAIT_UNIXCTL],
  [TMPPID=$(cat $2 2>/dev/null)
   AT_CHECK([ovs-appctl -t $1 exit])
   OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])])

and change these spots to use the new macro:

daemon.at:  OVS_APP_EXIT_AND_WAIT([`pwd`/unixctl])
ovsdb-monitor.at:   OVS_APP_EXIT_AND_WAIT(["`pwd`"/unixctl])
ovsdb-server.at:  [OVS_APP_EXIT_AND_WAIT(["`pwd`"/unixctl])])
ovs-vswitchd.at:OVS_APP_EXIT_AND_WAIT([`pwd`/unixctl])
ovs-vswitchd.at:OVS_APP_EXIT_AND_WAIT([`pwd`/unixctl])
ovs-vswitchd.at:OVS_APP_EXIT_AND_WAIT([`pwd`/unixctl])
vlog.at:OVS_APP_EXIT_AND_WAIT([test-unixctl])
vlog.at:OVS_APP_EXIT_AND_WAIT([test-unixctl])

It looks like some of ovs-vswitchd.at cases would need to changed to
start ovs-vswitchd with "--pidfile xxx" in order to have a pidfile.

If that sounds reasonable, I can work up the patches to do it.

Thanks,

   Lance
Ben Pfaff June 3, 2016, 4:38 p.m. UTC | #5
On Fri, Jun 03, 2016 at 12:00:11PM -0400, Lance Richardson wrote:
> OK, it appears that I was probably a little too focused on the ~250
> locations where OVS_APP_EXIT_AND_WAIT() is invoked with a daemon
> name (where the TMPPID stuff works fine) and not enough to the ~8
> locations where it's invoked with the path of a unixctl socket.
> 
> Ben's suggestion to have the Windows-specfic kill() function emulate
> the kill command's handling of "kill -0" should make this macro
> behave as it does on other systems, but I'm wondering if it might
> also be good to create a separate macro for the unixctl case. Ideally
> there would be a way to derive the pid from the unixctl socket path,
> but I believe we can't assume the existence of things like fuser or
> lsof even on Linux or BSD systems. Or maybe this variant could
> take both the unixctl socket path and pid file path as parameters.

The "kill -0" change is a tangent.  It solves a particular problem, but
not the root of this issue.

It's a mistake to pass a unixctl socket name to OVS_APP_EXIT_AND_WAIT.
A new macro is one possible solution.  However, looking at a few of the
cases where OVS_APP_EXIT_AND_WAIT is being passed a unixctl socket, I'm
not sure why we couldn't just pass a pidfile instead.  Some of the users
don't have a pidfile, but why not just add one?
Lance Richardson June 3, 2016, 4:49 p.m. UTC | #6
----- Original Message -----
> From: "Ben Pfaff" <blp@ovn.org>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: "Joe Stringer" <joe@ovn.org>, dev@openvswitch.org, "Paul Boca" <pboca@cloudbasesolutions.com>
> Sent: Friday, June 3, 2016 12:38:56 PM
> Subject: Re: [ovs-dev] [PATCH V2 4/4] tests: Fix fail of OVS_APP_EXIT_AND_WAIT on Windows
> 
> On Fri, Jun 03, 2016 at 12:00:11PM -0400, Lance Richardson wrote:
> > OK, it appears that I was probably a little too focused on the ~250
> > locations where OVS_APP_EXIT_AND_WAIT() is invoked with a daemon
> > name (where the TMPPID stuff works fine) and not enough to the ~8
> > locations where it's invoked with the path of a unixctl socket.
> > 
> > Ben's suggestion to have the Windows-specfic kill() function emulate
> > the kill command's handling of "kill -0" should make this macro
> > behave as it does on other systems, but I'm wondering if it might
> > also be good to create a separate macro for the unixctl case. Ideally
> > there would be a way to derive the pid from the unixctl socket path,
> > but I believe we can't assume the existence of things like fuser or
> > lsof even on Linux or BSD systems. Or maybe this variant could
> > take both the unixctl socket path and pid file path as parameters.
> 
> The "kill -0" change is a tangent.  It solves a particular problem, but
> not the root of this issue.
> 
> It's a mistake to pass a unixctl socket name to OVS_APP_EXIT_AND_WAIT.

Right.

> A new macro is one possible solution.  However, looking at a few of the
> cases where OVS_APP_EXIT_AND_WAIT is being passed a unixctl socket, I'm
> not sure why we couldn't just pass a pidfile instead.  Some of the users
> don't have a pidfile, but why not just add one?
> 

I'm fine with that, the main difference between creating a new macro and
adding a parameter to the existing one is the number of invocation sites
that would need to be modified... 8 or so if a new macro is created and
over 250 if a parameter is added to the existing macro.  It's more work
in the short term to add a parameter, but it does have a cleaner end result.

I'll go ahead and start working a patch set up to add a pidfile parameter
to OVS_APP_EXIT_AND_WAIT.

Thanks,

   Lance
diff mbox

Patch

diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at
index e5710a07c192..2411fab2bb3f 100644
--- a/tests/ovs-macros.at
+++ b/tests/ovs-macros.at
@@ -131,11 +131,18 @@  m4_define([OVS_WAIT_WHILE],
  [OVS_WAIT([if $1; then return 1; else return 0; fi], [$2])])

dnl OVS_APP_EXIT_AND_WAIT(DAEMON)
dnl
dnl Ask the daemon named DAEMON to exit, via ovs-appctl, and then waits for it
dnl to exit.
m4_define([OVS_APP_EXIT_AND_WAIT],
-  [TMPPID=$(cat "$OVS_RUNDIR"/$1.pid 2>/dev/null)
+  [pidfile="$1.pid"
+   AT_CHECK([test -e $pidfile])
+   TMPPID=$(cat $pidfile)
   AT_CHECK([ovs-appctl -t $1 exit])
   OVS_WAIT_WHILE([kill -0 $TMPPID 2>/dev/null])])

@@ -144,7 +151,9 @@  dnl
dnl Ask the daemon named DAEMON to exit, via ovs-appctl (using the target
dnl argument), and then waits for it to exit.
m4_define([OVS_APP_EXIT_AND_WAIT_BY_TARGET],
-  [TMPPID=`cat "$OVS_RUNDIR"/$1.pid 2>/dev/null`
+  [pidfile="$OVS_RUNDIR"/$1.pid