diff mbox series

[ovs-dev] ci: Add timeout command to test execution.

Message ID 20241010122038.1083809-1-amusil@redhat.com
State Changes Requested
Headers show
Series [ovs-dev] ci: Add timeout command to test execution. | expand

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Ales Musil Oct. 10, 2024, 12:20 p.m. UTC
The timeout command allows us to specify a timeout for each job in CI.
This is handy because there might be a test that is not able to
finish fro various reasons, last time it was forking nc process.
At the same time having timeout over the make command allows us
to preserve artifacts even if the timeout is reached so it might
be easier to debug what is happening.

Signed-off-by: Ales Musil <amusil@redhat.com>
---
 .ci/ci.sh                  | 12 +++++++++---
 .ci/linux-build.sh         |  9 ++++++---
 .github/workflows/test.yml |  4 ++--
 3 files changed, 17 insertions(+), 8 deletions(-)

Comments

Dumitru Ceara Oct. 11, 2024, 11:26 a.m. UTC | #1
Hi Ales,

Thanks for the patch!

On 10/10/24 14:20, Ales Musil wrote:
> The timeout command allows us to specify a timeout for each job in CI.
> This is handy because there might be a test that is not able to
> finish fro various reasons, last time it was forking nc process.

Typo: "fro".

> At the same time having timeout over the make command allows us
> to preserve artifacts even if the timeout is reached so it might
> be easier to debug what is happening.
> 
> Signed-off-by: Ales Musil <amusil@redhat.com>
> ---
>  .ci/ci.sh                  | 12 +++++++++---
>  .ci/linux-build.sh         |  9 ++++++---
>  .github/workflows/test.yml |  4 ++--
>  3 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/.ci/ci.sh b/.ci/ci.sh
> index f543967dc..13df3db34 100755
> --- a/.ci/ci.sh
> +++ b/.ci/ci.sh
> @@ -20,6 +20,7 @@ CONTAINER_CMD=${CONTAINER_CMD:-podman}
>  CONTAINER_WORKSPACE="/workspace"
>  CONTAINER_WORKDIR="/workspace/ovn-tmp"
>  IMAGE_NAME=${IMAGE_NAME:-"ovn-org/ovn-tests"}
> +TIMEOUT=${TIMEOUT:-"0"}
>  
>  # Test variables
>  ARCH=${ARCH:-$(uname -m)}
> @@ -100,7 +101,8 @@ function run_tests() {
>          && \
>          ARCH=$ARCH CC=$CC LIBS=$LIBS OPTS=$OPTS TESTSUITE=$TESTSUITE \
>          TEST_RANGE=$TEST_RANGE SANITIZERS=$SANITIZERS DPDK=$DPDK \
> -        RECHECK=$RECHECK UNSTABLE=$UNSTABLE ./.ci/linux-build.sh
> +        RECHECK=$RECHECK UNSTABLE=$UNSTABLE TIMEOUT=$TIMEOUT \
> +        ./.ci/linux-build.sh
>      "
>  }
>  
> @@ -115,7 +117,7 @@ function check_clang_version_ge() {
>  }
>  
>  options=$(getopt --options "" \
> -    --long help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:\
> +    --long help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:,timeout:\
>      -- "${@}")
>  eval set -- "$options"
>  while true; do
> @@ -142,11 +144,15 @@ while true; do
>      --archive-logs)
>          archive_logs="1"
>          ;;
> +    --timeout)
> +        shift
> +        TIMEOUT="$1"
> +        ;;
>      --help)
>          set +x
>          printf "$0 [--shell] [--help] [--archive-logs] [--jobs=<JOBS>] "
>          printf "[--ovn-path=<OVN_PATH>] [--ovs-path=<OVS_PATH>] "
> -        printf "[--image-name=<IMAGE_NAME>]\n"
> +        printf "[--image-name=<IMAGE_NAME>] [--timeout=<TIMEOUT>]\n"
>          exit
>          ;;
>      --)
> diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
> index 75a9480f9..7740f7a3f 100755
> --- a/.ci/linux-build.sh
> +++ b/.ci/linux-build.sh
> @@ -10,6 +10,7 @@ OVN_CFLAGS=""
>  OPTS="$OPTS --enable-Werror"
>  JOBS=${JOBS:-"-j4"}
>  RECHECK=${RECHECK:-"no"}
> +TIMEOUT=${TIMEOUT:="0"}

This should be TIMEOUT=${TIMEOUT:-"0"} I guess.

>  
>  function install_dpdk()
>  {
> @@ -105,7 +106,8 @@ function configure_clang()
>  
>  function run_tests()
>  {
> -    if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
> +    if ! timeout $TIMEOUT make distcheck \

This sends TERM on $TIMEOUT.  Maybe it's safer to also send KILL after a
while, and make it more verbose, e.g.:

timeout -k <someinterval> -v $TIMEOUT ..

> +        CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
>          TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=$RECHECK \
>          SKIP_UNSTABLE=$SKIP_UNSTABLE
>      then
> @@ -147,8 +149,9 @@ function run_system_tests()
>      local type=$1
>      local log_file=$2
>  
> -    if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" \
> -            RECHECK=$RECHECK SKIP_UNSTABLE=$SKIP_UNSTABLE; then
> +    if ! sudo timeout $TIMEOUT make $JOBS $type \
> +            TESTSUITEFLAGS="$TEST_RANGE" RECHECK=$RECHECK \
> +            SKIP_UNSTABLE=$SKIP_UNSTABLE; then

Same comment here.

>          # $log_file is necessary for debugging.
>          cat tests/$log_file
>          return 1
> diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
> index f7a184c13..47085c4fe 100644
> --- a/.github/workflows/test.yml
> +++ b/.github/workflows/test.yml
> @@ -168,11 +168,11 @@ jobs:
>  
>      - name: build
>        if: ${{ startsWith(matrix.cfg.testsuite, 'system-test') }}
> -      run: sudo -E ./.ci/ci.sh --archive-logs
> +      run: sudo -E ./.ci/ci.sh --archive-logs --timeout=1h

I think the current default for GitHub actions jobs is 6 hours.  So it
makes sense to limit this further but maybe we should be more
conservative and make it 2h instead?

>  
>      - name: build
>        if: ${{ !startsWith(matrix.cfg.testsuite, 'system-test') }}
> -      run: ./.ci/ci.sh --archive-logs
> +      run: ./.ci/ci.sh --archive-logs --timeout=1h
>  

Same here.

>      - name: upload logs on failure
>        if: failure() || cancelled()

Thanks,
Dumitru
Ales Musil Oct. 11, 2024, 11:54 a.m. UTC | #2
On Fri, Oct 11, 2024 at 1:27 PM Dumitru Ceara <dceara@redhat.com> wrote:

> Hi Ales,
>
> Thanks for the patch!
>

Hi Dumitru,

thank you for the review. All is addressed in v2.


> On 10/10/24 14:20, Ales Musil wrote:
> > The timeout command allows us to specify a timeout for each job in CI.
> > This is handy because there might be a test that is not able to
> > finish fro various reasons, last time it was forking nc process.
>
> Typo: "fro".
>
> > At the same time having timeout over the make command allows us
> > to preserve artifacts even if the timeout is reached so it might
> > be easier to debug what is happening.
> >
> > Signed-off-by: Ales Musil <amusil@redhat.com>
> > ---
> >  .ci/ci.sh                  | 12 +++++++++---
> >  .ci/linux-build.sh         |  9 ++++++---
> >  .github/workflows/test.yml |  4 ++--
> >  3 files changed, 17 insertions(+), 8 deletions(-)
> >
> > diff --git a/.ci/ci.sh b/.ci/ci.sh
> > index f543967dc..13df3db34 100755
> > --- a/.ci/ci.sh
> > +++ b/.ci/ci.sh
> > @@ -20,6 +20,7 @@ CONTAINER_CMD=${CONTAINER_CMD:-podman}
> >  CONTAINER_WORKSPACE="/workspace"
> >  CONTAINER_WORKDIR="/workspace/ovn-tmp"
> >  IMAGE_NAME=${IMAGE_NAME:-"ovn-org/ovn-tests"}
> > +TIMEOUT=${TIMEOUT:-"0"}
> >
> >  # Test variables
> >  ARCH=${ARCH:-$(uname -m)}
> > @@ -100,7 +101,8 @@ function run_tests() {
> >          && \
> >          ARCH=$ARCH CC=$CC LIBS=$LIBS OPTS=$OPTS TESTSUITE=$TESTSUITE \
> >          TEST_RANGE=$TEST_RANGE SANITIZERS=$SANITIZERS DPDK=$DPDK \
> > -        RECHECK=$RECHECK UNSTABLE=$UNSTABLE ./.ci/linux-build.sh
> > +        RECHECK=$RECHECK UNSTABLE=$UNSTABLE TIMEOUT=$TIMEOUT \
> > +        ./.ci/linux-build.sh
> >      "
> >  }
> >
> > @@ -115,7 +117,7 @@ function check_clang_version_ge() {
> >  }
> >
> >  options=$(getopt --options "" \
> > -    --long
> help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:\
> > +    --long
> help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:,timeout:\
> >      -- "${@}")
> >  eval set -- "$options"
> >  while true; do
> > @@ -142,11 +144,15 @@ while true; do
> >      --archive-logs)
> >          archive_logs="1"
> >          ;;
> > +    --timeout)
> > +        shift
> > +        TIMEOUT="$1"
> > +        ;;
> >      --help)
> >          set +x
> >          printf "$0 [--shell] [--help] [--archive-logs] [--jobs=<JOBS>] "
> >          printf "[--ovn-path=<OVN_PATH>] [--ovs-path=<OVS_PATH>] "
> > -        printf "[--image-name=<IMAGE_NAME>]\n"
> > +        printf "[--image-name=<IMAGE_NAME>] [--timeout=<TIMEOUT>]\n"
> >          exit
> >          ;;
> >      --)
> > diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
> > index 75a9480f9..7740f7a3f 100755
> > --- a/.ci/linux-build.sh
> > +++ b/.ci/linux-build.sh
> > @@ -10,6 +10,7 @@ OVN_CFLAGS=""
> >  OPTS="$OPTS --enable-Werror"
> >  JOBS=${JOBS:-"-j4"}
> >  RECHECK=${RECHECK:-"no"}
> > +TIMEOUT=${TIMEOUT:="0"}
>
> This should be TIMEOUT=${TIMEOUT:-"0"} I guess.
>
> >
> >  function install_dpdk()
> >  {
> > @@ -105,7 +106,8 @@ function configure_clang()
> >
> >  function run_tests()
> >  {
> > -    if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
> > +    if ! timeout $TIMEOUT make distcheck \
>
> This sends TERM on $TIMEOUT.  Maybe it's safer to also send KILL after a
> while, and make it more verbose, e.g.:
>
> timeout -k <someinterval> -v $TIMEOUT ..
>
> > +        CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
> >          TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=$RECHECK \
> >          SKIP_UNSTABLE=$SKIP_UNSTABLE
> >      then
> > @@ -147,8 +149,9 @@ function run_system_tests()
> >      local type=$1
> >      local log_file=$2
> >
> > -    if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" \
> > -            RECHECK=$RECHECK SKIP_UNSTABLE=$SKIP_UNSTABLE; then
> > +    if ! sudo timeout $TIMEOUT make $JOBS $type \
> > +            TESTSUITEFLAGS="$TEST_RANGE" RECHECK=$RECHECK \
> > +            SKIP_UNSTABLE=$SKIP_UNSTABLE; then
>
> Same comment here.
>
> >          # $log_file is necessary for debugging.
> >          cat tests/$log_file
> >          return 1
> > diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
> > index f7a184c13..47085c4fe 100644
> > --- a/.github/workflows/test.yml
> > +++ b/.github/workflows/test.yml
> > @@ -168,11 +168,11 @@ jobs:
> >
> >      - name: build
> >        if: ${{ startsWith(matrix.cfg.testsuite, 'system-test') }}
> > -      run: sudo -E ./.ci/ci.sh --archive-logs
> > +      run: sudo -E ./.ci/ci.sh --archive-logs --timeout=1h
>
> I think the current default for GitHub actions jobs is 6 hours.  So it
> makes sense to limit this further but maybe we should be more
> conservative and make it 2h instead?
>
> >
> >      - name: build
> >        if: ${{ !startsWith(matrix.cfg.testsuite, 'system-test') }}
> > -      run: ./.ci/ci.sh --archive-logs
> > +      run: ./.ci/ci.sh --archive-logs --timeout=1h
> >
>
> Same here.
>
> >      - name: upload logs on failure
> >        if: failure() || cancelled()
>
> Thanks,
> Dumitru
>
>
Thanks,
Ales
diff mbox series

Patch

diff --git a/.ci/ci.sh b/.ci/ci.sh
index f543967dc..13df3db34 100755
--- a/.ci/ci.sh
+++ b/.ci/ci.sh
@@ -20,6 +20,7 @@  CONTAINER_CMD=${CONTAINER_CMD:-podman}
 CONTAINER_WORKSPACE="/workspace"
 CONTAINER_WORKDIR="/workspace/ovn-tmp"
 IMAGE_NAME=${IMAGE_NAME:-"ovn-org/ovn-tests"}
+TIMEOUT=${TIMEOUT:-"0"}
 
 # Test variables
 ARCH=${ARCH:-$(uname -m)}
@@ -100,7 +101,8 @@  function run_tests() {
         && \
         ARCH=$ARCH CC=$CC LIBS=$LIBS OPTS=$OPTS TESTSUITE=$TESTSUITE \
         TEST_RANGE=$TEST_RANGE SANITIZERS=$SANITIZERS DPDK=$DPDK \
-        RECHECK=$RECHECK UNSTABLE=$UNSTABLE ./.ci/linux-build.sh
+        RECHECK=$RECHECK UNSTABLE=$UNSTABLE TIMEOUT=$TIMEOUT \
+        ./.ci/linux-build.sh
     "
 }
 
@@ -115,7 +117,7 @@  function check_clang_version_ge() {
 }
 
 options=$(getopt --options "" \
-    --long help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:\
+    --long help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:,timeout:\
     -- "${@}")
 eval set -- "$options"
 while true; do
@@ -142,11 +144,15 @@  while true; do
     --archive-logs)
         archive_logs="1"
         ;;
+    --timeout)
+        shift
+        TIMEOUT="$1"
+        ;;
     --help)
         set +x
         printf "$0 [--shell] [--help] [--archive-logs] [--jobs=<JOBS>] "
         printf "[--ovn-path=<OVN_PATH>] [--ovs-path=<OVS_PATH>] "
-        printf "[--image-name=<IMAGE_NAME>]\n"
+        printf "[--image-name=<IMAGE_NAME>] [--timeout=<TIMEOUT>]\n"
         exit
         ;;
     --)
diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 75a9480f9..7740f7a3f 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -10,6 +10,7 @@  OVN_CFLAGS=""
 OPTS="$OPTS --enable-Werror"
 JOBS=${JOBS:-"-j4"}
 RECHECK=${RECHECK:-"no"}
+TIMEOUT=${TIMEOUT:="0"}
 
 function install_dpdk()
 {
@@ -105,7 +106,8 @@  function configure_clang()
 
 function run_tests()
 {
-    if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
+    if ! timeout $TIMEOUT make distcheck \
+        CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \
         TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=$RECHECK \
         SKIP_UNSTABLE=$SKIP_UNSTABLE
     then
@@ -147,8 +149,9 @@  function run_system_tests()
     local type=$1
     local log_file=$2
 
-    if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" \
-            RECHECK=$RECHECK SKIP_UNSTABLE=$SKIP_UNSTABLE; then
+    if ! sudo timeout $TIMEOUT make $JOBS $type \
+            TESTSUITEFLAGS="$TEST_RANGE" RECHECK=$RECHECK \
+            SKIP_UNSTABLE=$SKIP_UNSTABLE; then
         # $log_file is necessary for debugging.
         cat tests/$log_file
         return 1
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index f7a184c13..47085c4fe 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -168,11 +168,11 @@  jobs:
 
     - name: build
       if: ${{ startsWith(matrix.cfg.testsuite, 'system-test') }}
-      run: sudo -E ./.ci/ci.sh --archive-logs
+      run: sudo -E ./.ci/ci.sh --archive-logs --timeout=1h
 
     - name: build
       if: ${{ !startsWith(matrix.cfg.testsuite, 'system-test') }}
-      run: ./.ci/ci.sh --archive-logs
+      run: ./.ci/ci.sh --archive-logs --timeout=1h
 
     - name: upload logs on failure
       if: failure() || cancelled()