Message ID | 20241010122038.1083809-1-amusil@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev] ci: Add timeout command to test execution. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | warning | apply and check: warning |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
Hi Ales, Thanks for the patch! On 10/10/24 14:20, Ales Musil wrote: > The timeout command allows us to specify a timeout for each job in CI. > This is handy because there might be a test that is not able to > finish fro various reasons, last time it was forking nc process. Typo: "fro". > At the same time having timeout over the make command allows us > to preserve artifacts even if the timeout is reached so it might > be easier to debug what is happening. > > Signed-off-by: Ales Musil <amusil@redhat.com> > --- > .ci/ci.sh | 12 +++++++++--- > .ci/linux-build.sh | 9 ++++++--- > .github/workflows/test.yml | 4 ++-- > 3 files changed, 17 insertions(+), 8 deletions(-) > > diff --git a/.ci/ci.sh b/.ci/ci.sh > index f543967dc..13df3db34 100755 > --- a/.ci/ci.sh > +++ b/.ci/ci.sh > @@ -20,6 +20,7 @@ CONTAINER_CMD=${CONTAINER_CMD:-podman} > CONTAINER_WORKSPACE="/workspace" > CONTAINER_WORKDIR="/workspace/ovn-tmp" > IMAGE_NAME=${IMAGE_NAME:-"ovn-org/ovn-tests"} > +TIMEOUT=${TIMEOUT:-"0"} > > # Test variables > ARCH=${ARCH:-$(uname -m)} > @@ -100,7 +101,8 @@ function run_tests() { > && \ > ARCH=$ARCH CC=$CC LIBS=$LIBS OPTS=$OPTS TESTSUITE=$TESTSUITE \ > TEST_RANGE=$TEST_RANGE SANITIZERS=$SANITIZERS DPDK=$DPDK \ > - RECHECK=$RECHECK UNSTABLE=$UNSTABLE ./.ci/linux-build.sh > + RECHECK=$RECHECK UNSTABLE=$UNSTABLE TIMEOUT=$TIMEOUT \ > + ./.ci/linux-build.sh > " > } > > @@ -115,7 +117,7 @@ function check_clang_version_ge() { > } > > options=$(getopt --options "" \ > - --long help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:\ > + --long help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:,timeout:\ > -- "${@}") > eval set -- "$options" > while true; do > @@ -142,11 +144,15 @@ while true; do > --archive-logs) > archive_logs="1" > ;; > + --timeout) > + shift > + TIMEOUT="$1" > + ;; > --help) > set +x > printf "$0 [--shell] [--help] [--archive-logs] [--jobs=<JOBS>] " > printf "[--ovn-path=<OVN_PATH>] [--ovs-path=<OVS_PATH>] " > - printf "[--image-name=<IMAGE_NAME>]\n" > + printf "[--image-name=<IMAGE_NAME>] [--timeout=<TIMEOUT>]\n" > exit > ;; > --) > diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh > index 75a9480f9..7740f7a3f 100755 > --- a/.ci/linux-build.sh > +++ b/.ci/linux-build.sh > @@ -10,6 +10,7 @@ OVN_CFLAGS="" > OPTS="$OPTS --enable-Werror" > JOBS=${JOBS:-"-j4"} > RECHECK=${RECHECK:-"no"} > +TIMEOUT=${TIMEOUT:="0"} This should be TIMEOUT=${TIMEOUT:-"0"} I guess. > > function install_dpdk() > { > @@ -105,7 +106,8 @@ function configure_clang() > > function run_tests() > { > - if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \ > + if ! timeout $TIMEOUT make distcheck \ This sends TERM on $TIMEOUT. Maybe it's safer to also send KILL after a while, and make it more verbose, e.g.: timeout -k <someinterval> -v $TIMEOUT .. > + CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \ > TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=$RECHECK \ > SKIP_UNSTABLE=$SKIP_UNSTABLE > then > @@ -147,8 +149,9 @@ function run_system_tests() > local type=$1 > local log_file=$2 > > - if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" \ > - RECHECK=$RECHECK SKIP_UNSTABLE=$SKIP_UNSTABLE; then > + if ! sudo timeout $TIMEOUT make $JOBS $type \ > + TESTSUITEFLAGS="$TEST_RANGE" RECHECK=$RECHECK \ > + SKIP_UNSTABLE=$SKIP_UNSTABLE; then Same comment here. > # $log_file is necessary for debugging. > cat tests/$log_file > return 1 > diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml > index f7a184c13..47085c4fe 100644 > --- a/.github/workflows/test.yml > +++ b/.github/workflows/test.yml > @@ -168,11 +168,11 @@ jobs: > > - name: build > if: ${{ startsWith(matrix.cfg.testsuite, 'system-test') }} > - run: sudo -E ./.ci/ci.sh --archive-logs > + run: sudo -E ./.ci/ci.sh --archive-logs --timeout=1h I think the current default for GitHub actions jobs is 6 hours. So it makes sense to limit this further but maybe we should be more conservative and make it 2h instead? > > - name: build > if: ${{ !startsWith(matrix.cfg.testsuite, 'system-test') }} > - run: ./.ci/ci.sh --archive-logs > + run: ./.ci/ci.sh --archive-logs --timeout=1h > Same here. > - name: upload logs on failure > if: failure() || cancelled() Thanks, Dumitru
On Fri, Oct 11, 2024 at 1:27 PM Dumitru Ceara <dceara@redhat.com> wrote: > Hi Ales, > > Thanks for the patch! > Hi Dumitru, thank you for the review. All is addressed in v2. > On 10/10/24 14:20, Ales Musil wrote: > > The timeout command allows us to specify a timeout for each job in CI. > > This is handy because there might be a test that is not able to > > finish fro various reasons, last time it was forking nc process. > > Typo: "fro". > > > At the same time having timeout over the make command allows us > > to preserve artifacts even if the timeout is reached so it might > > be easier to debug what is happening. > > > > Signed-off-by: Ales Musil <amusil@redhat.com> > > --- > > .ci/ci.sh | 12 +++++++++--- > > .ci/linux-build.sh | 9 ++++++--- > > .github/workflows/test.yml | 4 ++-- > > 3 files changed, 17 insertions(+), 8 deletions(-) > > > > diff --git a/.ci/ci.sh b/.ci/ci.sh > > index f543967dc..13df3db34 100755 > > --- a/.ci/ci.sh > > +++ b/.ci/ci.sh > > @@ -20,6 +20,7 @@ CONTAINER_CMD=${CONTAINER_CMD:-podman} > > CONTAINER_WORKSPACE="/workspace" > > CONTAINER_WORKDIR="/workspace/ovn-tmp" > > IMAGE_NAME=${IMAGE_NAME:-"ovn-org/ovn-tests"} > > +TIMEOUT=${TIMEOUT:-"0"} > > > > # Test variables > > ARCH=${ARCH:-$(uname -m)} > > @@ -100,7 +101,8 @@ function run_tests() { > > && \ > > ARCH=$ARCH CC=$CC LIBS=$LIBS OPTS=$OPTS TESTSUITE=$TESTSUITE \ > > TEST_RANGE=$TEST_RANGE SANITIZERS=$SANITIZERS DPDK=$DPDK \ > > - RECHECK=$RECHECK UNSTABLE=$UNSTABLE ./.ci/linux-build.sh > > + RECHECK=$RECHECK UNSTABLE=$UNSTABLE TIMEOUT=$TIMEOUT \ > > + ./.ci/linux-build.sh > > " > > } > > > > @@ -115,7 +117,7 @@ function check_clang_version_ge() { > > } > > > > options=$(getopt --options "" \ > > - --long > help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:\ > > + --long > help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:,timeout:\ > > -- "${@}") > > eval set -- "$options" > > while true; do > > @@ -142,11 +144,15 @@ while true; do > > --archive-logs) > > archive_logs="1" > > ;; > > + --timeout) > > + shift > > + TIMEOUT="$1" > > + ;; > > --help) > > set +x > > printf "$0 [--shell] [--help] [--archive-logs] [--jobs=<JOBS>] " > > printf "[--ovn-path=<OVN_PATH>] [--ovs-path=<OVS_PATH>] " > > - printf "[--image-name=<IMAGE_NAME>]\n" > > + printf "[--image-name=<IMAGE_NAME>] [--timeout=<TIMEOUT>]\n" > > exit > > ;; > > --) > > diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh > > index 75a9480f9..7740f7a3f 100755 > > --- a/.ci/linux-build.sh > > +++ b/.ci/linux-build.sh > > @@ -10,6 +10,7 @@ OVN_CFLAGS="" > > OPTS="$OPTS --enable-Werror" > > JOBS=${JOBS:-"-j4"} > > RECHECK=${RECHECK:-"no"} > > +TIMEOUT=${TIMEOUT:="0"} > > This should be TIMEOUT=${TIMEOUT:-"0"} I guess. > > > > > function install_dpdk() > > { > > @@ -105,7 +106,8 @@ function configure_clang() > > > > function run_tests() > > { > > - if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \ > > + if ! timeout $TIMEOUT make distcheck \ > > This sends TERM on $TIMEOUT. Maybe it's safer to also send KILL after a > while, and make it more verbose, e.g.: > > timeout -k <someinterval> -v $TIMEOUT .. > > > + CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \ > > TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=$RECHECK \ > > SKIP_UNSTABLE=$SKIP_UNSTABLE > > then > > @@ -147,8 +149,9 @@ function run_system_tests() > > local type=$1 > > local log_file=$2 > > > > - if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" \ > > - RECHECK=$RECHECK SKIP_UNSTABLE=$SKIP_UNSTABLE; then > > + if ! sudo timeout $TIMEOUT make $JOBS $type \ > > + TESTSUITEFLAGS="$TEST_RANGE" RECHECK=$RECHECK \ > > + SKIP_UNSTABLE=$SKIP_UNSTABLE; then > > Same comment here. > > > # $log_file is necessary for debugging. > > cat tests/$log_file > > return 1 > > diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml > > index f7a184c13..47085c4fe 100644 > > --- a/.github/workflows/test.yml > > +++ b/.github/workflows/test.yml > > @@ -168,11 +168,11 @@ jobs: > > > > - name: build > > if: ${{ startsWith(matrix.cfg.testsuite, 'system-test') }} > > - run: sudo -E ./.ci/ci.sh --archive-logs > > + run: sudo -E ./.ci/ci.sh --archive-logs --timeout=1h > > I think the current default for GitHub actions jobs is 6 hours. So it > makes sense to limit this further but maybe we should be more > conservative and make it 2h instead? > > > > > - name: build > > if: ${{ !startsWith(matrix.cfg.testsuite, 'system-test') }} > > - run: ./.ci/ci.sh --archive-logs > > + run: ./.ci/ci.sh --archive-logs --timeout=1h > > > > Same here. > > > - name: upload logs on failure > > if: failure() || cancelled() > > Thanks, > Dumitru > > Thanks, Ales
diff --git a/.ci/ci.sh b/.ci/ci.sh index f543967dc..13df3db34 100755 --- a/.ci/ci.sh +++ b/.ci/ci.sh @@ -20,6 +20,7 @@ CONTAINER_CMD=${CONTAINER_CMD:-podman} CONTAINER_WORKSPACE="/workspace" CONTAINER_WORKDIR="/workspace/ovn-tmp" IMAGE_NAME=${IMAGE_NAME:-"ovn-org/ovn-tests"} +TIMEOUT=${TIMEOUT:-"0"} # Test variables ARCH=${ARCH:-$(uname -m)} @@ -100,7 +101,8 @@ function run_tests() { && \ ARCH=$ARCH CC=$CC LIBS=$LIBS OPTS=$OPTS TESTSUITE=$TESTSUITE \ TEST_RANGE=$TEST_RANGE SANITIZERS=$SANITIZERS DPDK=$DPDK \ - RECHECK=$RECHECK UNSTABLE=$UNSTABLE ./.ci/linux-build.sh + RECHECK=$RECHECK UNSTABLE=$UNSTABLE TIMEOUT=$TIMEOUT \ + ./.ci/linux-build.sh " } @@ -115,7 +117,7 @@ function check_clang_version_ge() { } options=$(getopt --options "" \ - --long help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:\ + --long help,shell,archive-logs,jobs:,ovn-path:,ovs-path:,image-name:,timeout:\ -- "${@}") eval set -- "$options" while true; do @@ -142,11 +144,15 @@ while true; do --archive-logs) archive_logs="1" ;; + --timeout) + shift + TIMEOUT="$1" + ;; --help) set +x printf "$0 [--shell] [--help] [--archive-logs] [--jobs=<JOBS>] " printf "[--ovn-path=<OVN_PATH>] [--ovs-path=<OVS_PATH>] " - printf "[--image-name=<IMAGE_NAME>]\n" + printf "[--image-name=<IMAGE_NAME>] [--timeout=<TIMEOUT>]\n" exit ;; --) diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh index 75a9480f9..7740f7a3f 100755 --- a/.ci/linux-build.sh +++ b/.ci/linux-build.sh @@ -10,6 +10,7 @@ OVN_CFLAGS="" OPTS="$OPTS --enable-Werror" JOBS=${JOBS:-"-j4"} RECHECK=${RECHECK:-"no"} +TIMEOUT=${TIMEOUT:="0"} function install_dpdk() { @@ -105,7 +106,8 @@ function configure_clang() function run_tests() { - if ! make distcheck CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \ + if ! timeout $TIMEOUT make distcheck \ + CFLAGS="${COMMON_CFLAGS} ${OVN_CFLAGS}" $JOBS \ TESTSUITEFLAGS="$JOBS $TEST_RANGE" RECHECK=$RECHECK \ SKIP_UNSTABLE=$SKIP_UNSTABLE then @@ -147,8 +149,9 @@ function run_system_tests() local type=$1 local log_file=$2 - if ! sudo make $JOBS $type TESTSUITEFLAGS="$TEST_RANGE" \ - RECHECK=$RECHECK SKIP_UNSTABLE=$SKIP_UNSTABLE; then + if ! sudo timeout $TIMEOUT make $JOBS $type \ + TESTSUITEFLAGS="$TEST_RANGE" RECHECK=$RECHECK \ + SKIP_UNSTABLE=$SKIP_UNSTABLE; then # $log_file is necessary for debugging. cat tests/$log_file return 1 diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index f7a184c13..47085c4fe 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -168,11 +168,11 @@ jobs: - name: build if: ${{ startsWith(matrix.cfg.testsuite, 'system-test') }} - run: sudo -E ./.ci/ci.sh --archive-logs + run: sudo -E ./.ci/ci.sh --archive-logs --timeout=1h - name: build if: ${{ !startsWith(matrix.cfg.testsuite, 'system-test') }} - run: ./.ci/ci.sh --archive-logs + run: ./.ci/ci.sh --archive-logs --timeout=1h - name: upload logs on failure if: failure() || cancelled()
The timeout command allows us to specify a timeout for each job in CI. This is handy because there might be a test that is not able to finish fro various reasons, last time it was forking nc process. At the same time having timeout over the make command allows us to preserve artifacts even if the timeout is reached so it might be easier to debug what is happening. Signed-off-by: Ales Musil <amusil@redhat.com> --- .ci/ci.sh | 12 +++++++++--- .ci/linux-build.sh | 9 ++++++--- .github/workflows/test.yml | 4 ++-- 3 files changed, 17 insertions(+), 8 deletions(-)