Message ID | 20230826210011.39269-1-romain.naour@gmail.com |
---|---|
State | Accepted |
Series | [RFC] support/misc/gitlab-ci.yml.in: retry a job only if it failed due to a runner issue |
On Sat, 26 Aug 2023 23:00:11 +0200
Romain Naour <romain.naour@gmail.com> wrote:

> Each time a new pipeline is trigged, some jobs may fail due to temporary
> issue with a Gitlab runner (network, power supply, docker or maintainance).
>
> Most of the problems are "runner system failure" [1] and requires to retart
> each failed jobs manually by maintainers to complete the pipeline with only
> real failure if any.
>
> The "retry" keyword allows to configure how many time a job is retried if
> it fails. "retry:when" allows to retry a failed job only on specific
> failure types like "runner_system_failure".
>
> While at it, retry a job if it failed due to a timeout failure (this
> timeout means that the job was pending for more than 24h) [2].
>
> Such timeout failure occurs on pipeline testing each Buildroot's defconfig
> since there is not enough gitlab runner avaible to build all of them
> within 24h.
>
> Retry only jobs that are more likely to wait for a runner
> (generate-gitlab-ci-yml, runtime_test_base, defconfig_base and test_pkg).
>
> [1] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949397 (runner system failure)
> [2] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949530 (timeout failure or the job got stuck)
>
> https://docs.gitlab.com/ee/ci/yaml/#retrywhen
>
> Signed-off-by: Romain Naour <romain.naour@gmail.com>
> Cc: Arnout Vandecappelle <arnout@mind.be>
> ---
>  .gitlab-ci.yml                |  5 +++++
>  support/misc/gitlab-ci.yml.in | 15 +++++++++++++++
>  2 files changed, 20 insertions(+)

Excellent! I hope this will improve the reliability of Gitlab results. I've
applied to master, after fixing some minor typos in the commit log.

Thanks a lot!

Thomas
>>>>> "Romain" == Romain Naour <romain.naour@gmail.com> writes:

 > Each time a new pipeline is trigged, some jobs may fail due to temporary
 > issue with a Gitlab runner (network, power supply, docker or maintainance).

 > Most of the problems are "runner system failure" [1] and requires to retart
 > each failed jobs manually by maintainers to complete the pipeline with only
 > real failure if any.

 > The "retry" keyword allows to configure how many time a job is retried if
 > it fails. "retry:when" allows to retry a failed job only on specific
 > failure types like "runner_system_failure".

 > While at it, retry a job if it failed due to a timeout failure (this
 > timeout means that the job was pending for more than 24h) [2].

 > Such timeout failure occurs on pipeline testing each Buildroot's defconfig
 > since there is not enough gitlab runner avaible to build all of them
 > within 24h.

 > Retry only jobs that are more likely to wait for a runner
 > (generate-gitlab-ci-yml, runtime_test_base, defconfig_base and test_pkg).

 > [1] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949397 (runner system failure)
 > [2] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949530 (timeout failure or the job got stuck)

 > https://docs.gitlab.com/ee/ci/yaml/#retrywhen

 > Signed-off-by: Romain Naour <romain.naour@gmail.com>
 > Cc: Arnout Vandecappelle <arnout@mind.be>

Committed to 2023.02.x and 2023.05.x, thanks.
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index ed17bb14b9..3d7719568f 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -10,6 +10,11 @@ stages:
 generate-gitlab-ci-yml:
   stage: generate-gitlab-ci
   script: ./support/scripts/generate-gitlab-ci-yml support/misc/gitlab-ci.yml.in > generated-gitlab-ci.yml
+  retry:
+    max: 2
+    when:
+      - runner_system_failure
+      - stuck_or_timeout_failure
   artifacts:
     when: always
     paths:
diff --git a/support/misc/gitlab-ci.yml.in b/support/misc/gitlab-ci.yml.in
index 446132846f..4d9acbc3d3 100644
--- a/support/misc/gitlab-ci.yml.in
+++ b/support/misc/gitlab-ci.yml.in
@@ -67,6 +67,11 @@ before_script:
                 tail -200 runtime-test.log
                 exit 1
             }
+    retry:
+        max: 2
+        when:
+            - runner_system_failure
+            - stuck_or_timeout_failure
     artifacts:
         when: always
         expire_in: 2 weeks
@@ -99,6 +104,11 @@ before_script:
         - TEST_CASE_NAME=${CI_JOB_NAME}
         - echo "Starting runtime test ${TEST_CASE_NAME}"
         - ./support/testing/run-tests -o test-output/ -d test-dl/ -k --timeout-multiplier 10 ${TEST_CASE_NAME}
+    retry:
+        max: 2
+        when:
+            - runner_system_failure
+            - stuck_or_timeout_failure
     artifacts:
         when: always
         expire_in: 2 weeks
@@ -119,6 +129,11 @@ before_script:
     needs:
         - pipeline: $PARENT_PIPELINE_ID
           job: generate-gitlab-ci-yml
+    retry:
+        max: 2
+        when:
+            - runner_system_failure
+            - stuck_or_timeout_failure
     artifacts:
         when: always
         expire_in: 2 weeks
Each time a new pipeline is triggered, some jobs may fail due to a temporary
issue with a GitLab runner (network, power supply, Docker or maintenance).

Most of the problems are "runner system failure" [1] and require maintainers
to manually restart each failed job so that the pipeline completes with only
real failures, if any.

The "retry" keyword configures how many times a job is retried if it fails.
"retry:when" restricts retries to specific failure types such as
"runner_system_failure".

While at it, also retry a job if it failed due to a timeout failure (this
timeout means that the job was pending for more than 24h) [2].

Such timeout failures occur on pipelines testing each of Buildroot's
defconfigs, since there are not enough GitLab runners available to build all
of them within 24h.

Retry only jobs that are more likely to wait for a runner
(generate-gitlab-ci-yml, runtime_test_base, defconfig_base and test_pkg).

[1] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949397 (runner system failure)
[2] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949530 (timeout failure or the job got stuck)

https://docs.gitlab.com/ee/ci/yaml/#retrywhen

Signed-off-by: Romain Naour <romain.naour@gmail.com>
Cc: Arnout Vandecappelle <arnout@mind.be>
---
 .gitlab-ci.yml                |  5 +++++
 support/misc/gitlab-ci.yml.in | 15 +++++++++++++++
 2 files changed, 20 insertions(+)
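
For readers unfamiliar with the two forms of "retry" that the commit message contrasts, here is a minimal sketch. The job names and the "script: make" lines are hypothetical, chosen only for illustration; they are not part of the patch. Only the "retry" blocks mirror what the patch adds.

```yaml
# Hypothetical jobs for illustration only; not taken from the Buildroot tree.

# Short form: retry up to 2 times on ANY failure, including a genuine
# build error, which would only delay reporting a real problem.
example_retry_always:
  script: make
  retry: 2

# Long form (the one used by the patch): retry up to 2 times, but only
# when the failure type is one of those listed under "when". A genuine
# build failure is reported immediately, without a retry.
example_retry_on_runner_issue:
  script: make
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
```

With the long form, a job that fails because its script returned a non-zero exit code should not be retried, so real failures still show up on the first attempt.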