
[RFC] support/misc/gitlab-ci.yml.in: retry a job only if it failed due to a runner issue

Message ID 20230826210011.39269-1-romain.naour@gmail.com
State Accepted
Series [RFC] support/misc/gitlab-ci.yml.in: retry a job only if it failed due to a runner issue

Commit Message

Romain Naour Aug. 26, 2023, 9 p.m. UTC
Each time a new pipeline is triggered, some jobs may fail due to a temporary
issue with a GitLab runner (network, power supply, Docker or maintenance).

Most of these problems are "runner system failure" [1], and maintainers have
to restart each failed job manually so that the pipeline completes with only
the real failures, if any.

The "retry" keyword configures how many times a job is retried if it
fails. "retry:when" restricts the retry to specific failure types such as
"runner_system_failure".
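
As an illustration of the syntax (a minimal sketch, not taken from the
patch: the job name "example_job" and its script are hypothetical), a job
that is retried at most twice, and only on a runner system failure or a
stuck/timeout failure, could look like:

```yaml
# Hypothetical job, for illustration only; not part of this patch.
example_job:
    script:
        - ./run-build.sh    # placeholder command (assumption)
    retry:
        # Retry a failed job up to 2 more times.
        max: 2
        # Retry only for these failure types; a genuine script
        # failure is reported immediately, without a retry.
        when:
            - runner_system_failure
            - stuck_or_timeout_failure
```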

While at it, retry a job if it failed due to a timeout failure (this
timeout means that the job was pending for more than 24h) [2].

Such timeout failures occur on pipelines testing each Buildroot defconfig,
since there are not enough GitLab runners available to build all of them
within 24h.

Retry only jobs that are more likely to wait for a runner
(generate-gitlab-ci-yml, runtime_test_base, defconfig_base and test_pkg).

[1] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949397 (runner system failure)
[2] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949530 (timeout failure or the job got stuck)

https://docs.gitlab.com/ee/ci/yaml/#retrywhen

Signed-off-by: Romain Naour <romain.naour@gmail.com>
Cc: Arnout Vandecappelle <arnout@mind.be>
---
 .gitlab-ci.yml                |  5 +++++
 support/misc/gitlab-ci.yml.in | 15 +++++++++++++++
 2 files changed, 20 insertions(+)

Comments

Thomas Petazzoni Aug. 27, 2023, 8:11 a.m. UTC | #1
On Sat, 26 Aug 2023 23:00:11 +0200
Romain Naour <romain.naour@gmail.com> wrote:

> Each time a new pipeline is triggered, some jobs may fail due to a temporary
> issue with a GitLab runner (network, power supply, Docker or maintenance).
> 
> Most of these problems are "runner system failure" [1], and maintainers have
> to restart each failed job manually so that the pipeline completes with only
> the real failures, if any.
> 
> The "retry" keyword configures how many times a job is retried if it
> fails. "retry:when" restricts the retry to specific failure types such as
> "runner_system_failure".
> 
> While at it, retry a job if it failed due to a timeout failure (this
> timeout means that the job was pending for more than 24h) [2].
> 
> Such timeout failures occur on pipelines testing each Buildroot defconfig,
> since there are not enough GitLab runners available to build all of them
> within 24h.
> 
> Retry only jobs that are more likely to wait for a runner
> (generate-gitlab-ci-yml, runtime_test_base, defconfig_base and test_pkg).
> 
> [1] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949397 (runner system failure)
> [2] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949530 (timeout failure or the job got stuck)
> 
> https://docs.gitlab.com/ee/ci/yaml/#retrywhen
> 
> Signed-off-by: Romain Naour <romain.naour@gmail.com>
> Cc: Arnout Vandecappelle <arnout@mind.be>
> ---
>  .gitlab-ci.yml                |  5 +++++
>  support/misc/gitlab-ci.yml.in | 15 +++++++++++++++
>  2 files changed, 20 insertions(+)

Excellent! I hope this will improve the reliability of Gitlab results.
I've applied to master, after fixing some minor typos in the commit
log. Thanks a lot!

Thomas

Peter Korsgaard Sept. 13, 2023, 7:28 p.m. UTC | #2
>>>>> "Romain" == Romain Naour <romain.naour@gmail.com> writes:

 > Each time a new pipeline is triggered, some jobs may fail due to a temporary
 > issue with a GitLab runner (network, power supply, Docker or maintenance).

 > Most of these problems are "runner system failure" [1], and maintainers have
 > to restart each failed job manually so that the pipeline completes with only
 > the real failures, if any.

 > The "retry" keyword configures how many times a job is retried if it
 > fails. "retry:when" restricts the retry to specific failure types such as
 > "runner_system_failure".

 > While at it, retry a job if it failed due to a timeout failure (this
 > timeout means that the job was pending for more than 24h) [2].

 > Such timeout failures occur on pipelines testing each Buildroot defconfig,
 > since there are not enough GitLab runners available to build all of them
 > within 24h.

 > Retry only jobs that are more likely to wait for a runner
 > (generate-gitlab-ci-yml, runtime_test_base, defconfig_base and test_pkg).

 > [1] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949397 (runner system failure)
 > [2] https://gitlab.com/buildroot.org/buildroot/-/jobs/4936949530 (timeout failure or the job got stuck)

 > https://docs.gitlab.com/ee/ci/yaml/#retrywhen

 > Signed-off-by: Romain Naour <romain.naour@gmail.com>
 > Cc: Arnout Vandecappelle <arnout@mind.be>

Committed to 2023.02.x and 2023.05.x, thanks.

Patch

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index ed17bb14b9..3d7719568f 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -10,6 +10,11 @@  stages:
 generate-gitlab-ci-yml:
   stage: generate-gitlab-ci
   script: ./support/scripts/generate-gitlab-ci-yml support/misc/gitlab-ci.yml.in > generated-gitlab-ci.yml
+  retry:
+    max: 2
+    when:
+      - runner_system_failure
+      - stuck_or_timeout_failure
   artifacts:
     when: always
     paths:
diff --git a/support/misc/gitlab-ci.yml.in b/support/misc/gitlab-ci.yml.in
index 446132846f..4d9acbc3d3 100644
--- a/support/misc/gitlab-ci.yml.in
+++ b/support/misc/gitlab-ci.yml.in
@@ -67,6 +67,11 @@  before_script:
                 tail -200 runtime-test.log
                 exit 1
             }
+    retry:
+        max: 2
+        when:
+            - runner_system_failure
+            - stuck_or_timeout_failure
     artifacts:
         when: always
         expire_in: 2 weeks
@@ -99,6 +104,11 @@  before_script:
         - TEST_CASE_NAME=${CI_JOB_NAME}
         - echo "Starting runtime test ${TEST_CASE_NAME}"
         - ./support/testing/run-tests -o test-output/ -d test-dl/ -k --timeout-multiplier 10 ${TEST_CASE_NAME}
+    retry:
+        max: 2
+        when:
+            - runner_system_failure
+            - stuck_or_timeout_failure
     artifacts:
         when: always
         expire_in: 2 weeks
@@ -119,6 +129,11 @@  before_script:
     needs:
         - pipeline: $PARENT_PIPELINE_ID
           job: generate-gitlab-ci-yml
+    retry:
+        max: 2
+        when:
+            - runner_system_failure
+            - stuck_or_timeout_failure
     artifacts:
         when: always
         expire_in: 2 weeks