[v7,0/4] GitLab Custom Runners and Jobs (was: QEMU Gating CI)

Message ID 20210630012619.115262-1-crosa@redhat.com

Message

Cleber Rosa June 30, 2021, 1:26 a.m. UTC
TL;DR: this should allow the QEMU maintainer to push to the staging
branch, and have custom jobs running on the project's aarch64 and
s390x machines.  Jobs in this version are allowed to fail, to allow
for the inclusion of the novel machines/jobs without CI disruption.
Simple usage looks like:

   git push remote staging
   ./scripts/ci/gitlab-pipeline-status --verbose --wait

Long version:

The idea of a public-facing Gating CI for QEMU was summarized in an
RFC[1].  Since then, it was decided that a simpler version should be
attempted first.

At this point, there are two specific runners (an aarch64 and an s390x)
registered with GitLab, at https://gitlab.com/qemu-project, currently
set up for the "qemu" repository.  To make it extra clear, the following
points deserve notice:

 - This work dealt with two different QEMU project machines:
    I) s390x running Ubuntu 18.04
   II) aarch64 running Ubuntu 20.04

 - All CI jobs introduced here are allowed to fail.  A failure in any
   of them should have no impact on the overall PASS/FAIL result of
   the pipeline.

 - Both machines have already been completely configured using the
   code in this series.  No further action (besides pushing this code
   to the staging branch) is needed to have CI jobs executed on them.

 - The actual CI jobs to be executed are defined in this series,
   and map as closely as possible to the tests run by Peter Maydell
   on the staging branch.  Peter, given the time elapsed since the
   beginning of this work, they may need further tweaking.

 - The actual results of CI jobs run from these definitions are
   probably more fluid than the results from the CI jobs run on the
   shared runners.  Both changes to the code being tested, and the
   conditions/resources of the machine should be taken into account.

 - A pipeline, and some of the jobs, can be seen at the links below.
   Besides successful jobs, it also contains jobs not started (set to
   manual, with the specific reasons noted later on the changes
   section):

   * https://gitlab.com/cleber.gnu/qemu/-/pipelines/316471691
    - ubuntu-18.04-s390x-all: https://gitlab.com/cleber.gnu/qemu/-/jobs/1325698118
    - ubuntu-20.04-aarch64-all: https://gitlab.com/cleber.gnu/qemu/-/jobs/1325698124
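
For readers less familiar with the GitLab CI keywords behind the points
above, the sketch below shows how "allowed to fail" and "manual" are
usually expressed.  It is an illustration only, not an excerpt from
.gitlab-ci.d/custom-runners.yml: the job names are the ones mentioned
in this letter, but the tag names and the script lines are assumptions.

```yaml
# Illustrative sketch only; not the contents of custom-runners.yml.

# A job that starts automatically but cannot fail the pipeline.
ubuntu-18.04-s390x-all:
  allow_failure: true
  tags:                # assumed tag names (OS and architecture)
    - ubuntu_18.04
    - s390x
  script:              # assumed build/test steps
    - mkdir build
    - cd build
    - ../configure
    - make
    - make check

# A job that is listed in the pipeline but only starts when someone
# triggers it by hand from the GitLab UI.
ubuntu-18.04-s390x-clang:
  when: manual
  allow_failure: true
  tags:
    - ubuntu_18.04
    - s390x
  script:
    - mkdir build
    - cd build
    - ../configure --cc=clang --cxx=clang++
    - make
    - make check
```

With "allow_failure: true", a failed job is reported as a warning and
the pipeline result is decided by the remaining jobs.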

Changes from v6:

 - Added minimum ansible version check to the build-environment.yml
   playbook, and a note to the documentation about the same
   requirement.

 - Removed FreeBSD from the list of covered systems in the
   documentation, and added a note saying that other systems can be
   supported by extending the current playbook.

 - Added note to documentation about required privileges to execute the
   playbook's tasks, and ansible-playbook options that can help with
   that.

 - Added an "apt upgrade" along with the "apt cache update" to reduce
   possible conflicts during package installation.

 - Removed note about custom builds of runners hosted at
   https://cleber.fedorapeople.org/gitlab-runner/, now that all
   architectures covered here have official builds.

 - Added more detailed instructions to the documentation on where to
   find the CI/CD settings.

 - Updated the header for the runner token page matching the current
   GitLab UI.

 - Added inventory to .gitignore.

 - Added missing newline to .gitignore.

 - Added comments with descriptions to both playbook files.

 - Added note about contacts for machine maintainers section on the
   QEMU wiki (https://wiki.qemu.org/AdminContacts).
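
The Ansible version check and the apt handling described in this list
could take roughly the following shape.  This is a sketch under
assumptions, not the playbook from the series: the play name, the task
names and the 2.8.0 minimum are made up for the example.

```yaml
# Illustrative sketch only; not the contents of build-environment.yml.
- name: Build environment sketch
  hosts: all
  tasks:
    # ansible_version describes the Ansible installation running the
    # playbook.  The 2.8.0 minimum is an assumption for this example.
    - name: Check for a suitable Ansible version
      assert:
        that:
          - ansible_version.full is version('2.8.0', '>=')
        msg: "Unsuitable Ansible version, please use 2.8.0 or later"

    # Refresh the package index and upgrade installed packages before
    # anything else is installed.  This needs root privileges on the
    # target machine.
    - name: Update apt cache and upgrade packages
      apt:
        update_cache: yes
        upgrade: yes
      when:
        - ansible_facts['distribution'] == 'Ubuntu'
```

Assuming the sketch is saved as "sketch.yml" (a hypothetical name), it
could be run against the user-created inventory with:

   ansible-playbook -i inventory sketch.yml

adding a privilege escalation option such as "--become" when the
remote user is not root.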

Changes from v5:

 - Moved include of ".gitlab-ci.d/custom-runners.yml" from
   ".gitlab-ci.yml" to ".gitlab-ci.d/qemu-project.yml"

 - Changed git clone strategy from "submodule recursive" to "clone",
   to guarantee a fresh and clean repo on every job, and let
   QEMU handle any recursive submodule operations it may need

 - Require user to create an Ansible inventory file, based on the
   now provided "inventory.template" one.  Previously, the
   "inventory" file itself was provided and users were asked to edit
   it.

 - Registered runners will now be given a default set of tags, with
   their OS and architectures.  This, besides automating another step,
   works around a "gitlab-runner register" command line issue with
   "--run-untagged=false" not being respected if no tags are given.

 - Added conditional for installing either "clang" or "clang-10" to
   match the package name on different versions of Ubuntu.

 - Changed Ubuntu 20.04 jobs to use clang-10 instead of clang.

 - Defaults to not running untagged jobs when registering a gitlab
   runner.

 - Added python3-sphinx-rtd-theme package installation to
   build-environment.yml, to match 73e6aec6522.

 - Added genisoimage package installation to build-environment.yml
   when on Ubuntu 20.04 (not on 18.04) to match 7e86e5d5ccc.

 - Added liblttng-ust-dev package installation to
   build-environment.yml when on Ubuntu 20.04 (not on 18.04) to match
   8e9419b7902.

 - Added libslirp-dev package installation to build-environment.yml
   when on Ubuntu 20.04 (not on 18.04) to match 8e9419b7902.

 - Added netcat-openbsd package installation to build-environment.yml
   when on Ubuntu 20.04 (not on 18.04) to match c4cb1c9f2e1.

 - Bumped gitlab-runner version to 13.12.0

 - Use only gitlab-runner binaries from the official distribution (they
   are now also available for s390x, previously missing).

 - Selection of the OS and architecture for the gitlab-runner binary
   is now done automatically.  If necessary, architecture chosen can be
   influenced by editing the "ansible_to_gitlab_arch" values on vars.yml
   (after you've created one from the vars.yml.template).

 - Marked job "ubuntu-18.04-s390x-clang" as manual, given that
   the latest executions are getting stuck with the last
   output being "Sanitizers are enabled ==> Not running the
   qemu-iotests."

 - Marked job "ubuntu-18.04-s390x-notcg" as manual, given that the
   latest executions are getting stuck with the last tests being
   "tests/qtest/pxe-test" and "tests/qtest/boot-serial-test", which
   contain command lines with "-accel kvm -accel tcg" and "-accel
   qtest".  TCG is disabled in this build and KVM is not available on
   the QEMU s390x machine, so the fallback to qtest gets the tests
   stuck.

 - Marked job "ubuntu-20.04-aarch64-notcg" as manual due to build
   failures (FAILED: libqemu-aarch64-softmmu.fa.p/target_arm_psci.c.o).

 - Marked job "ubuntu-20.04-aarch64-clang" as manual due to test
   failures:
      ../tests/qtest/libqtest.c:157: kill_qemu() tried to terminate
      QEMU process but encountered exit status 1 (expected 0)
      ERROR qtest-aarch64/qom-test - too few tests run (expected 80, got 6)

 - Cleared Daniel's R-B due to the number of changes in the
   corresponding patches.
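
Two of the playbook changes in this list, the version-dependent clang
package and the architecture mapping, can be pictured with the sketch
below.  It is not taken from the series: the mapping values, the task
names, and the assumption that 18.04 gets "clang" while 20.04 gets
"clang-10" are illustrative guesses based on the descriptions above.

```yaml
# Illustrative sketch only; not the contents of the series' playbooks.
- name: Version-dependent packages and architecture mapping sketch
  hosts: all
  vars:
    # Assumed mapping from Ansible's architecture fact to the
    # architecture suffix used in gitlab-runner binary names.
    ansible_to_gitlab_arch:
      x86_64: amd64
      aarch64: arm64
      s390x: s390x
  tasks:
    - name: Install clang on Ubuntu 18.04
      package:
        name:
          - clang
        state: present
      when:
        - ansible_facts['distribution'] == 'Ubuntu'
        - ansible_facts['distribution_version'] == '18.04'

    - name: Install clang-10 on Ubuntu 20.04
      package:
        name:
          - clang-10
        state: present
      when:
        - ansible_facts['distribution'] == 'Ubuntu'
        - ansible_facts['distribution_version'] == '20.04'

    # Show which gitlab-runner architecture would be picked for the
    # machine.  This fails if the architecture is missing from the map.
    - name: Show the selected gitlab-runner architecture
      debug:
        msg: "{{ ansible_to_gitlab_arch[ansible_facts['architecture']] }}"
```

Keeping the mapping in a variable lets a user override the selected
architecture by editing one value, without touching the tasks.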

Known issues:

 - tests/qtest/qos-test has been seen stuck a number of times on both
   the s390x and aarch64 machines.  Further investigation is needed.
   Until a resolution is found, this should not affect the pipeline,
   given that all new jobs introduced here are currently allowed to
   fail.

Next development / discussion points:

 - Test docker executor
 - Test docker executor with podman
 - Configuration of more accurate job timeouts
 - Historical result analysis
 - Promotion of jobs (removing the "allow failure" setting)

Changes from v4:

 - Fixed typo in docs/devel/ci.rst, s/maintanance/maintenance/ (Thomas)
 - Removed "[local]" group from inventory file (Erik)
 - Removed sections from the playbooks which *would* be applied on
   hardware/OS that are currently not available to QEMU
 - Removed duplicated "here" on documentation (Thomas)
 - Moved description of current jobs, and possible direction of future
   jobs to the patch description (Thomas)
 - Remove comments around "when" conditions (Andrea)
 - Switch to always use explicit lists on "when" blocks (Andrea)
 - Switch from using module "apt" to using generic action module "package",
   which involved adding a new task to update the apt cache (Andrea)
 - Fix playbook indentation in the non-s390x package installation task (Andrea)
 - Changed gitlab-runner tags examples from FreeBSD to Ubuntu, which is
   covered by jobs added on this version
 - Fixed typo in commit message s/s390/s390x/ (Phil)
 - Allow all custom-runner jobs to fail at this time
 - Cleared "Reviewed-by" in one patch due to large changes

  Changes requested in v4 but *not* seen here due to sections of the
  playbook being removed:

 - Replace SDL-devel for SDL2-devel on CentOS, according to 5ed7ca3 (Thomas)
 - Correct missing step 10 on the FreeBSD gitlab-runner installation
   instructions (Erik)

Changes from v3:

- Applied changes to match <20201014135416.1290679-1-pbonzini@redhat.com>,
  that is, added ninja-build to "build-environment.yml" list of packages
  and enabled PowerTools repository on CentOS 8.

Changes from v2:

- The overall idea of "Gating CI" has been re-worded as "custom
  runners", given that the other jobs running on shared runners are
  also considered gating (Daniel)

- Fixed wording and typos on the documentation, including:
 * update -> up to date (Erik)
 * a different set of CI jobs -> different CI jobs (Erik)
 * Pull requests will only be merged -> code will only be merged (Stefan)
 * Setup -> set up (Stefan)
 * them -> they (Stefan)
 * the -> where the (Stefan)
 * dropped "in the near future" (Stefan)

- Changed comment on "build-environment.yml" regarding the origin of
  the package list (Stefan)

- Removed inclusion of "vars.yml" from "build-environment.yml", given that
  no external variable is used there

- Updated package list in "build-environment.yml" from current
  dockerfiles

- Tested "build-environment" on Fedora 31 and 32 (in addition to Fedora 30),
  and noted that it's possible to use it on those distros

- Moved CI documentation from "testing.rst" to its own file (Philippe)

- Split "GitLab Gating CI: initial set of jobs, documentation and scripts"
  into (Philippe):
  1) Basic documentation and configuration (gitlab-ci.yml) placeholder
  2) Playbooks for setting up a build environment
  3) Playbooks for setting up gitlab-runner
  4) Actual GitLab CI jobs configuration

- Set custom jobs to be on the "build" stage, given that they combine
  build and test.

- Set custom jobs to not depend on any other job, so they can start
  right away.

- Set rules for starting jobs so that they run on pushes to any branch
  whose name starts with "staging".  This allows the project
  maintainer to use the "push to staging" workflow, while also
  allowing others to generate similar jobs.  If the project has
  custom runners configured, the jobs will run; if not, the pipeline
  will be marked as "stuck".

- Changed "scripts" on custom jobs to follow the now common pattern
  (on other jobs) of creating a "build" directory.
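
Taken together, the job-related changes in this list (build stage, no
dependencies, "staging" branch rule, "build" directory) suggest a job
shaped like the sketch below.  The job name, the tags and the exact
script lines are hypothetical and not copied from the series.

```yaml
# Illustrative sketch only; job name and tags are hypothetical.
example-staging-job:
  stage: build        # build and test are combined in a single job
  needs: []           # no dependencies, so the job can start right away
  rules:
    # Run for pushes to any branch whose name starts with "staging".
    - if: '$CI_COMMIT_BRANCH =~ /^staging/'
  tags:
    - ubuntu_20.04
    - aarch64
  script:
    - mkdir build
    - cd build
    - ../configure
    - make -j"$(nproc)"
    - make -j"$(nproc)" check
```

In a project with no runner matching the tags, a job like this has
nothing to run on and stays pending, which is the "stuck" state
mentioned above.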

Changes from v1:

- Added jobs that require specific GitLab runners already available
  (Ubuntu 20.04 on aarch64, and Ubuntu 18.04 on s390x)
- Removed jobs that require specific GitLab runners not yet available
  (Fedora 30, FreeBSD 12.1)
- Updated documentation
- Added copyright and license to new scripts
- Moved script from "contrib" to "scripts/ci/"
- Moved setup playbooks from "contrib" to "scripts/ci/setup"
- Moved "gating.yml" to ".gitlab-ci.d" directory
- Removed "staging" only branch restriction on jobs defined in
  ".gitlab-ci.yml", assumes that the additional jobs on the staging
  branch running on the freely available gitlab shared runner are
  positive
- Dropped patches 1-3 (already merged)
- Simplified the version specificity on Ubuntu, from 18.04.3 to
  simply 18.04 (assumes no diverse minor levels will be used or
  specific runners)

Changes from the RFC patches[2] accompanying the RFC document:

- Moved gating job definitions to .gitlab-ci-gating.yml
- Added info on "--disable-libssh" build option requirement
  (https://bugs.launchpad.net/qemu/+bug/1838763) to Ubuntu 18.04 jobs
- Added info on "--disable-glusterfs" build option requirement
  (there's no static version of those libs in distro supplied
  packages) to one
- Dropped ubuntu-18.04.3-x86_64-notools job definition, because it
  doesn't fall into the general scope of gating job described by PMM
  (and it did not run any test)
- Added w32 and w64 cross builds based on Fedora 30
- Added a FreeBSD based job that builds all targets and runs `make
  check`
- Added "-j`nproc`" and "-j`sysctl -n hw.ncpu`" options to make as a
  simple but effective way of speeding up the builds and tests by
  using a number of make jobs matching the number of CPUs
- Because the Ansible playbooks reference the content on Dockerfiles,
  some fixes to some Dockerfiles caught in the process were included
- New patch with script to check or wait on a pipeline execution

[1] - https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg00231.html
[2] - https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg00154.html

Cleber Rosa (4):
  Jobs based on custom runners: documentation and configuration
    placeholder
  Jobs based on custom runners: build environment docs and playbook
  Jobs based on custom runners: docs and gitlab-runner setup playbook
  Jobs based on custom runners: add job definitions for QEMU's machines

 .gitlab-ci.d/custom-runners.yml        | 222 +++++++++++++++++++++++++
 .gitlab-ci.d/qemu-project.yml          |   1 +
 docs/devel/ci.rst                      | 127 ++++++++++++++
 docs/devel/index.rst                   |   1 +
 scripts/ci/setup/.gitignore            |   2 +
 scripts/ci/setup/build-environment.yml | 116 +++++++++++++
 scripts/ci/setup/gitlab-runner.yml     |  71 ++++++++
 scripts/ci/setup/inventory.template    |   1 +
 scripts/ci/setup/vars.yml.template     |  12 ++
 9 files changed, 553 insertions(+)
 create mode 100644 .gitlab-ci.d/custom-runners.yml
 create mode 100644 docs/devel/ci.rst
 create mode 100644 scripts/ci/setup/.gitignore
 create mode 100644 scripts/ci/setup/build-environment.yml
 create mode 100644 scripts/ci/setup/gitlab-runner.yml
 create mode 100644 scripts/ci/setup/inventory.template
 create mode 100644 scripts/ci/setup/vars.yml.template

Comments

Alex Bennée July 2, 2021, 11:02 a.m. UTC | #1
Cleber Rosa <crosa@redhat.com> writes:

> TL;DR: this should allow the QEMU maintainer to push to the staging
> branch, and have custom jobs running on the project's aarch64 and
> s390x machines.  Jobs in this version are allowed to fail, to allow
> for the inclusion of the novel machines/jobs without CI disruption.
> Simple usage looks like:
>
>    git push remote staging
>    ./scripts/ci/gitlab-pipeline-status --verbose --wait

Queued to testing/next, thanks.