Message ID: 20241021113500.122500-1-thuth@redhat.com
State: New
On Mon, 21 Oct 2024 at 12:35, Thomas Huth <thuth@redhat.com> wrote:
>
> The following changes since commit f1dd640896ee2b50cb34328f2568aad324702954:
>
>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-10-18 10:42:56 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/thuth/qemu.git tags/pull-request-2024-10-21
>
> for you to fetch changes up to ee772a332af8f23acf604ad0fb5132f886b0eb16:
>
>   tests/functional: Convert the Avocado sh4 tuxrun test (2024-10-21 13:25:12 +0200)
>
> ----------------------------------------------------------------
> * Convert the Tuxrun Avocado tests to the new functional framework
> * Update the OpenBSD CI image to OpenBSD v7.6
> * Bump timeout of the ide-test
> * New maintainer for the QTests
> * Disable the pci-bridge on s390x by default
>
> ----------------------------------------------------------------

Couple of failures on the functional-tests:

https://gitlab.com/qemu-project/qemu/-/jobs/8140716604

7/28 qemu:func-thorough+func-aarch64-thorough+thorough /
func-aarch64-aarch64_tuxrun TIMEOUT 120.06s killed by signal 15 SIGTERM

https://gitlab.com/qemu-project/qemu/-/jobs/8140716520

14/17 qemu:func-thorough+func-loongarch64-thorough+thorough /
func-loongarch64-loongarch64_virt TIMEOUT 60.09s killed by signal 15 SIGTERM

I'm retrying to see if these are intermittent, but they suggest that
we should bump the timeout for these.

thanks
-- PMM
On 21/10/2024 15.00, Peter Maydell wrote:
> On Mon, 21 Oct 2024 at 12:35, Thomas Huth <thuth@redhat.com> wrote:
>>
>> The following changes since commit f1dd640896ee2b50cb34328f2568aad324702954:
>>
>>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-10-18 10:42:56 +0100)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.com/thuth/qemu.git tags/pull-request-2024-10-21
>>
>> for you to fetch changes up to ee772a332af8f23acf604ad0fb5132f886b0eb16:
>>
>>   tests/functional: Convert the Avocado sh4 tuxrun test (2024-10-21 13:25:12 +0200)
>>
>> ----------------------------------------------------------------
>> * Convert the Tuxrun Avocado tests to the new functional framework
>> * Update the OpenBSD CI image to OpenBSD v7.6
>> * Bump timeout of the ide-test
>> * New maintainer for the QTests
>> * Disable the pci-bridge on s390x by default
>>
>> ----------------------------------------------------------------
>
> Couple of failures on the functional-tests:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/8140716604
>
> 7/28 qemu:func-thorough+func-aarch64-thorough+thorough /
> func-aarch64-aarch64_tuxrun TIMEOUT 120.06s killed by signal 15 SIGTERM
>
> https://gitlab.com/qemu-project/qemu/-/jobs/8140716520
>
> 14/17 qemu:func-thorough+func-loongarch64-thorough+thorough /
> func-loongarch64-loongarch64_virt TIMEOUT 60.09s killed by signal 15 SIGTERM
>
> I'm retrying to see if these are intermittent, but they suggest that
> we should bump the timeout for these.

Everything was fine with the gitlab shared runners
(https://gitlab.com/thuth/qemu/-/pipelines/1504882880), but yes, it's
likely the private runners being slow again...

So please don't merge it yet, I'll go through the jobs of the private
runners and update the timeouts of the failed jobs and the ones where it
is getting close to the limit.

 Thomas
On 21/10/2024 15.18, Thomas Huth wrote:
> On 21/10/2024 15.00, Peter Maydell wrote:
>> On Mon, 21 Oct 2024 at 12:35, Thomas Huth <thuth@redhat.com> wrote:
>>>
>>> The following changes since commit f1dd640896ee2b50cb34328f2568aad324702954:
>>>
>>>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-10-18 10:42:56 +0100)
>>>
>>> are available in the Git repository at:
>>>
>>>   https://gitlab.com/thuth/qemu.git tags/pull-request-2024-10-21
>>>
>>> for you to fetch changes up to ee772a332af8f23acf604ad0fb5132f886b0eb16:
>>>
>>>   tests/functional: Convert the Avocado sh4 tuxrun test (2024-10-21 13:25:12 +0200)
>>>
>>> ----------------------------------------------------------------
>>> * Convert the Tuxrun Avocado tests to the new functional framework
>>> * Update the OpenBSD CI image to OpenBSD v7.6
>>> * Bump timeout of the ide-test
>>> * New maintainer for the QTests
>>> * Disable the pci-bridge on s390x by default
>>>
>>> ----------------------------------------------------------------
>>
>> Couple of failures on the functional-tests:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/8140716604
>>
>> 7/28 qemu:func-thorough+func-aarch64-thorough+thorough /
>> func-aarch64-aarch64_tuxrun TIMEOUT 120.06s killed by signal 15 SIGTERM
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/8140716520
>>
>> 14/17 qemu:func-thorough+func-loongarch64-thorough+thorough /
>> func-loongarch64-loongarch64_virt TIMEOUT 60.09s killed by signal 15 SIGTERM
>>
>> I'm retrying to see if these are intermittent, but they suggest that
>> we should bump the timeout for these.
>
> Everything was fine with the gitlab shared runners
> (https://gitlab.com/thuth/qemu/-/pipelines/1504882880), but yes, it's
> likely the private runners being slow again...
>
> So please don't merge it yet, I'll go through the jobs of the private
> runners and update the timeouts of the failed jobs and the ones where it
> is getting close to the limit.

Actually, looking at it again, the func-loongarch64-loongarch64_virt test
is not a new one, this has been merged quite a while ago already. And in
previous runs, it only took 6 - 10 seconds:

https://gitlab.com/qemu-project/qemu/-/jobs/8125336852#L810
https://gitlab.com/qemu-project/qemu/-/jobs/8111434905#L740

So maybe this was just a temporary blip in the test runners indeed? Could
you please try to rerun the jobs to see how long they take then?

 Thanks
  Thomas
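(Editorial aside: when a functional test hits a TIMEOUT in CI but passes elsewhere, one quick way to check whether the limit is simply too tight for a slow runner is to rerun the affected suite locally with a relaxed per-test timeout and compare the measured durations against the configured limits. A minimal sketch, assuming a configured build tree in ./build and that the suite names reported in the CI output above map directly to meson test suites:)

  # Hedged sketch: rerun the aarch64 "thorough" functional tests with 3x the
  # normal per-test timeout and print the error logs of anything that fails.
  # The suite name here is an assumption taken from the CI job output above.
  $ cd build
  $ meson test --suite func-aarch64-thorough --timeout-multiplier 3 --print-errorlogs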
On Mon, 21 Oct 2024 at 14:55, Thomas Huth <thuth@redhat.com> wrote:
>
> On 21/10/2024 15.18, Thomas Huth wrote:
> > On 21/10/2024 15.00, Peter Maydell wrote:
> >> On Mon, 21 Oct 2024 at 12:35, Thomas Huth <thuth@redhat.com> wrote:
> >>>
> >>> The following changes since commit f1dd640896ee2b50cb34328f2568aad324702954:
> >>>
> >>>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-10-18 10:42:56 +0100)
> >>>
> >>> are available in the Git repository at:
> >>>
> >>>   https://gitlab.com/thuth/qemu.git tags/pull-request-2024-10-21
> >>>
> >>> for you to fetch changes up to ee772a332af8f23acf604ad0fb5132f886b0eb16:
> >>>
> >>>   tests/functional: Convert the Avocado sh4 tuxrun test (2024-10-21 13:25:12 +0200)
> >>>
> >>> ----------------------------------------------------------------
> >>> * Convert the Tuxrun Avocado tests to the new functional framework
> >>> * Update the OpenBSD CI image to OpenBSD v7.6
> >>> * Bump timeout of the ide-test
> >>> * New maintainer for the QTests
> >>> * Disable the pci-bridge on s390x by default
> >>>
> >>> ----------------------------------------------------------------
> >>
> >> Couple of failures on the functional-tests:
> >>
> >> https://gitlab.com/qemu-project/qemu/-/jobs/8140716604
> >>
> >> 7/28 qemu:func-thorough+func-aarch64-thorough+thorough /
> >> func-aarch64-aarch64_tuxrun TIMEOUT 120.06s killed by signal 15 SIGTERM
> >>
> >> https://gitlab.com/qemu-project/qemu/-/jobs/8140716520
> >>
> >> 14/17 qemu:func-thorough+func-loongarch64-thorough+thorough /
> >> func-loongarch64-loongarch64_virt TIMEOUT 60.09s killed by signal 15 SIGTERM
> >>
> >> I'm retrying to see if these are intermittent, but they suggest that
> >> we should bump the timeout for these.
> >
> > Everything was fine with the gitlab shared runners
> > (https://gitlab.com/thuth/qemu/-/pipelines/1504882880), but yes, it's
> > likely the private runners being slow again...
> >
> > So please don't merge it yet, I'll go through the jobs of the private
> > runners and update the timeouts of the failed jobs and the ones where it
> > is getting close to the limit.
>
> Actually, looking at it again, the func-loongarch64-loongarch64_virt test
> is not a new one, this has been merged quite a while ago already. And in
> previous runs, it only took 6 - 10 seconds:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/8125336852#L810
> https://gitlab.com/qemu-project/qemu/-/jobs/8111434905#L740
>
> So maybe this was just a temporary blip in the test runners indeed? Could
> you please try to rerun the jobs to see how long they take then?

The alpine job passed on the retry:
https://gitlab.com/qemu-project/qemu/-/jobs/8141648479
and the func-loongarch64-loongarch64_virt test took 5.08s.

The opensuse job failed again:
https://gitlab.com/qemu-project/qemu/-/jobs/8141649069
7/28 qemu:func-thorough+func-aarch64-thorough+thorough /
func-aarch64-aarch64_tuxrun TIMEOUT 120.04s killed by signal 15 SIGTERM

-- PMM
On 21/10/2024 15.59, Peter Maydell wrote:
> On Mon, 21 Oct 2024 at 14:55, Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 21/10/2024 15.18, Thomas Huth wrote:
>>> On 21/10/2024 15.00, Peter Maydell wrote:
>>>> On Mon, 21 Oct 2024 at 12:35, Thomas Huth <thuth@redhat.com> wrote:
>>>>>
>>>>> The following changes since commit f1dd640896ee2b50cb34328f2568aad324702954:
>>>>>
>>>>>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-10-18 10:42:56 +0100)
>>>>>
>>>>> are available in the Git repository at:
>>>>>
>>>>>   https://gitlab.com/thuth/qemu.git tags/pull-request-2024-10-21
>>>>>
>>>>> for you to fetch changes up to ee772a332af8f23acf604ad0fb5132f886b0eb16:
>>>>>
>>>>>   tests/functional: Convert the Avocado sh4 tuxrun test (2024-10-21 13:25:12 +0200)
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> * Convert the Tuxrun Avocado tests to the new functional framework
>>>>> * Update the OpenBSD CI image to OpenBSD v7.6
>>>>> * Bump timeout of the ide-test
>>>>> * New maintainer for the QTests
>>>>> * Disable the pci-bridge on s390x by default
>>>>>
>>>>> ----------------------------------------------------------------
>>>>
>>>> Couple of failures on the functional-tests:
>>>>
>>>> https://gitlab.com/qemu-project/qemu/-/jobs/8140716604
>>>>
>>>> 7/28 qemu:func-thorough+func-aarch64-thorough+thorough /
>>>> func-aarch64-aarch64_tuxrun TIMEOUT 120.06s killed by signal 15 SIGTERM
>>>>
>>>> https://gitlab.com/qemu-project/qemu/-/jobs/8140716520
>>>>
>>>> 14/17 qemu:func-thorough+func-loongarch64-thorough+thorough /
>>>> func-loongarch64-loongarch64_virt TIMEOUT 60.09s killed by signal 15 SIGTERM
>>>>
>>>> I'm retrying to see if these are intermittent, but they suggest that
>>>> we should bump the timeout for these.
>>>
>>> Everything was fine with the gitlab shared runners
>>> (https://gitlab.com/thuth/qemu/-/pipelines/1504882880), but yes, it's
>>> likely the private runners being slow again...
>>>
>>> So please don't merge it yet, I'll go through the jobs of the private
>>> runners and update the timeouts of the failed jobs and the ones where it
>>> is getting close to the limit.
>>
>> Actually, looking at it again, the func-loongarch64-loongarch64_virt test
>> is not a new one, this has been merged quite a while ago already. And in
>> previous runs, it only took 6 - 10 seconds:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/8125336852#L810
>> https://gitlab.com/qemu-project/qemu/-/jobs/8111434905#L740
>>
>> So maybe this was just a temporary blip in the test runners indeed? Could
>> you please try to rerun the jobs to see how long they take then?
>
> The alpine job passed on the retry:
> https://gitlab.com/qemu-project/qemu/-/jobs/8141648479
> and the func-loongarch64-loongarch64_virt test took 5.08s.
>
> The opensuse job failed again:
> https://gitlab.com/qemu-project/qemu/-/jobs/8141649069
> 7/28 qemu:func-thorough+func-aarch64-thorough+thorough /
> func-aarch64-aarch64_tuxrun TIMEOUT 120.04s killed by signal 15 SIGTERM

Looking at the log files of the job, I can see in
https://gitlab.com/qemu-project/qemu/-/jobs/8141649069/artifacts/browse/build/tests/functional/aarch64/test_aarch64_tuxrun.TuxRunAarch64Test.test_arm64be/
console.log:

2024-10-21 13:20:32,844: Run /sbin/init as init process
2024-10-21 13:20:34,043: EXT4-fs (vda): re-mounted. Opts: (null). Quota mode: none.
2024-10-21 13:20:34,350: Starting syslogd: OK
2024-10-21 13:20:34,423: Starting klogd: OK
2024-10-21 13:20:34,667: Running sysctl: OK
2024-10-21 13:20:34,739: Saving 2048 bits of non-creditable seed for next boot
2024-10-21 13:20:34,966: Starting network: blk_update_request: I/O error, dev vda, sector 5824 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,028: blk_update_request: I/O error, dev vda, sector 8848 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,051: OK
2024-10-21 13:20:35,088: blk_update_request: I/O error, dev vda, sector 12936 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,149: blk_update_request: I/O error, dev vda, sector 17032 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,181: Welcome to TuxTest
2024-10-21 13:20:35,882: tuxtest login: blk_update_request: I/O error, dev vda, sector 21128 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,882: blk_update_request: I/O error, dev vda, sector 25224 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,882: blk_update_request: I/O error, dev vda, sector 29320 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,887: root

So this is indeed more than just a timeout setting that is too small...
I don't get the virtio errors when running the test locally, though.
I guess this needs some more investigation first ... maybe best if I respin
the PR without this patch for now 'til this is understood and fixed.

 Thomas
On Mon, 21 Oct 2024 at 15:11, Thomas Huth <thuth@redhat.com> wrote:
> Looking at the log files of the job, I can see in
> https://gitlab.com/qemu-project/qemu/-/jobs/8141649069/artifacts/browse/build/tests/functional/aarch64/test_aarch64_tuxrun.TuxRunAarch64Test.test_arm64be/
> console.log:
>
> 2024-10-21 13:20:32,844: Run /sbin/init as init process
> 2024-10-21 13:20:34,043: EXT4-fs (vda): re-mounted. Opts: (null). Quota mode: none.
> 2024-10-21 13:20:34,350: Starting syslogd: OK
> 2024-10-21 13:20:34,423: Starting klogd: OK
> 2024-10-21 13:20:34,667: Running sysctl: OK
> 2024-10-21 13:20:34,739: Saving 2048 bits of non-creditable seed for next boot
> 2024-10-21 13:20:34,966: Starting network: blk_update_request: I/O error, dev vda, sector 5824 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
> 2024-10-21 13:20:35,028: blk_update_request: I/O error, dev vda, sector 8848 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
> 2024-10-21 13:20:35,051: OK
> 2024-10-21 13:20:35,088: blk_update_request: I/O error, dev vda, sector 12936 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
> 2024-10-21 13:20:35,149: blk_update_request: I/O error, dev vda, sector 17032 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
> 2024-10-21 13:20:35,181: Welcome to TuxTest
> 2024-10-21 13:20:35,882: tuxtest login: blk_update_request: I/O error, dev vda, sector 21128 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
> 2024-10-21 13:20:35,882: blk_update_request: I/O error, dev vda, sector 25224 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
> 2024-10-21 13:20:35,882: blk_update_request: I/O error, dev vda, sector 29320 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
> 2024-10-21 13:20:35,887: root
>
> So this is indeed more than just a timeout setting that is too small...
> I don't get the virtio errors when running the test locally, though.
> I guess this needs some more investigation first ... maybe best if I respin
> the PR without this patch for now 'til this is understood and fixed.

I guess big-endian is one of the setups most likely to be broken :-)

-- PMM
On 21/10/2024 17.39, Peter Maydell wrote:
> On Mon, 21 Oct 2024 at 15:11, Thomas Huth <thuth@redhat.com> wrote:
>> Looking at the log files of the job, I can see in
>> https://gitlab.com/qemu-project/qemu/-/jobs/8141649069/artifacts/browse/build/tests/functional/aarch64/test_aarch64_tuxrun.TuxRunAarch64Test.test_arm64be/
>> console.log:
>>
>> 2024-10-21 13:20:32,844: Run /sbin/init as init process
>> 2024-10-21 13:20:34,043: EXT4-fs (vda): re-mounted. Opts: (null). Quota mode: none.
>> 2024-10-21 13:20:34,350: Starting syslogd: OK
>> 2024-10-21 13:20:34,423: Starting klogd: OK
>> 2024-10-21 13:20:34,667: Running sysctl: OK
>> 2024-10-21 13:20:34,739: Saving 2048 bits of non-creditable seed for next boot
>> 2024-10-21 13:20:34,966: Starting network: blk_update_request: I/O error, dev vda, sector 5824 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
>> 2024-10-21 13:20:35,028: blk_update_request: I/O error, dev vda, sector 8848 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
>> 2024-10-21 13:20:35,051: OK
>> 2024-10-21 13:20:35,088: blk_update_request: I/O error, dev vda, sector 12936 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
>> 2024-10-21 13:20:35,149: blk_update_request: I/O error, dev vda, sector 17032 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
>> 2024-10-21 13:20:35,181: Welcome to TuxTest
>> 2024-10-21 13:20:35,882: tuxtest login: blk_update_request: I/O error, dev vda, sector 21128 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
>> 2024-10-21 13:20:35,882: blk_update_request: I/O error, dev vda, sector 25224 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
>> 2024-10-21 13:20:35,882: blk_update_request: I/O error, dev vda, sector 29320 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
>> 2024-10-21 13:20:35,887: root
>>
>> So this is indeed more than just a timeout setting that is too small...
>> I don't get the virtio errors when running the test locally, though.
>> I guess this needs some more investigation first ... maybe best if I respin
>> the PR without this patch for now 'til this is understood and fixed.
>
> I guess big-endian is one of the setups most likely to be broken :-)

The weird thing is that the old version of the test (Avocado-based) still
seems to work fine. And if I run the test locally, I'm also sometimes
seeing these errors in the console.log now, but they just occur later, so
the test still finishes successfully... I'll try to have a closer look
later, but I currently don't have time for such debugging :-(

 Thomas
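(Editorial aside: for anyone trying to reproduce this outside CI, the converted functional tests can also be run standalone against a locally built binary. A minimal sketch, assuming a configured build tree in ./build; the PYTHONPATH setting is an assumption and may be unnecessary depending on how your tree is set up:)

  # Hedged sketch: run the aarch64 tuxrun functional test directly.
  # QEMU_TEST_QEMU_BINARY selects the emulator binary under test;
  # PYTHONPATH makes the qemu.machine helpers from python/ importable
  # (assumption - drop it if your environment already provides them).
  $ cd build
  $ QEMU_TEST_QEMU_BINARY=$PWD/qemu-system-aarch64 \
      PYTHONPATH=$PWD/../python \
      python3 ../tests/functional/test_aarch64_tuxrun.py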