Message ID: 20190117185628.21862-16-crosa@redhat.com
State: New
Series: Acceptance Tests: target architecture support
On Thu, Jan 17, 2019 at 01:56:25PM -0500, Cleber Rosa wrote:
> Just like the previous tests, boots a Linux kernel on an aarch64 target
> using the virt machine.
>
> One special option added is the CPU type, given that the kernel
> selected fails to boot on the virt machine's default CPU (cortex-a15).
>
> Signed-off-by: Cleber Rosa <crosa@redhat.com>

Reviewed-by: Caio Carrara <ccarrara@redhat.com>

> ---
>  .travis.yml                            |  2 +-
>  tests/acceptance/boot_linux_console.py | 20 ++++++++++++++++++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
> (...)
On 01/17/2019 04:56 PM, Cleber Rosa wrote:
> Just like the previous tests, boots a Linux kernel on an aarch64 target
> using the virt machine.
>
> One special option added is the CPU type, given that the kernel
> selected fails to boot on the virt machine's default CPU (cortex-a15).
>
> Signed-off-by: Cleber Rosa <crosa@redhat.com>
> ---
>  .travis.yml                            |  2 +-
>  tests/acceptance/boot_linux_console.py | 20 ++++++++++++++++++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/.travis.yml b/.travis.yml
> index 54100eea5a..595e8c0b6c 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -187,7 +187,7 @@ matrix:
>
>      # Acceptance (Functional) tests
>      - env:
> -        - CONFIG="--python=/usr/bin/python3 --target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu"
> +        - CONFIG="--python=/usr/bin/python3 --target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu,aarch64-softmmu"
>          - TEST_CMD="make check-acceptance"
>        addons:
>          apt:
> diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
> index f3ccd23a7a..107700b517 100644
> --- a/tests/acceptance/boot_linux_console.py
> +++ b/tests/acceptance/boot_linux_console.py
> @@ -138,3 +138,23 @@ class BootLinuxConsole(Test):
>          self.vm.launch()
>          console_pattern = 'Kernel command line: %s' % kernel_command_line
>          self.wait_for_console_pattern(console_pattern)
> +
> +    def test_aarch64_virt(self):

That test case fails on my system (Fedora 29 x86_64). Avocado seems
unable to kill the VM, so it reaches the timeout.

I compiled QEMU with the default configuration:

$ configure --python=/usr/bin/python3 --target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu,aarch64-softmmu

Follows a snippet of Avocado's job.log file:
----
2019-01-31 14:41:34,912 test             L0602 INFO | START 07-/root/src/qemu/tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_aarch64_virt
2019-01-31 14:41:34,912 test             L0298 DEBUG| DATA (filename=output.expected) => NOT FOUND (data sources: variant, test, file)
2019-01-31 14:41:34,913 parameters       L0146 DEBUG| PARAMS (key=arch, path=*, default=aarch64) => 'aarch64'
2019-01-31 14:41:34,913 parameters       L0146 DEBUG| PARAMS (key=qemu_bin, path=*, default=aarch64-softmmu/qemu-system-aarch64) => 'aarch64-softmmu/qemu-system-aarch64'
2019-01-31 14:41:34,915 download         L0070 INFO | Fetching https://sjc.edge.kernel.org/fedora-buffet/fedora/linux/releases/29/Server/aarch64/os/images/pxeboot/vmlinuz -> /var/lib/avocado/data/cache/by_name/vmlinuz.3upct2pr
2019-01-31 14:41:35,490 download         L0054 DEBUG| Retrieved URL "https://sjc.edge.kernel.org/fedora-buffet/fedora/linux/releases/29/Server/aarch64/os/images/pxeboot/vmlinuz": content-length 8623423, date: "Thu, 31 Jan 2019 19:41:35 GMT", last-modified: "Sun, 21 Oct 2018 00:43:09 GMT"
2019-01-31 14:41:41,765 qemu             L0317 DEBUG| VM launch command: 'aarch64-softmmu/qemu-system-aarch64 -chardev socket,id=mon,path=/var/tmp/tmpizirkcud/qemu-32609-monitor.sock -mon chardev=mon,mode=control -display none -vga none -machine virt -chardev socket,id=console,path=/var/tmp/tmpizirkcud/qemu-32609-console.sock,server,nowait -serial chardev:console -cpu cortex-a53 -kernel /var/lib/avocado/data/cache/by_name/vmlinuz -append console=ttyAMA0'
2019-01-31 14:41:41,779 qmp              L0167 DEBUG| >>> {'execute': 'qmp_capabilities'}
2019-01-31 14:41:41,931 qmp              L0175 DEBUG| <<< {'return': {}}
2019-01-31 14:41:42,830 boot_linux_conso L0041 DEBUG| [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
(...)
2019-01-31 14:41:42,833 boot_linux_conso L0041 DEBUG| [    0.000000] Policy zone: DMA32
2019-01-31 14:41:42,833 boot_linux_conso L0041 DEBUG| [    0.000000] Kernel command line: console=ttyAMA0
2019-01-31 14:41:42,833 qmp              L0167 DEBUG| >>> {'execute': 'quit'}
2019-01-31 14:44:35,636 qemu             L0357 WARNI| qemu received signal -9: aarch64-softmmu/qemu-system-aarch64 -chardev socket,id=mon,path=/var/tmp/tmpizirkcud/qemu-32609-monitor.sock -mon chardev=mon,mode=control -display none -vga none -machine virt -chardev socket,id=console,path=/var/tmp/tmpizirkcud/qemu-32609-console.sock,server,nowait -serial chardev:console -cpu cortex-a53 -kernel /var/lib/avocado/data/cache/by_name/vmlinuz -append console=ttyAMA0
(...)
2019-01-31 14:44:35,663 runner           L0253 ERROR| ERROR Test reported status but did not finish -> TestAbortError: 07-/root/src/qemu/tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_aarch64_virt.
2019-01-31 14:44:35,664 runner           L0062 ERROR| Runner error occurred: Timeout reached
----

- Wainer

> +        """
> +        :avocado: tags=arch:aarch64
> +        :avocado: tags=machine:virt
> +        """
> +        kernel_url = ('https://sjc.edge.kernel.org/fedora-buffet/fedora/linux/'
> +                      'releases/29/Server/aarch64/os/images/pxeboot/vmlinuz')
> +        kernel_hash = '8c73e469fc6ea06a58dc83a628fc695b693b8493'
> +        kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
> +
> +        self.vm.set_machine('virt')
> +        self.vm.set_console()
> +        kernel_command_line = 'console=ttyAMA0'
> +        self.vm.add_args('-cpu', 'cortex-a53',
> +                         '-kernel', kernel_path,
> +                         '-append', kernel_command_line)
> +        self.vm.launch()
> +        console_pattern = 'Kernel command line: %s' % kernel_command_line
> +        self.wait_for_console_pattern(console_pattern)
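The exit path of the test above depends on a wait_for_console_pattern()
helper defined earlier in boot_linux_console.py, which the quoted hunk does
not show. As a rough sketch only, with hypothetical names and none of the
real framework's error handling, such a helper connects to the console
chardev socket from the launch command in the log and reads lines until the
pattern appears:

import socket

def wait_for_console_pattern(socket_path, pattern):
    # Hypothetical stand-in for the helper in boot_linux_console.py;
    # socket_path would be the qemu-*-console.sock path from the log.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(socket_path)
    console = sock.makefile('rb')
    for raw in console:
        line = raw.decode(errors='replace').rstrip()
        print(line)  # corresponds to the 'boot_linux_conso ... DEBUG|' lines
        if pattern in line:
            return line

The detail that matters later in this thread: once the pattern matches, the
reader returns, and from then on nothing drains the console socket even
though the guest keeps writing to it.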
On 1/31/19 3:02 PM, Wainer dos Santos Moschetta wrote:
> On 01/17/2019 04:56 PM, Cleber Rosa wrote:
>> (...)
>> +    def test_aarch64_virt(self):
>
> That test case fails on my system (Fedora 29 x86_64). Avocado seems
> unable to kill the VM, so it reaches the timeout.
>
> (...)
>
> 2019-01-31 14:41:42,833 boot_linux_conso L0041 DEBUG| [    0.000000] Kernel command line: console=ttyAMA0
> 2019-01-31 14:41:42,833 qmp              L0167 DEBUG| >>> {'execute': 'quit'}

Here, a QMP response like "<<< {'return': {}}" would be expected.

Since I cannot reproduce this on my system (or on the Travis-CI jobs I've
sent), can you tell me on top of which commit you applied these patches?

Thanks!
- Cleber.

> (...)
On 1/31/19 3:21 PM, Cleber Rosa wrote:
> On 1/31/19 3:02 PM, Wainer dos Santos Moschetta wrote:
>> (...)
>>
>> That test case fails on my system (Fedora 29 x86_64). Avocado seems
>> unable to kill the VM, so it reaches the timeout.
>>
>> (...)
>
> Here, a QMP response like "<<< {'return': {}}" would be expected.
>
> Since I cannot reproduce this on my system (or on the Travis-CI jobs I've
> sent), can you tell me on top of which commit you applied these patches?

I spoke too soon:

https://travis-ci.org/clebergnu/qemu/jobs/487121425#L3033

This looks like a recent regression, and I'm guessing it's not on the
test's side. I'll try to bisect it and let you know.

Thanks,
- Cleber.
On 1/31/19 4:26 PM, Cleber Rosa wrote:
> On 1/31/19 3:21 PM, Cleber Rosa wrote:
>> (...)
>>
>> Since I cannot reproduce this on my system (or on the Travis-CI jobs I've
>> sent), can you tell me on top of which commit you applied these patches?
>
> I spoke too soon:
>
> https://travis-ci.org/clebergnu/qemu/jobs/487121425#L3033
>
> This looks like a recent regression, and I'm guessing it's not on the
> test's side. I'll try to bisect it and let you know.

On a fresh environment, I am able to reproduce this in roughly one out of
every two runs. When I hit it, I attached GDB to the QEMU process, and the
backtrace shows:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
103     2:      movl    %edx, %eax
(gdb) bt
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007fc6ba1a2e09 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5615a233d020 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2  0x00005615a1bb7593 in qemu_mutex_lock_impl (mutex=0x5615a233d020 <qemu_global_mutex>, file=0x5615a1db2d4c "util/main-loop.c", line=236) at util/qemu-thread-posix.c:66
#3  0x00005615a171125e in qemu_mutex_lock_iothread_impl (file=file@entry=0x5615a1db2d4c "util/main-loop.c", line=line@entry=236) at /home/cleber/src/qemu/cpus.c:1849
#4  0x00005615a1bb415d in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:236
#5  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:497
#6  0x00005615a18fdd39 in main_loop () at vl.c:1928
#7  0x00005615a16c9ee9 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4665

Running it with `taskset -c 1` prevents this issue from happening, which,
AFAICT, contributes even further towards this being a QEMU race condition.

I'm CC'ing Peter and Claudio (listed maintainers of aarch64), as this
seems to be limited to that target. Any tips on what to do here?

Thanks,
- Cleber.
On Fri, Feb 01, 2019 at 11:10:31AM -0500, Cleber Rosa wrote:
> On 1/31/19 4:26 PM, Cleber Rosa wrote:
>> (...)
>>
>> This looks like a recent regression, and I'm guessing it's not on the
>> test's side. I'll try to bisect it and let you know.
>
> On a fresh environment, I am able to reproduce this in roughly one out of
> every two runs. When I hit it, I attached GDB to the QEMU process, and the
> backtrace shows:
>
> (gdb) bt
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
> #1  0x00007fc6ba1a2e09 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5615a233d020 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> (...)
> #6  0x00005615a18fdd39 in main_loop () at vl.c:1928
> #7  0x00005615a16c9ee9 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4665

Tip: run "thread apply all bt" so you can get a backtrace of all
threads.

> Running it with `taskset -c 1` prevents this issue from happening, which,
> AFAICT, contributes even further towards this being a QEMU race condition.
>
> I'm CC'ing Peter and Claudio (listed maintainers of aarch64), as this
> seems to be limited to that target. Any tips on what to do here?

I am hitting this on Travis, too, and I finally could reproduce
it locally.

The guest is still writing on the serial console, but nobody is
reading the data on the other side. A VCPU thread is stuck
inside the EAGAIN/nanosleep loop at qemu_chr_write_buffer(),
holding the QEMU global lock:

Thread 4 (Thread 0x7f2e45fff700 (LWP 6461)):
#0  0x00007f2e4ec03500 in nanosleep () at /lib64/libpthread.so.0
#1  0x00007f2e4fb229d7 in g_usleep () at /lib64/libglib-2.0.so.0
#2  0x0000559a4e7ca4c9 in qemu_chr_write_buffer (s=s@entry=0x559a502d0ac0, buf=buf@entry=0x7f2e45ffdd90 "7", len=1, offset=offset@entry=0x7f2e45ffdd60, write_all=true) at chardev/char.c:115
#3  0x0000559a4e7ca78f in qemu_chr_write (s=0x559a502d0ac0, buf=buf@entry=0x7f2e45ffdd90 "7", len=len@entry=1, write_all=write_all@entry=true) at chardev/char.c:148
#4  0x0000559a4e7cc7e2 in qemu_chr_fe_write_all (be=be@entry=0x559a504b4c50, buf=buf@entry=0x7f2e45ffdd90 "7", len=len@entry=1) at chardev/char-fe.c:53
#5  0x0000559a4e58f320 in pl011_write (opaque=0x559a504b47d0, offset=0, value=55, size=<optimized out>) at hw/char/pl011.c:183
#6  0x0000559a4e325121 in memory_region_write_accessor (mr=0x559a504b4ae0, addr=0, value=<optimized out>, size=2, shift=<optimized out>, mask=<optimized out>, attrs=...) at /home/ehabkost/rh/proj/virt/qemu/memory.c:503
#7  0x0000559a4e322cd6 in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7f2e45ffded8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=access_fn@entry=0x559a4e3250a0 <memory_region_write_accessor>, mr=0x559a504b4ae0, attrs=...) at /home/ehabkost/rh/proj/virt/qemu/memory.c:569
#8  0x0000559a4e32763f in memory_region_dispatch_write (mr=mr@entry=0x559a504b4ae0, addr=addr@entry=0, data=<optimized out>, data@entry=55, size=size@entry=2, attrs=...) at /home/ehabkost/rh/proj/virt/qemu/memory.c:1497
#9  0x0000559a4e338708 in io_writex (env=env@entry=0x559a503d5620, mmu_idx=mmu_idx@entry=1, val=val@entry=55, addr=addr@entry=18446462598867529728, retaddr=139836732143069, size=2, iotlbentry=<optimized out>, iotlbentry=<optimized out>) at /home/ehabkost/rh/proj/virt/qemu/accel/tcg/cputlb.c:945
#10 0x0000559a4e33d203 in store_helper (big_endian=false, size=2, retaddr=<optimized out>, oi=<optimized out>, val=55, addr=18446462598867529728, env=0x559a503d5620) at /home/ehabkost/rh/proj/virt/qemu/accel/tcg/cputlb.c:1544
#11 0x0000559a4e33d203 in helper_le_stw_mmu (env=0x559a503d5620, addr=18446462598867529728, val=55, oi=<optimized out>, retaddr=139836732143069) at /home/ehabkost/rh/proj/virt/qemu/accel/tcg/cputlb.c:1636
#12 0x00007f2e46bef5dd in code_gen_buffer ()
#13 0x0000559a4e352381 in cpu_tb_exec (itb=<optimized out>, cpu=0x559a503d5620) at /home/ehabkost/rh/proj/virt/qemu/accel/tcg/cpu-exec.c:171
#14 0x0000559a4e352381 in cpu_loop_exec_tb (tb_exit=<synthetic pointer>, last_tb=<synthetic pointer>, tb=<optimized out>, cpu=0x559a503d5620) at /home/ehabkost/rh/proj/virt/qemu/accel/tcg/cpu-exec.c:618
#15 0x0000559a4e352381 in cpu_exec (cpu=cpu@entry=0x559a503cd360) at /home/ehabkost/rh/proj/virt/qemu/accel/tcg/cpu-exec.c:729
#16 0x0000559a4e30ea0f in tcg_cpu_exec (cpu=0x559a503cd360) at /home/ehabkost/rh/proj/virt/qemu/cpus.c:1434
#17 0x0000559a4e310b6b in qemu_tcg_cpu_thread_fn (arg=arg@entry=0x559a503cd360) at /home/ehabkost/rh/proj/virt/qemu/cpus.c:1743
#18 0x0000559a4e83669a in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:502
#19 0x00007f2e4ebf958e in start_thread () at /lib64/libpthread.so.0
#20 0x00007f2e4eb266f3 in clone () at /lib64/libc.so.6

For reference, this is the QEMU command line:

aarch64-softmmu/qemu-system-aarch64 -chardev socket,id=mon,path=/var/tmp/tmpxnkcjvf0/qemu-6453-monitor.sock -mon chardev=mon,mode=control -display none -vga none -machine virt -chardev socket,id=console,path=/var/tmp/tmpxnkcjvf0/qemu-6453-console.sock,server,nowait -serial chardev:console -cpu cortex-a53 -kernel /home/ehabkost/rh/proj/virt/qemu/tests/venv/data/cache/by_location/e959d0e1dd72e77653e218e666198db1f3d0c213/vmlinuz -append printk.time=0 console=ttyAMA0
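The blocked-writer mechanism Eduardo describes can be shown in a few lines
of plain Python, independent of QEMU. This is only an illustrative sketch
using the standard library: a writer keeps filling a socket whose peer
never reads, and eventually makes no progress, which is the position the
VCPU thread above is in, except that QEMU keeps retrying on EAGAIN while
holding the global lock.

import socket

reader, writer = socket.socketpair()
writer.setblocking(False)          # raise instead of hanging, for the demo

sent = 0
try:
    while True:
        sent += writer.send(b'x' * 4096)   # stand-in for guest console output
except BlockingIOError:
    # The socket buffer is full. A blocking writer, or QEMU's
    # EAGAIN/g_usleep retry loop in qemu_chr_write_buffer(), would now
    # make no progress until the peer reads or closes.
    print('writer stuck after %d bytes' % sent)

reader.close()                     # the peer goes away
try:
    writer.send(b'x')
except BrokenPipeError:
    print('peer closed: the writer fails fast instead of spinning')

Closing the reader end is the same lever the experimental fix below pulls
on the console socket: the blocked writer gets an error and can make
progress instead of retrying forever.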
On Fri, Jun 07, 2019 at 12:26:48AM -0300, Eduardo Habkost wrote:
> On Fri, Feb 01, 2019 at 11:10:31AM -0500, Cleber Rosa wrote:
>> (...)
>>
>> Running it with `taskset -c 1` prevents this issue from happening, which,
>> AFAICT, contributes even further towards this being a QEMU race condition.
>>
>> I'm CC'ing Peter and Claudio (listed maintainers of aarch64), as this
>> seems to be limited to that target. Any tips on what to do here?
>
> I am hitting this on Travis, too, and I finally could reproduce
> it locally.
>
> The guest is still writing on the serial console, but nobody is
> reading the data on the other side. A VCPU thread is stuck
> inside the EAGAIN/nanosleep loop at qemu_chr_write_buffer(),
> holding the QEMU global lock.

Experimental fix below.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
 python/qemu/__init__.py | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/python/qemu/__init__.py b/python/qemu/__init__.py
index 81d9657ec0..4a691f34da 100644
--- a/python/qemu/__init__.py
+++ b/python/qemu/__init__.py
@@ -274,10 +274,6 @@ class QEMUMachine(object):
 
         self._qemu_log_path = None
 
-        if self._console_socket is not None:
-            self._console_socket.close()
-            self._console_socket = None
-
         if self._temp_dir is not None:
             shutil.rmtree(self._temp_dir)
             self._temp_dir = None
@@ -336,6 +332,14 @@ class QEMUMachine(object):
         """
         Terminate the VM and clean up
         """
+
+        # If we keep the console socket open, we may deadlock waiting
+        # for QEMU to exit, while QEMU is waiting for the socket to
+        # become writeable.
+        if self._console_socket is not None:
+            self._console_socket.close()
+            self._console_socket = None
+
         if self.is_running():
             try:
                 self._qmp.cmd('quit')
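At the level of a test, the fix changes the ordering inside shutdown(): the
console socket is closed before 'quit' is sent over QMP, so a VCPU blocked
on console output fails its write, releases the global lock, and the main
loop can service the QMP command. A usage sketch, assuming only the
QEMUMachine methods visible in the test and in the diff above (the kernel
path is a placeholder, and the tree's python/ directory is on sys.path):

from qemu import QEMUMachine

vm = QEMUMachine('aarch64-softmmu/qemu-system-aarch64')
vm.set_machine('virt')
vm.set_console()
vm.add_args('-cpu', 'cortex-a53',
            '-kernel', '/path/to/vmlinuz',    # placeholder
            '-append', 'console=ttyAMA0')
vm.launch()
# ... read some console output, then stop reading, as the test does ...
# With the fix, shutdown() first closes the console socket and only then
# sends 'quit', so QEMU cannot deadlock writing to an unread console.
vm.shutdown()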
On 06/07/19 05:26, Eduardo Habkost wrote:
> Thread 4 (Thread 0x7f2e45fff700 (LWP 6461)):
> #0  0x00007f2e4ec03500 in nanosleep () at /lib64/libpthread.so.0
> #1  0x00007f2e4fb229d7 in g_usleep () at /lib64/libglib-2.0.so.0
> #2  0x0000559a4e7ca4c9 in qemu_chr_write_buffer (s=s@entry=0x559a502d0ac0, buf=buf@entry=0x7f2e45ffdd90 "7", len=1, offset=offset@entry=0x7f2e45ffdd60, write_all=true) at chardev/char.c:115
> #3  0x0000559a4e7ca78f in qemu_chr_write (s=0x559a502d0ac0, buf=buf@entry=0x7f2e45ffdd90 "7", len=len@entry=1, write_all=write_all@entry=true) at chardev/char.c:148
> #4  0x0000559a4e7cc7e2 in qemu_chr_fe_write_all (be=be@entry=0x559a504b4c50, buf=buf@entry=0x7f2e45ffdd90 "7", len=len@entry=1) at chardev/char-fe.c:53
> #5  0x0000559a4e58f320 in pl011_write (opaque=0x559a504b47d0, offset=0, value=55, size=<optimized out>) at hw/char/pl011.c:183
> (...)

See also <https://bugzilla.redhat.com/show_bug.cgi?id=1661940>.

$ git show 6ab3fc32ea64 -- hw/char/pl011.c

Thanks
Laszlo
On Fri, Jun 07, 2019 at 12:26:48AM -0300, Eduardo Habkost wrote:
> On Fri, Feb 01, 2019 at 11:10:31AM -0500, Cleber Rosa wrote:
>> (...)
>
> Tip: run "thread apply all bt" so you can get a backtrace of all
> threads.

Nice, thanks!

>> Running it with `taskset -c 1` prevents this issue from happening, which,
>> AFAICT, contributes even further towards this being a QEMU race condition.
>>
>> I'm CC'ing Peter and Claudio (listed maintainers of aarch64), as this
>> seems to be limited to that target. Any tips on what to do here?
>
> I am hitting this on Travis, too, and I finally could reproduce
> it locally.
>
> The guest is still writing on the serial console, but nobody is
> reading the data on the other side. A VCPU thread is stuck
> inside the EAGAIN/nanosleep loop at qemu_chr_write_buffer(),
> holding the QEMU global lock.

This is awesome, because it really looks like a perfect explanation for
the behavior/reproducer/workaround that I described here:

https://bugs.launchpad.net/qemu/+bug/1829779

> (...)
>
> For reference, this is the QEMU command line:
>
> aarch64-softmmu/qemu-system-aarch64 -chardev socket,id=mon,path=/var/tmp/tmpxnkcjvf0/qemu-6453-monitor.sock -mon chardev=mon,mode=control -display none -vga none -machine virt -chardev socket,id=console,path=/var/tmp/tmpxnkcjvf0/qemu-6453-console.sock,server,nowait -serial chardev:console -cpu cortex-a53 -kernel /home/ehabkost/rh/proj/virt/qemu/tests/venv/data/cache/by_location/e959d0e1dd72e77653e218e666198db1f3d0c213/vmlinuz -append printk.time=0 console=ttyAMA0

Besides the command line, I hope it's also easy to use the following
reproducer:

https://github.com/clebergnu/qemu/commit/c778e28c24030c4a36548b714293b319f4bf18df

It should apply cleanly to QEMU master, and instructions on how to run
it are documented in the commit message.

Thanks,
- Cleber.
On Fri, Jun 07, 2019 at 12:42:14AM -0300, Eduardo Habkost wrote:
> On Fri, Jun 07, 2019 at 12:26:48AM -0300, Eduardo Habkost wrote:
> > On Fri, Feb 01, 2019 at 11:10:31AM -0500, Cleber Rosa wrote:
> > > [earlier review discussion, job.log snippet and backtrace snipped]
> > >
> > > Running it with `taskset -c 1` prevents this issue from happening, which
> > > AFAICT, contributes even further towards this being a QEMU race condition.
> > >
> > > I'm CC'ing Peter and Claudio (listed maintainers of aarch64), as this
> > > seems to be limited to that target.  Any tips on what to do here?
> >
> > I am hitting this on Travis, too, and I finally could reproduce
> > it locally.
> >
> > The guest is still writing on the serial console, but nobody is
> > reading the data on the other side.  A VCPU thread is stuck
> > inside the EAGAIN/nanosleep loop at qemu_chr_write_buffer(),
> > holding the QEMU global lock.
>
> Experimental fix below.
>
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> ---
>  python/qemu/__init__.py | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/python/qemu/__init__.py b/python/qemu/__init__.py
> index 81d9657ec0..4a691f34da 100644
> --- a/python/qemu/__init__.py
> +++ b/python/qemu/__init__.py
> @@ -274,10 +274,6 @@ class QEMUMachine(object):
>
>          self._qemu_log_path = None
>
> -        if self._console_socket is not None:
> -            self._console_socket.close()
> -            self._console_socket = None
> -
>          if self._temp_dir is not None:
>              shutil.rmtree(self._temp_dir)
>              self._temp_dir = None
> @@ -336,6 +332,14 @@ class QEMUMachine(object):
>          """
>          Terminate the VM and clean up
>          """
> +
> +        # If we keep the console socket open, we may deadlock waiting
> +        # for QEMU to exit, while QEMU is waiting for the socket to
> +        # become writeable.
> +        if self._console_socket is not None:
> +            self._console_socket.close()
> +            self._console_socket = None
> +

Right, this is somewhat equivalent to the following "workaround":

https://github.com/clebergnu/qemu/commit/e1713f3b91972ad57c089f276c54db3f3fa63423

I could not tell at the moment, though, if closing (or shutting down)
the console socket was the appropriate fix.

What I see is that Rick's commit pointed to by Laszlo is dated from
2016, and it says that a virtio-console fix is necessary.  I'd imagine
that, if a fix was ever proposed, it'd have been incorporated in the
F29 kernel used in this test... and that this workaround/fix would
not be the best solution.

But, I'm mostly attempting to navigate this using my senses, little
hard data, and even less background :).

Regards,
- Cleber.

>          if self.is_running():
>              try:
>                  self._qmp.cmd('quit')
> --
> 2.18.0.rc1.1.g3f1ff2140
>
> --
> Eduardo
CCing Daniel, who wrote commit 6ab3fc32ea64.

On Fri, Jun 07, 2019 at 11:44:32AM -0400, Cleber Rosa wrote:
> On Fri, Jun 07, 2019 at 12:42:14AM -0300, Eduardo Habkost wrote:
> > [earlier discussion, backtraces and experimental fix snipped]
>
> Right, this is somewhat equivalent to the following "workaround":
>
> https://github.com/clebergnu/qemu/commit/e1713f3b91972ad57c089f276c54db3f3fa63423
>
> I could not tell at the moment, though, if closing (or shutting down)
> the console socket was the appropriate fix.
>
> What I see is that Rick's commit pointed to by Laszlo is dated from
> 2016, and it says that a virtio-console fix is necessary.  I'd imagine
> that, if a fix was ever proposed, it'd have been incorporated in the
> F29 kernel used in this test... and that this workaround/fix would
> not be the best solution.

If I understood this correctly, fixing the guest driver wouldn't
help here.  The commit mentioned by Laszlo (6ab3fc32ea64 "hw:
replace most use of qemu_chr_fe_write with qemu_chr_fe_write_all")
fixes data loss bugs but introduces the potential for deadlocks
like the one we are seeing.

Unless we replace the existing ~30 qemu_chr_fe_write_all() calls
in device code, we need to make sure we are constantly reading
data from the console socket.
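One way to meet that "constantly reading" requirement on the test side
is a background thread that drains the console socket for the whole
lifetime of the VM.  A sketch of the idea in Python follows; the
ConsoleDrainer name is made up here for illustration, it is not an
existing helper in python/qemu.

    import threading

    class ConsoleDrainer:
        """Continuously read a console socket so that QEMU chardev
        writes, made blocking by qemu_chr_fe_write_all(), can always
        make progress."""

        def __init__(self, sock, logfile=None):
            self._sock = sock
            self._logfile = logfile
            self._thread = threading.Thread(target=self._drain,
                                            daemon=True)

        def start(self):
            self._thread.start()

        def _drain(self):
            while True:
                try:
                    data = self._sock.recv(4096)
                except OSError:
                    break           # socket closed during shutdown
                if not data:
                    break           # QEMU closed its end
                if self._logfile is not None:
                    self._logfile.write(data)

    # Usage sketch: wrap the console socket right after connecting,
    # before the guest starts producing output in volume:
    #   drainer = ConsoleDrainer(console_sock, open('console.log', 'wb'))
    #   drainer.start()

In a real implementation the drained bytes would also have to be handed
to whatever implements wait_for_console_pattern(), e.g. through a log
file or a queue, rather than being discarded.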
On Fri, Jun 07, 2019 at 03:58:57PM -0300, Eduardo Habkost wrote:
> CCing Daniel, who wrote commit 6ab3fc32ea64.
>
> [earlier discussion snipped]
>
> If I understood this correctly, fixing the guest driver wouldn't
> help here.  The commit mentioned by Laszlo (6ab3fc32ea64 "hw:
> replace most use of qemu_chr_fe_write with qemu_chr_fe_write_all")
> fixes data loss bugs but introduces the potential for deadlocks
> like the one we are seeing.
>
> Unless we replace the existing ~30 qemu_chr_fe_write_all() calls
> in device code, we need to make sure we are constantly reading
> data from the console socket.

Yes, you must *always* have something reading from the chardev backend.
This really sucks, but it is preferable to letting data end up in
/dev/null.

If this is being a problem for tests then consider it motivation to
fix the root cause problem and make the devices properly do async
I/O for chardevs.  Only a handful of them properly do this right
now.

Regards,
Daniel
On Mon, Jun 10, 2019 at 09:53:03AM +0100, Daniel P. Berrangé wrote:
> On Fri, Jun 07, 2019 at 03:58:57PM -0300, Eduardo Habkost wrote:
> > [earlier discussion snipped]
> >
> > Unless we replace the existing ~30 qemu_chr_fe_write_all() calls
> > in device code, we need to make sure we are constantly reading
> > data from the console socket.
>
> Yes, you must *always* have something reading from the chardev backend.
> This really sucks, but it is preferable to letting data end up in
> /dev/null.
>
> If this is being a problem for tests then consider it motivation to
> fix the root cause problem and make the devices properly do async
> I/O for chardevs.  Only a handful of them properly do this right
> now.
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Daniel,

Thanks for the explanation.  I don't feel completely capable of
tackling the root cause, but I'll give it a go anyway.

Eduardo,

Looks like we'll need this type of workaround now anyway.  As seen in
the reproducer I wrote, a test can cause this deadlock before it gets
to the test tearDown(), and either implicitly or explicitly gets to
QEMUMachine::shutdown().  I would also like to avoid exposing the user
(test writer) to that kind of situation, though, which seems like what
you tried to do.

But, maybe the most honest approach right now is to make an explicit
call to read from and close the socket, paired with documentation/comment
about why that is being done?  This would be similar to the "XXX"
comments on 6ab3fc32ea64.

Regards!
- Cleber.
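A sketch of what that explicit call could look like, in Python; the
function name and the quiet-period heuristic are assumptions made for
illustration, not anything settled in this thread.

    import select

    def drain_and_close_console(sock, quiet_seconds=0.5):
        """Read any console output QEMU still has pending, then close.

        Draining first lets a device blocked in qemu_chr_fe_write_all()
        (while holding the global lock) finish its write and release the
        lock, so that a subsequent QMP 'quit' can actually be processed.
        See the "XXX" comments on 6ab3fc32ea64 for the underlying issue.
        """
        while True:
            readable, _, _ = select.select([sock], [], [], quiet_seconds)
            if not readable:
                break           # no console output for quiet_seconds
            if not sock.recv(4096):
                break           # QEMU closed its end of the socket
        sock.close()

Called from something like QEMUMachine.shutdown(), this would run just
before the 'quit' command is sent, in the same spot where Eduardo's
experimental fix closes the socket.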
diff --git a/.travis.yml b/.travis.yml
index 54100eea5a..595e8c0b6c 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -187,7 +187,7 @@ matrix:
 
     # Acceptance (Functional) tests
     - env:
-        - CONFIG="--python=/usr/bin/python3 --target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu"
+        - CONFIG="--python=/usr/bin/python3 --target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu,aarch64-softmmu"
         - TEST_CMD="make check-acceptance"
       addons:
         apt:
diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index f3ccd23a7a..107700b517 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -138,3 +138,23 @@ class BootLinuxConsole(Test):
         self.vm.launch()
         console_pattern = 'Kernel command line: %s' % kernel_command_line
         self.wait_for_console_pattern(console_pattern)
+
+    def test_aarch64_virt(self):
+        """
+        :avocado: tags=arch:aarch64
+        :avocado: tags=machine:virt
+        """
+        kernel_url = ('https://sjc.edge.kernel.org/fedora-buffet/fedora/linux/'
+                      'releases/29/Server/aarch64/os/images/pxeboot/vmlinuz')
+        kernel_hash = '8c73e469fc6ea06a58dc83a628fc695b693b8493'
+        kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+        self.vm.set_machine('virt')
+        self.vm.set_console()
+        kernel_command_line = 'console=ttyAMA0'
+        self.vm.add_args('-cpu', 'cortex-a53',
+                         '-kernel', kernel_path,
+                         '-append', kernel_command_line)
+        self.vm.launch()
+        console_pattern = 'Kernel command line: %s' % kernel_command_line
+        self.wait_for_console_pattern(console_pattern)
Just like the previous tests, boots a Linux kernel on an aarch64 target
using the virt machine.

One special option added is the CPU type, given that the kernel
selected fails to boot on the virt machine's default CPU (cortex-a15).

Signed-off-by: Cleber Rosa <crosa@redhat.com>
---
 .travis.yml                            |  2 +-
 tests/acceptance/boot_linux_console.py | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)
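For reference, with this patch applied the new test runs as part of
`make check-acceptance`; assuming an aarch64-enabled build and a working
Avocado setup, it should also be possible to run it individually with
something like `avocado run
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_aarch64_virt`.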