qemu-2.8-rc4 is broken


Message ID

000c01d2754d$59a4cd70$0cee6850$@ru

State

New

Headers

From: "Pavel Dovgalyuk" <dovgaluk@ispras.ru>
To: =?utf-8?Q?'Alex_Benn=C3=A9e'?= <alex.bennee@linaro.org>
References: <000301d259dc$f9d097c0$ed71c740$@ru>
	<000601d25a95$12b1b9f0$38152dd0$@ru>
	<20161220102126.GE5602@stefanha-x1.localdomain>
	<002501d25ab1$af024b00$0d06e100$@ru>
	<CAJSP0QXm9ssLC5C+gV_agkEW_fdUY=NWBMvHqMh55UhYTR276g@mail.gmail.com>
	<000301d25b4f$20018440$60048cc0$@ru>
	<CAJSP0QW1pOcthcEt16Pp5xQu3miUh2Leq3kn3yvQqWqhi1P=QQ@mail.gmail.com>
	<000801d26bd9$dca56db0$95f04910$@ru> <87o9zd3jta.fsf@linaro.org>
	<000e01d26caa$dfdb3150$9f9193f0$@ru> <87mveleiw8.fsf@linaro.org>
In-Reply-To: <87mveleiw8.fsf@linaro.org>
Date: Mon, 23 Jan 2017 10:50:22 +0300
Message-ID: <000c01d2754d$59a4cd70$0cee6850$@ru>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Thread-Index: AdJzQ0cZnNFM4WDkS9Wuhg3/SJdPPACCY+0g
Content-Language: ru
Subject: Re: [Qemu-devel] qemu-2.8-rc4 is broken
Precedence: list
Cc: 'Peter Maydell' <peter.maydell@linaro.org>,
	'Stefan Hajnoczi' <stefanha@gmail.com>,
	'qemu-devel' <qemu-devel@nongnu.org>,
	'Pavel Dovgalyuk' <pavel.dovgaluk@ispras.ru>,
	'Paolo Bonzini' <pbonzini@redhat.com>
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>

Commit Message

Pavel Dovgalyuk Jan. 23, 2017, 7:50 a.m. UTC

> From: Alex Bennée [mailto:alex.bennee@linaro.org]
> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
> 
> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]
> >
> > Sorry, this is another problem which occurs only in icount replay mode:
> > 1. cpu_handle_exception tries to force exception when it cannot occur due to
> >    running out all the planned instructions:
> >     } else if (replay_has_exception()
> >                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> >         /* try to cause an exception pending in the log */
> >         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
> >         *ret = -1;
> >         return true;
> >
> > 2. tb_find calls tb_gen_code, which cannot allocate new translation block
> >    and calls tb_flush (which only queues the flushing) and cpu_loop_exit
> > 3. cpu_loop_exit returns to infinite loop of cpu_exec and the condition
> >             if (cpu_handle_exception(cpu, &ret)) {
> >                 break;
> >             }
> >    is checked again causing an infinite loop.
> >
> > The TB cache is not flushed because we never execute that break, and the real work
> > of tb_flush is done outside this loop.
> 
> I think what we need is a:
> 
> 
>   if (cpu->exit_request)
>     break;

Where is this exit_request supposed to be set?

> 
> before the cpu_handle_exception() call to ensure any queued work gets
> processed first. Can you give me your current command line so I can
> reproduce this and check the fix works?

I solved the problem using the following patch:


Pavel Dovgalyuk

Comments

Alex Bennée Jan. 23, 2017, 9:38 a.m. UTC | #1

Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: Alex Bennée [mailto:alex.bennee@linaro.org]
>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>>
>> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]
>> >
>> > Sorry, this is another problem which occurs only in icount replay mode:
>> > 1. cpu_handle_exception tries to force exception when it cannot occur due to
>> >    running out all the planned instructions:
>> >     } else if (replay_has_exception()
>> >                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>> >         /* try to cause an exception pending in the log */
>> >         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>> >         *ret = -1;
>> >         return true;
>> >
>> > 2. tb_find calls tb_gen_code, which cannot allocate new translation block
>> >    and calls tb_flush (which only queues the flushing) and cpu_loop_exit
>> > 3. cpu_loop_exit returns to infinite loop of cpu_exec and the condition
>> >             if (cpu_handle_exception(cpu, &ret)) {
>> >                 break;
>> >             }
>> >    is checked again causing an infinite loop.
>> >
>> > The TB cache is not flushed because we never execute that break, and the real work
>> > of tb_flush is done outside this loop.
>>
>> I think what we need is a:
>>
>>
>>   if (cpu->exit_request)
>>     break;
>
> Where is this exit_request supposed to be set?

Ahh my mistake. Currently it is a global exit_request (becoming a
per-cpu exit_request when MTTCG is merged). It's set by qemu_cpu_kick()
when work is queued up, in this case the tb_flush async work.
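
As a rough illustration of that suggestion, here is a toy model, not QEMU
code. Every name in it (toy_queue_work_and_kick, toy_handle_exception,
toy_cpu_exec) is hypothetical, and the real loop leaves via cpu_loop_exit
rather than a plain return; the sketch only assumes that queueing work sets
a flag which the loop can test before trying the exception path again.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins only; none of these names exist in QEMU. */
static bool exit_request;   /* models the global exit_request flag */
static int queued_work;     /* models pending async work (a queued TB flush) */

/* Models queueing async work plus the kick that raises exit_request. */
static void toy_queue_work_and_kick(void)
{
    queued_work++;
    exit_request = true;
}

/* Models the failing case: no progress is possible, a flush is queued,
 * and the caller is told that nothing was handled. */
static bool toy_handle_exception(void)
{
    toy_queue_work_and_kick();
    return false;
}

/* Runs the toy loop and returns how many iterations it took to leave it.
 * Gives up after max_iters so the stuck variant still terminates. */
static int toy_cpu_exec(bool check_exit_request, int max_iters)
{
    int iters = 0;

    assert(max_iters > 0);
    exit_request = false;
    queued_work = 0;

    while (iters < max_iters) {
        iters++;
        /* The proposed check: leave the loop so queued work can run. */
        if (check_exit_request && exit_request) {
            exit_request = false;
            break;
        }
        if (toy_handle_exception()) {
            break;
        }
    }
    return iters;
}
```

In this model toy_cpu_exec(true, 1000) leaves the loop on the second pass,
while toy_cpu_exec(false, 1000) spins until the iteration cap.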


>> before the cpu_handle_exception() call to ensure any queued work gets
>> processed first. Can you give me your current command line so I can
>> reproduce this and check the fix works?
>
> I solved the problem using the following patch:
>
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>  #ifndef CONFIG_USER_ONLY
>      } else if (replay_has_exception()
>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> +        /* Break the execution loop in case of running out of TB cache.
> +           This is needed to make flushing of the TB cache, because
> +           real flush is queued to be executed outside the cpu loop. */
> +        cpu->exception_index = EXCP_INTERRUPT;
>          /* try to cause an exception pending in the log */
>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>          *ret = -1;

I wonder if it is worth renaming EXCP_INTERRUPT? I always get it confused
with a guest interrupt. But the effect is the same as we set it on an
exit_request.

--
Alex Bennée

--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -451,6 +451,10 @@  static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 #ifndef CONFIG_USER_ONLY
     } else if (replay_has_exception()
                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+        /* Break the execution loop in case of running out of TB cache.
+           This is needed to make flushing of the TB cache, because
+           real flush is queued to be executed outside the cpu loop. */
+        cpu->exception_index = EXCP_INTERRUPT;
         /* try to cause an exception pending in the log */
         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
         *ret = -1;
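
The effect of the patch can be sketched with a small setjmp/longjmp model.
This is a simplified illustration under stated assumptions, not the QEMU
implementation: struct toy_cpu, the toy_* functions and the value of
TOY_EXCP_INTERRUPT are all hypothetical, and the model only assumes that a
pending exception_index makes the exception handler break the loop on the
next pass.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

#define TOY_EXCP_INTERRUPT 0x10000   /* hypothetical value for this sketch */

/* Hypothetical, heavily reduced stand-in for CPUState. */
struct toy_cpu {
    jmp_buf jmp_env;
    int exception_index;   /* -1 means no exception is pending */
    bool tb_cache_full;    /* models "cannot allocate a new TB" */
};

/* Models cpu_loop_exit: jump back to the top of the execution loop. */
static void toy_cpu_loop_exit(struct toy_cpu *cpu)
{
    longjmp(cpu->jmp_env, 1);
}

/* Models tb_find when the cache is full: the flush is only queued,
 * so the cache stays full and control leaves via toy_cpu_loop_exit. */
static void toy_tb_find(struct toy_cpu *cpu)
{
    if (cpu->tb_cache_full) {
        toy_cpu_loop_exit(cpu);
    }
}

/* Models the replay branch of the exception handler. */
static bool toy_handle_exception(struct toy_cpu *cpu, bool patched)
{
    if (cpu->exception_index >= 0) {
        /* A pending exception breaks the loop, so queued work can run. */
        cpu->exception_index = -1;
        return true;
    }
    if (patched) {
        /* The added line: mark an exit before tb_find can jump away. */
        cpu->exception_index = TOY_EXCP_INTERRUPT;
    }
    toy_tb_find(cpu);
    return true;
}

/* Returns the number of passes before the loop is left (capped). */
static int toy_cpu_exec(struct toy_cpu *cpu, bool patched, int max_iters)
{
    volatile int iters = 0;   /* volatile: modified between setjmp and longjmp */

    assert(max_iters > 0);
    cpu->exception_index = -1;

    if (setjmp(cpu->jmp_env) != 0) {
        /* Re-entered from toy_cpu_loop_exit; fall through into the loop. */
    }
    while (iters < max_iters) {
        iters++;
        if (toy_handle_exception(cpu, patched)) {
            break;
        }
    }
    return iters;
}
```

With tb_cache_full set, the unpatched variant keeps jumping back to the top
of the loop until the cap is reached, while the patched variant breaks out
on the second pass because exception_index is already set.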
