
qemu-2.8-rc4 is broken

Message ID 000c01d2754d$59a4cd70$0cee6850$@ru
State New

Commit Message

Pavel Dovgalyuk Jan. 23, 2017, 7:50 a.m. UTC
> From: Alex Bennée [mailto:alex.bennee@linaro.org]
> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
> 
> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]
> >
> > Sorry, this is another problem which occurs only in icount replay mode:
> > 1. cpu_handle_exception tries to force an exception when it cannot occur due to
> >    having run out of planned instructions:
> >     } else if (replay_has_exception()
> >                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> >         /* try to cause an exception pending in the log */
> >         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
> >         *ret = -1;
> >         return true;
> >
> > 2. tb_find calls tb_gen_code, which cannot allocate new translation block
> >    and calls tb_flush (which only queues the flushing) and cpu_loop_exit
> > 3. cpu_loop_exit returns to infinite loop of cpu_exec and the condition
> >             if (cpu_handle_exception(cpu, &ret)) {
> >                 break;
> >             }
> >    is checked again causing an infinite loop.
> >
> > The TB cache is not flushed because we never execute that break, and the real work
> > of tb_flush is done outside this loop.
> 
> I think what we need is a:
> 
> 
>   if (cpu->exit_request)
>     break;

Where is this exit_request supposed to be set?

> 
> before the cpu_handle_exception() call to ensure any queued work gets
> processed first. Can you give me your current command line so I can
> reproduce this and check the fix works?

I solved the problem using the following patch:


Pavel Dovgalyuk

Comments

Alex Bennée Jan. 23, 2017, 9:38 a.m. UTC | #1
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: Alex Bennée [mailto:alex.bennee@linaro.org]
>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>>
>> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]
>> >
>> > Sorry, this is another problem which occurs only in icount replay mode:
>> > 1. cpu_handle_exception tries to force an exception when it cannot occur due to
>> >    having run out of planned instructions:
>> >     } else if (replay_has_exception()
>> >                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>> >         /* try to cause an exception pending in the log */
>> >         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>> >         *ret = -1;
>> >         return true;
>> >
>> > 2. tb_find calls tb_gen_code, which cannot allocate new translation block
>> >    and calls tb_flush (which only queues the flushing) and cpu_loop_exit
>> > 3. cpu_loop_exit returns to infinite loop of cpu_exec and the condition
>> >             if (cpu_handle_exception(cpu, &ret)) {
>> >                 break;
>> >             }
>> >    is checked again causing an infinite loop.
>> >
>> > The TB cache is not flushed because we never execute that break, and the real work
>> > of tb_flush is done outside this loop.
>>
>> I think what we need is a:
>>
>>
>>   if (cpu->exit_request)
>>     break;
>
> Where is this exit_request supposed to be set?

Ahh my mistake. Currently it is a global exit_request (becoming a
per-cpu exit_request when MTTCG is merged). It's set by qemu_cpu_kick()
when work is queued up, in this case the tb_flush async work.


>> before the cpu_handle_exception() call to ensure any queued work gets
>> processed first. Can you give me your current command line so I can
>> reproduce this and check the fix works?
>
> I solved the problem using the following patch:
>
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>  #ifndef CONFIG_USER_ONLY
>      } else if (replay_has_exception()
>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> +        /* Break the execution loop in case of running out of TB cache.
> +           This is needed to make flushing of the TB cache, because
> +           real flush is queued to be executed outside the cpu loop. */
> +        cpu->exception_index = EXCP_INTERRUPT;
>          /* try to cause an exception pending in the log */
>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>          *ret = -1;

I wonder if it is worth renaming EXCP_INTERRUPT? I always get it confused
with a guest interrupt. But the effect is the same as we set it on an
exit_request.

--
Alex Bennée

Patch

--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -451,6 +451,10 @@  static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 #ifndef CONFIG_USER_ONLY
     } else if (replay_has_exception()
                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+        /* Break the execution loop in case of running out of TB cache.
+           This is needed to make flushing of the TB cache, because
+           real flush is queued to be executed outside the cpu loop. */
+        cpu->exception_index = EXCP_INTERRUPT;
         /* try to cause an exception pending in the log */
         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
         *ret = -1;
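
The livelock described in this thread, and the way the patch above breaks it, can be sketched with a small standalone model. This is not QEMU code: every `toy_` name, the struct layout, and the iteration limit are hypothetical stand-ins, chosen only to mirror the control flow discussed (a full TB cache, a flush that is merely queued, and an exception check that is re-entered after each bail-out).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in QEMU. */
enum { TOY_EXCP_NONE = -1, TOY_EXCP_INTERRUPT = 1 };

struct toy_cpu {
    bool tb_cache_full;   /* no room left for a new translation block */
    bool flush_queued;    /* flush requested, but not performed yet */
    int  exception_index; /* TOY_EXCP_NONE means nothing is pending */
};

/* Stands in for tb_find()/tb_gen_code(): on a full cache it only queues
 * the flush and reports failure (the model's analogue of cpu_loop_exit). */
static bool toy_translate(struct toy_cpu *cpu)
{
    if (cpu->tb_cache_full) {
        cpu->flush_queued = true;
        return false;
    }
    return true;
}

/* Stands in for the replay branch of cpu_handle_exception().
 * Returns true when the execution loop should be left. */
static bool toy_handle_exception(struct toy_cpu *cpu, bool with_fix)
{
    if (cpu->exception_index >= 0) {
        /* A pending exit request: consume it and leave the loop. */
        cpu->exception_index = TOY_EXCP_NONE;
        return true;
    }
    if (with_fix) {
        /* The patched behaviour: mark an exit before translating. */
        cpu->exception_index = TOY_EXCP_INTERRUPT;
    }
    if (!toy_translate(cpu)) {
        return false; /* bailed out; the caller re-enters the loop */
    }
    return true;
}

/* Stands in for the loop in cpu_exec(). Returns the iteration at which
 * the loop was left, or -1 if it was still spinning after max_iters. */
static int toy_cpu_exec(struct toy_cpu *cpu, bool with_fix, int max_iters)
{
    assert(max_iters > 0);
    for (int i = 1; i <= max_iters; i++) {
        if (toy_handle_exception(cpu, with_fix)) {
            return i;
        }
    }
    return -1;
}

/* Stands in for the queued work that runs only outside the loop. */
static void toy_run_queued_work(struct toy_cpu *cpu)
{
    if (cpu->flush_queued) {
        cpu->tb_cache_full = false;
        cpu->flush_queued = false;
    }
}
```

In this model, calling `toy_cpu_exec()` with `with_fix` false on a CPU whose cache is full never leaves the loop, so `toy_run_queued_work()` never gets a chance to run. With `with_fix` true, the second pass sees the pending exit, leaves the loop, and the queued flush can then proceed.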