Message ID | 20230829010658.8252-1-npiggin@gmail.com |
---|---|
State | New |
Series | accel/tcg: mttcg remove false-negative halted assertion |
On 8/28/23 18:06, Nicholas Piggin wrote:
> mttcg asserts that an execution ending with EXCP_HALTED must have
> cpu->halted. However between the event or instruction that sets
> cpu->halted and requests exit and the assertion here, an
> asynchronous event could clear cpu->halted.
>
> This leads to crashes running AIX on ppc/pseries because it uses
> H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
> H_PROD sets other cpu->halted = 0 and kicks it.
>
> H_PROD could be turned into an interrupt to wake, but several other
> places in ppc, sparc, and semihosting follow what looks like a similar
> pattern setting halted = 0 directly. So remove this assertion.
>
> Reported-by: Ivan Warren <ivan@vmfacility.fr>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  accel/tcg/tcg-accel-ops-mttcg.c | 11 -----------
>  1 file changed, 11 deletions(-)

The adjustments of 'halted' and 'prod' are done under the io lock in both
cases, so there's no race there. It is perfectly reasonable that after
thread A sets halted and drops the lock, thread B may acquire the lock and
clear halted before thread A has a chance to complete longjmp and cycle
through its main loop.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

>
> diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
> index b276262007..d0b6f288d9 100644
> --- a/accel/tcg/tcg-accel-ops-mttcg.c
> +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> @@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
>              case EXCP_DEBUG:
>                  cpu_handle_guest_debug(cpu);
>                  break;
> -            case EXCP_HALTED:
> -                /*
> -                 * during start-up the vCPU is reset and the thread is
> -                 * kicked several times. If we don't ensure we go back
> -                 * to sleep in the halted state we won't cleanly
> -                 * start-up when the vCPU is enabled.
> -                 *
> -                 * cpu->halted should ensure we sleep in wait_io_event
> -                 */
> -                g_assert(cpu->halted);
> -                break;

I adjusted the patch to keep the case label and update the comment, still
dropping the assert.

Queued to tcg-next.

r~
29.08.2023 04:06, Nicholas Piggin wrote:
> mttcg asserts that an execution ending with EXCP_HALTED must have
> cpu->halted. However between the event or instruction that sets
> cpu->halted and requests exit and the assertion here, an
> asynchronous event could clear cpu->halted.
>
> This leads to crashes running AIX on ppc/pseries because it uses
> H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
> H_PROD sets other cpu->halted = 0 and kicks it.
>
> H_PROD could be turned into an interrupt to wake, but several other
> places in ppc, sparc, and semihosting follow what looks like a similar
> pattern setting halted = 0 directly. So remove this assertion.
>
> Reported-by: Ivan Warren <ivan@vmfacility.fr>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

This one also smells like a stable material, is it not?

Thanks,

/mjt

> diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
> index b276262007..d0b6f288d9 100644
> --- a/accel/tcg/tcg-accel-ops-mttcg.c
> +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> @@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
>              case EXCP_DEBUG:
>                  cpu_handle_guest_debug(cpu);
>                  break;
> -            case EXCP_HALTED:
> -                /*
> -                 * during start-up the vCPU is reset and the thread is
> -                 * kicked several times. If we don't ensure we go back
> -                 * to sleep in the halted state we won't cleanly
> -                 * start-up when the vCPU is enabled.
> -                 *
> -                 * cpu->halted should ensure we sleep in wait_io_event
> -                 */
> -                g_assert(cpu->halted);
> -                break;
>              case EXCP_ATOMIC:
>                  qemu_mutex_unlock_iothread();
>                  cpu_exec_step_atomic(cpu);
On Fri Sep 22, 2023 at 4:25 AM AEST, Michael Tokarev wrote:
> 29.08.2023 04:06, Nicholas Piggin wrote:
> > mttcg asserts that an execution ending with EXCP_HALTED must have
> > cpu->halted. However between the event or instruction that sets
> > cpu->halted and requests exit and the assertion here, an
> > asynchronous event could clear cpu->halted.
> >
> > This leads to crashes running AIX on ppc/pseries because it uses
> > H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
> > H_PROD sets other cpu->halted = 0 and kicks it.
> >
> > H_PROD could be turned into an interrupt to wake, but several other
> > places in ppc, sparc, and semihosting follow what looks like a similar
> > pattern setting halted = 0 directly. So remove this assertion.
> >
> > Reported-by: Ivan Warren <ivan@vmfacility.fr>
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
> This one also smells like a stable material, is it not?

Yeah I would say it is.

Thanks,
Nick

>
> Thanks,
>
> /mjt
>
> > diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
> > index b276262007..d0b6f288d9 100644
> > --- a/accel/tcg/tcg-accel-ops-mttcg.c
> > +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> > @@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
> >              case EXCP_DEBUG:
> >                  cpu_handle_guest_debug(cpu);
> >                  break;
> > -            case EXCP_HALTED:
> > -                /*
> > -                 * during start-up the vCPU is reset and the thread is
> > -                 * kicked several times. If we don't ensure we go back
> > -                 * to sleep in the halted state we won't cleanly
> > -                 * start-up when the vCPU is enabled.
> > -                 *
> > -                 * cpu->halted should ensure we sleep in wait_io_event
> > -                 */
> > -                g_assert(cpu->halted);
> > -                break;
> >              case EXCP_ATOMIC:
> >                  qemu_mutex_unlock_iothread();
> >                  cpu_exec_step_atomic(cpu);
diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
index b276262007..d0b6f288d9 100644
--- a/accel/tcg/tcg-accel-ops-mttcg.c
+++ b/accel/tcg/tcg-accel-ops-mttcg.c
@@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
             case EXCP_DEBUG:
                 cpu_handle_guest_debug(cpu);
                 break;
-            case EXCP_HALTED:
-                /*
-                 * during start-up the vCPU is reset and the thread is
-                 * kicked several times. If we don't ensure we go back
-                 * to sleep in the halted state we won't cleanly
-                 * start-up when the vCPU is enabled.
-                 *
-                 * cpu->halted should ensure we sleep in wait_io_event
-                 */
-                g_assert(cpu->halted);
-                break;
             case EXCP_ATOMIC:
                 qemu_mutex_unlock_iothread();
                 cpu_exec_step_atomic(cpu);
mttcg asserts that an execution ending with EXCP_HALTED must have
cpu->halted. However between the event or instruction that sets
cpu->halted and requests exit and the assertion here, an
asynchronous event could clear cpu->halted.

This leads to crashes running AIX on ppc/pseries because it uses
H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
H_PROD sets other cpu->halted = 0 and kicks it.

H_PROD could be turned into an interrupt to wake, but several other
places in ppc, sparc, and semihosting follow what looks like a similar
pattern setting halted = 0 directly. So remove this assertion.

Reported-by: Ivan Warren <ivan@vmfacility.fr>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 accel/tcg/tcg-accel-ops-mttcg.c | 11 -----------
 1 file changed, 11 deletions(-)
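
The interleaving described in the commit message can be replayed in a small standalone sketch. This is not QEMU code: `SketchCPU`, `sketch_cede`, `sketch_prod`, `sketch_old_assert_holds` and the value of `SKETCH_EXCP_HALTED` are hypothetical stand-ins, and the lock is modelled as a plain flag so the two "threads" can be stepped deterministically in one thread. It only illustrates that both writes to `halted` happen under the lock and the condition checked by the removed assertion can still be false by the time it is evaluated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not QEMU's real types or values. */
enum { SKETCH_EXCP_HALTED = 1 };

typedef struct {
    bool halted;                    /* models cpu->halted */
} SketchCPU;

static bool sketch_lock_held;       /* models the single global io lock */

static void sketch_lock(void)
{
    assert(!sketch_lock_held);
    sketch_lock_held = true;
}

static void sketch_unlock(void)
{
    assert(sketch_lock_held);
    sketch_lock_held = false;
}

/* "Thread A": a cede-like call sets halted under the lock and requests exit. */
static int sketch_cede(SketchCPU *self)
{
    sketch_lock();
    self->halted = true;
    sketch_unlock();
    return SKETCH_EXCP_HALTED;
}

/* "Thread B": a prod-like call clears the target's halted flag under the lock. */
static void sketch_prod(SketchCPU *target)
{
    sketch_lock();
    target->halted = false;
    sketch_unlock();
}

/*
 * Replays: A cedes, optionally B prods, then A reaches the point where the
 * removed assertion used to run. Returns whether that assertion would hold.
 */
static bool sketch_old_assert_holds(bool prod_intervenes)
{
    SketchCPU cpu = { false };
    int r = sketch_cede(&cpu);

    if (prod_intervenes) {
        sketch_prod(&cpu);          /* runs after A dropped the lock */
    }
    return r != SKETCH_EXCP_HALTED || cpu.halted;
}
```

With no intervening prod the old condition holds; with one it does not, even though neither write to `halted` was unlocked. That matches the reasoning in the review that the state is legitimate rather than a sign of corruption.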