Message ID | 20220725073855.76049-14-eesposit@redhat.com |
---|---|
State | New |
Headers | show |
Series | job: replace AioContext lock with job_mutex | expand |
Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben: > From: Paolo Bonzini <pbonzini@redhat.com> > > We want to make sure access of job->aio_context is always done > under either BQL or job_mutex. Is this the goal of this series? If so, it would have been useful to state somewhere more obvious, because I had assumed that holding the BQL would not be considered enough, but everyone needs to hold the job_mutex. > The problem is that using > aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond > makes the coroutine immediately resume, so we can't hold the job lock. > And caching it is not safe either, as it might change. > > job_start is under BQL, so it can freely read job->aiocontext, but > job_enter_cond is not. In order to fix this, use aio_co_wake(): > the advantage is that it won't use job->aiocontext, but the > main disadvantage is that it won't be able to detect a change of > job AioContext. > > Calling bdrv_try_set_aio_context() will issue the following calls > (simplified): > * in terms of bdrv callbacks: > .drained_begin -> .set_aio_context -> .drained_end > * in terms of child_job functions: > child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end > * in terms of job functions: > job_pause_locked -> job_set_aio_context -> job_resume_locked > > We can see that after setting the new aio_context, job_resume_locked > calls again job_enter_cond, which then invokes aio_co_wake(). But > while job->aiocontext has been set in job_set_aio_context, > job->co->ctx has not changed, so the coroutine would be entering in > the wrong aiocontext. > > Using aio_co_schedule in job_resume_locked() might seem as a valid > alternative, but the problem is that the bh resuming the coroutine > is not scheduled immediately, and if in the meanwhile another > bdrv_try_set_aio_context() is run (see test_propagate_mirror() in > test-block-iothread.c), we would have the first schedule in the > wrong aiocontext, and the second set of drains won't even manage > to schedule the coroutine, as job->busy would still be true from > the previous job_resume_locked(). > > The solution is to stick with aio_co_wake(), but then detect every time > the coroutine resumes back from yielding if job->aio_context > has changed. If so, we can reschedule it to the new context. Hm, but with this in place, what does aio_co_wake() actually buy us compared to aio_co_enter()? I guess it's a bit simpler code because you don't have to explicitly specify the AioContext, but we're still going to enter the coroutine in the wrong AioContext occasionally and have to reschedule it, just like in the existing code (except that the rescheduling doesn't exist there yet). So while I don't disagree with the change, I don't think the justification in the commit message is right for this part. > Check for the aiocontext change in job_do_yield_locked because: > 1) aio_co_reschedule_self requires to be in the running coroutine > 2) since child_job_set_aio_context allows changing the aiocontext only > while the job is paused, this is the exact place where the coroutine > resumes, before running JobDriver's code. > > Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > job.c | 21 +++++++++++++++++++-- > 1 file changed, 19 insertions(+), 2 deletions(-) > > diff --git a/job.c b/job.c > index b0729e2bb2..ecec66b44e 100644 > --- a/job.c > +++ b/job.c > @@ -585,7 +585,9 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job)) > timer_del(&job->sleep_timer); > job->busy = true; > real_job_unlock(); > - aio_co_enter(job->aio_context, job->co); > + job_unlock(); > + aio_co_wake(job->co); > + job_lock(); The addition of job_unlock/lock is unrelated, this was necessary even before this patch. > } > > void job_enter_cond(Job *job, bool(*fn)(Job *job)) > @@ -611,6 +613,8 @@ void job_enter(Job *job) > */ > static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns) > { > + AioContext *next_aio_context; > + > real_job_lock(); > if (ns != -1) { > timer_mod(&job->sleep_timer, ns); > @@ -622,7 +626,20 @@ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns) > qemu_coroutine_yield(); > job_lock(); > > - /* Set by job_enter_cond() before re-entering the coroutine. */ > + next_aio_context = job->aio_context; > + /* > + * Coroutine has resumed, but in the meanwhile the job AioContext > + * might have changed via bdrv_try_set_aio_context(), so we need to move > + * the coroutine too in the new aiocontext. > + */ > + while (qemu_get_current_aio_context() != next_aio_context) { > + job_unlock(); > + aio_co_reschedule_self(next_aio_context); > + job_lock(); > + next_aio_context = job->aio_context; > + } > + > + /* Set by job_enter_cond_locked() before re-entering the coroutine. */ > assert(job->busy); > } The actual code changes look good. Kevin
Am 05/08/2022 um 10:37 schrieb Kevin Wolf: > Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben: >> From: Paolo Bonzini <pbonzini@redhat.com> >> >> We want to make sure access of job->aio_context is always done >> under either BQL or job_mutex. > > Is this the goal of this series? If so, it would have been useful to > state somewhere more obvious, because I had assumed that holding the BQL > would not be considered enough, but everyone needs to hold the job_mutex. It is the goal for this patch :) The whole job API can't rely on BQL since there are coroutines running in another aiocontext. > >> The problem is that using >> aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond >> makes the coroutine immediately resume, so we can't hold the job lock. >> And caching it is not safe either, as it might change. >> >> job_start is under BQL, so it can freely read job->aiocontext, but >> job_enter_cond is not. In order to fix this, use aio_co_wake(): >> the advantage is that it won't use job->aiocontext, but the >> main disadvantage is that it won't be able to detect a change of >> job AioContext. >> >> Calling bdrv_try_set_aio_context() will issue the following calls >> (simplified): >> * in terms of bdrv callbacks: >> .drained_begin -> .set_aio_context -> .drained_end >> * in terms of child_job functions: >> child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end >> * in terms of job functions: >> job_pause_locked -> job_set_aio_context -> job_resume_locked >> >> We can see that after setting the new aio_context, job_resume_locked >> calls again job_enter_cond, which then invokes aio_co_wake(). But >> while job->aiocontext has been set in job_set_aio_context, >> job->co->ctx has not changed, so the coroutine would be entering in >> the wrong aiocontext. >> >> Using aio_co_schedule in job_resume_locked() might seem as a valid >> alternative, but the problem is that the bh resuming the coroutine >> is not scheduled immediately, and if in the meanwhile another >> bdrv_try_set_aio_context() is run (see test_propagate_mirror() in >> test-block-iothread.c), we would have the first schedule in the >> wrong aiocontext, and the second set of drains won't even manage >> to schedule the coroutine, as job->busy would still be true from >> the previous job_resume_locked(). >> >> The solution is to stick with aio_co_wake(), but then detect every time >> the coroutine resumes back from yielding if job->aio_context >> has changed. If so, we can reschedule it to the new context. > > Hm, but with this in place, what does aio_co_wake() actually buy us > compared to aio_co_enter()? > > I guess it's a bit simpler code because you don't have to explicitly > specify the AioContext, but we're still going to enter the coroutine in > the wrong AioContext occasionally and have to reschedule it, just like > in the existing code (except that the rescheduling doesn't exist there > yet). > > So while I don't disagree with the change, I don't think the > justification in the commit message is right for this part. What do you suggest to change? > >> Check for the aiocontext change in job_do_yield_locked because: >> 1) aio_co_reschedule_self requires to be in the running coroutine >> 2) since child_job_set_aio_context allows changing the aiocontext only >> while the job is paused, this is the exact place where the coroutine >> resumes, before running JobDriver's code. >> >> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> >> --- >> job.c | 21 +++++++++++++++++++-- >> 1 file changed, 19 insertions(+), 2 deletions(-) >> >> diff --git a/job.c b/job.c >> index b0729e2bb2..ecec66b44e 100644 >> --- a/job.c >> +++ b/job.c >> @@ -585,7 +585,9 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job)) >> timer_del(&job->sleep_timer); >> job->busy = true; >> real_job_unlock(); >> - aio_co_enter(job->aio_context, job->co); >> + job_unlock(); >> + aio_co_wake(job->co); >> + job_lock(); > > The addition of job_unlock/lock is unrelated, this was necessary even > before this patch. Ok > >> } >> >> void job_enter_cond(Job *job, bool(*fn)(Job *job)) >> @@ -611,6 +613,8 @@ void job_enter(Job *job) >> */ >> static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns) >> { >> + AioContext *next_aio_context; >> + >> real_job_lock(); >> if (ns != -1) { >> timer_mod(&job->sleep_timer, ns); >> @@ -622,7 +626,20 @@ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns) >> qemu_coroutine_yield(); >> job_lock(); >> >> - /* Set by job_enter_cond() before re-entering the coroutine. */ >> + next_aio_context = job->aio_context; >> + /* >> + * Coroutine has resumed, but in the meanwhile the job AioContext >> + * might have changed via bdrv_try_set_aio_context(), so we need to move >> + * the coroutine too in the new aiocontext. >> + */ >> + while (qemu_get_current_aio_context() != next_aio_context) { >> + job_unlock(); >> + aio_co_reschedule_self(next_aio_context); >> + job_lock(); >> + next_aio_context = job->aio_context; >> + } >> + >> + /* Set by job_enter_cond_locked() before re-entering the coroutine. */ >> assert(job->busy); >> } > Emanuele
Am 16.08.2022 um 17:09 hat Emanuele Giuseppe Esposito geschrieben: > > > Am 05/08/2022 um 10:37 schrieb Kevin Wolf: > > Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben: > >> From: Paolo Bonzini <pbonzini@redhat.com> > >> > >> We want to make sure access of job->aio_context is always done > >> under either BQL or job_mutex. > > > > Is this the goal of this series? If so, it would have been useful to > > state somewhere more obvious, because I had assumed that holding the BQL > > would not be considered enough, but everyone needs to hold the job_mutex. > > It is the goal for this patch :) > The whole job API can't rely on BQL since there are coroutines running > in another aiocontext. Yes, as I saw in patch 14, which describes the goal more clearly in the commit message and also adds the corresponding documentation to Job.aio_context. Maybe it would have been clearer if the documentation were already in this patch. > >> The problem is that using > >> aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond > >> makes the coroutine immediately resume, so we can't hold the job lock. > >> And caching it is not safe either, as it might change. > >> > >> job_start is under BQL, so it can freely read job->aiocontext, but > >> job_enter_cond is not. In order to fix this, use aio_co_wake(): > >> the advantage is that it won't use job->aiocontext, but the > >> main disadvantage is that it won't be able to detect a change of > >> job AioContext. > >> > >> Calling bdrv_try_set_aio_context() will issue the following calls > >> (simplified): > >> * in terms of bdrv callbacks: > >> .drained_begin -> .set_aio_context -> .drained_end > >> * in terms of child_job functions: > >> child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end > >> * in terms of job functions: > >> job_pause_locked -> job_set_aio_context -> job_resume_locked > >> > >> We can see that after setting the new aio_context, job_resume_locked > >> calls again job_enter_cond, which then invokes aio_co_wake(). But > >> while job->aiocontext has been set in job_set_aio_context, > >> job->co->ctx has not changed, so the coroutine would be entering in > >> the wrong aiocontext. > >> > >> Using aio_co_schedule in job_resume_locked() might seem as a valid > >> alternative, but the problem is that the bh resuming the coroutine > >> is not scheduled immediately, and if in the meanwhile another > >> bdrv_try_set_aio_context() is run (see test_propagate_mirror() in > >> test-block-iothread.c), we would have the first schedule in the > >> wrong aiocontext, and the second set of drains won't even manage > >> to schedule the coroutine, as job->busy would still be true from > >> the previous job_resume_locked(). > >> > >> The solution is to stick with aio_co_wake(), but then detect every time > >> the coroutine resumes back from yielding if job->aio_context > >> has changed. If so, we can reschedule it to the new context. > > > > Hm, but with this in place, what does aio_co_wake() actually buy us > > compared to aio_co_enter()? > > > > I guess it's a bit simpler code because you don't have to explicitly > > specify the AioContext, but we're still going to enter the coroutine in > > the wrong AioContext occasionally and have to reschedule it, just like > > in the existing code (except that the rescheduling doesn't exist there > > yet). > > > > So while I don't disagree with the change, I don't think the > > justification in the commit message is right for this part. > > What do you suggest to change? The commit message shouldn't pretend that aio_co_wake() solves the problem (it says "In order to fix this, use aio_co_wake"), even if that's what you thought at first before you saw that the problem wasn't fully fixed by it. I would move the real solution up in the commit message ("In order to fix this, detect every time..."), and then maybe mention why aio_co_wake() doesn't solve the problem, but you're leaving it in anyway because it's nicer than the previous sequence or something like that. > >> Check for the aiocontext change in job_do_yield_locked because: > >> 1) aio_co_reschedule_self requires to be in the running coroutine > >> 2) since child_job_set_aio_context allows changing the aiocontext only > >> while the job is paused, this is the exact place where the coroutine > >> resumes, before running JobDriver's code. > >> > >> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> > >> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> > >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Kevin
Am 17/08/2022 um 10:34 schrieb Kevin Wolf: > Am 16.08.2022 um 17:09 hat Emanuele Giuseppe Esposito geschrieben: >> >> >> Am 05/08/2022 um 10:37 schrieb Kevin Wolf: >>> Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben: >>>> From: Paolo Bonzini <pbonzini@redhat.com> >>>> >>>> We want to make sure access of job->aio_context is always done >>>> under either BQL or job_mutex. >>> >>> Is this the goal of this series? If so, it would have been useful to >>> state somewhere more obvious, because I had assumed that holding the BQL >>> would not be considered enough, but everyone needs to hold the job_mutex. >> >> It is the goal for this patch :) >> The whole job API can't rely on BQL since there are coroutines running >> in another aiocontext. > > Yes, as I saw in patch 14, which describes the goal more clearly in the > commit message and also adds the corresponding documentation to > Job.aio_context. Maybe it would have been clearer if the documentation > were already in this patch. > I don't understand what you mean here, sorry. You mean job_set_aiocontext documentation? I don't see what is confusing here, could you please clarify? >>>> The problem is that using >>>> aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond >>>> makes the coroutine immediately resume, so we can't hold the job lock. >>>> And caching it is not safe either, as it might change. >>>> >>>> job_start is under BQL, so it can freely read job->aiocontext, but >>>> job_enter_cond is not. In order to fix this, use aio_co_wake(): >>>> the advantage is that it won't use job->aiocontext, but the >>>> main disadvantage is that it won't be able to detect a change of >>>> job AioContext. >>>> >>>> Calling bdrv_try_set_aio_context() will issue the following calls >>>> (simplified): >>>> * in terms of bdrv callbacks: >>>> .drained_begin -> .set_aio_context -> .drained_end >>>> * in terms of child_job functions: >>>> child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end >>>> * in terms of job functions: >>>> job_pause_locked -> job_set_aio_context -> job_resume_locked >>>> >>>> We can see that after setting the new aio_context, job_resume_locked >>>> calls again job_enter_cond, which then invokes aio_co_wake(). But >>>> while job->aiocontext has been set in job_set_aio_context, >>>> job->co->ctx has not changed, so the coroutine would be entering in >>>> the wrong aiocontext. >>>> >>>> Using aio_co_schedule in job_resume_locked() might seem as a valid >>>> alternative, but the problem is that the bh resuming the coroutine >>>> is not scheduled immediately, and if in the meanwhile another >>>> bdrv_try_set_aio_context() is run (see test_propagate_mirror() in >>>> test-block-iothread.c), we would have the first schedule in the >>>> wrong aiocontext, and the second set of drains won't even manage >>>> to schedule the coroutine, as job->busy would still be true from >>>> the previous job_resume_locked(). >>>> >>>> The solution is to stick with aio_co_wake(), but then detect every time >>>> the coroutine resumes back from yielding if job->aio_context >>>> has changed. If so, we can reschedule it to the new context. >>> >>> Hm, but with this in place, what does aio_co_wake() actually buy us >>> compared to aio_co_enter()? >>> >>> I guess it's a bit simpler code because you don't have to explicitly >>> specify the AioContext, but we're still going to enter the coroutine in >>> the wrong AioContext occasionally and have to reschedule it, just like >>> in the existing code (except that the rescheduling doesn't exist there >>> yet). >>> >>> So while I don't disagree with the change, I don't think the >>> justification in the commit message is right for this part. >> >> What do you suggest to change? > > The commit message shouldn't pretend that aio_co_wake() solves the > problem (it says "In order to fix this, use aio_co_wake"), even if > that's what you thought at first before you saw that the problem wasn't > fully fixed by it. > > I would move the real solution up in the commit message ("In order to > fix this, detect every time..."), and then maybe mention why > aio_co_wake() doesn't solve the problem, but you're leaving it in anyway > because it's nicer than the previous sequence or something like that. I think the issue is "In order to fix this", meaning what "this" refers to. It is not the fix, it is the problem stated in the previous sentence. Since job_enter_cond is not under BQL, we can't read job->aiocontext, therefore use aio_co_wake to avoid doing that. Would it be ok if I replace the paragraph with: job_start is under BQL, so it can freely read job->aiocontext, but job_enter_cond is not. In order to avoid reading the job aiocontext, use aio_co_wake(), since it resorts so job->co->ctx. The only disadvantage is that it won't be able to detect a change of job AioContext, as explained below. Emanuele > >>>> Check for the aiocontext change in job_do_yield_locked because: >>>> 1) aio_co_reschedule_self requires to be in the running coroutine >>>> 2) since child_job_set_aio_context allows changing the aiocontext only >>>> while the job is paused, this is the exact place where the coroutine >>>> resumes, before running JobDriver's code. >>>> >>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> >>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> >>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > > Kevin >
diff --git a/job.c b/job.c index b0729e2bb2..ecec66b44e 100644 --- a/job.c +++ b/job.c @@ -585,7 +585,9 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job)) timer_del(&job->sleep_timer); job->busy = true; real_job_unlock(); - aio_co_enter(job->aio_context, job->co); + job_unlock(); + aio_co_wake(job->co); + job_lock(); } void job_enter_cond(Job *job, bool(*fn)(Job *job)) @@ -611,6 +613,8 @@ void job_enter(Job *job) */ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns) { + AioContext *next_aio_context; + real_job_lock(); if (ns != -1) { timer_mod(&job->sleep_timer, ns); @@ -622,7 +626,20 @@ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns) qemu_coroutine_yield(); job_lock(); - /* Set by job_enter_cond() before re-entering the coroutine. */ + next_aio_context = job->aio_context; + /* + * Coroutine has resumed, but in the meanwhile the job AioContext + * might have changed via bdrv_try_set_aio_context(), so we need to move + * the coroutine too in the new aiocontext. + */ + while (qemu_get_current_aio_context() != next_aio_context) { + job_unlock(); + aio_co_reschedule_self(next_aio_context); + job_lock(); + next_aio_context = job->aio_context; + } + + /* Set by job_enter_cond_locked() before re-entering the coroutine. */ assert(job->busy); }