Message ID: 20220616131835.2004262-2-eesposit@redhat.com
State: New
Series: job: replace AioContext lock with job_mutex
On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> job mutex will be used to protect the job struct elements and list,
> replacing AioContext locks.
>
> Right now use a shared lock for all jobs, in order to keep things
> simple. Once the AioContext lock is gone, we can introduce per-job
> locks.
>
> To simplify the switch from aiocontext to job lock, introduce
> *nop* lock/unlock functions and macros.
> We want to always call job_lock/unlock outside the AioContext locks,
> and not vice-versa, otherwise we might get a deadlock. This is not
> straightforward to do, and that's why we start with nop functions.
> Once everything is protected by job_lock/unlock, we can change the nop into
> an actual mutex and remove the aiocontext lock.
>
> Since job_mutex is already being used, add static
> real_job_{lock/unlock} for the existing usage.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
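The two-phase plan described in the commit message (annotate call sites with nop lock/unlock first, flip to a real mutex later) can be pictured with a minimal sketch. This is not the actual patch; the JOB_LOCK_IS_NOP switch below is hypothetical and only illustrates the idea:

    /*
     * Hypothetical sketch of the two-phase conversion, not the actual
     * patch: in phase 1 the functions are nops, so call sites can be
     * annotated while the AioContext lock still provides the real
     * protection; in phase 2 the same functions become a real mutex.
     */
    #include "qemu/osdep.h"
    #include "qemu/thread.h"

    QemuMutex job_mutex;

    void job_lock(void)
    {
    #ifdef JOB_LOCK_IS_NOP
        /* phase 1: nop, protection still comes from the AioContext lock */
    #else
        qemu_mutex_lock(&job_mutex);    /* phase 2: the lock becomes real */
    #endif
    }

    void job_unlock(void)
    {
    #ifdef JOB_LOCK_IS_NOP
        /* nop */
    #else
        qemu_mutex_unlock(&job_mutex);
    #endif
    }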
I've already acked this (honestly, because Stefan did), but still, I want
to clarify:

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> job mutex will be used to protect the job struct elements and list,
> replacing AioContext locks.
>
> Right now use a shared lock for all jobs, in order to keep things
> simple. Once the AioContext lock is gone, we can introduce per-job
> locks.
>
> To simplify the switch from aiocontext to job lock, introduce
> *nop* lock/unlock functions and macros.
> We want to always call job_lock/unlock outside the AioContext locks,
> and not vice-versa, otherwise we might get a deadlock.

Could you describe here why we get a deadlock?

As I understand it, we'll deadlock if two code paths exist simultaneously:

1. we take the job mutex under the aiocontext lock
2. we take the aiocontext lock under the job mutex

If these paths exist, it's possible that one thread goes through [1]
and another through [2]. If thread [1] holds the job-mutex and wants to
take the aiocontext-lock, and at the same time thread [2] holds the
aiocontext-lock and wants to take the job-mutex, that's a deadlock.

If you say that we must avoid [1], do you have in mind that we have [2]
somewhere? If so, this should be mentioned here.

If not, could we just make a normal mutex, not a noop?

> This is not
> straightforward to do, and that's why we start with nop functions.
> Once everything is protected by job_lock/unlock, we can change the nop into
> an actual mutex and remove the aiocontext lock.
>
> Since job_mutex is already being used, add static
> real_job_{lock/unlock} for the existing usage.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
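For illustration, here is a minimal, self-contained sketch of the lock-ordering inversion described in [1] and [2] above, written with plain pthreads rather than QEMU primitives; the mutex and function names are made up:

    /* Standalone illustration of the [1]/[2] lock-ordering inversion;
     * plain pthreads, not QEMU code.  Compile with: gcc -pthread dl.c */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t aio_context_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t job_mutex        = PTHREAD_MUTEX_INITIALIZER;

    /* Path [1]: job mutex taken under the aiocontext lock. */
    static void *path1(void *arg)
    {
        pthread_mutex_lock(&aio_context_lock);
        pthread_mutex_lock(&job_mutex);        /* may block forever */
        pthread_mutex_unlock(&job_mutex);
        pthread_mutex_unlock(&aio_context_lock);
        return NULL;
    }

    /* Path [2]: aiocontext lock taken under the job mutex. */
    static void *path2(void *arg)
    {
        pthread_mutex_lock(&job_mutex);
        pthread_mutex_lock(&aio_context_lock); /* may block forever */
        pthread_mutex_unlock(&aio_context_lock);
        pthread_mutex_unlock(&job_mutex);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, path1, NULL);
        pthread_create(&t2, NULL, path2, NULL);
        pthread_join(t1, NULL);   /* with unlucky timing, never returns */
        pthread_join(t2, NULL);
        printf("no deadlock this time\n");
        return 0;
    }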
Am 24/06/2022 um 20:22 schrieb Vladimir Sementsov-Ogievskiy:
> I've already acked this (honestly, because Stefan did), but still, I want
> to clarify:
>
> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>> job mutex will be used to protect the job struct elements and list,
>> replacing AioContext locks.
>>
>> Right now use a shared lock for all jobs, in order to keep things
>> simple. Once the AioContext lock is gone, we can introduce per-job
>> locks.
>>
>> To simplify the switch from aiocontext to job lock, introduce
>> *nop* lock/unlock functions and macros.
>> We want to always call job_lock/unlock outside the AioContext locks,
>> and not vice-versa, otherwise we might get a deadlock.
>
> Could you describe here why we get a deadlock?
>
> As I understand it, we'll deadlock if two code paths exist simultaneously:
>
> 1. we take the job mutex under the aiocontext lock
> 2. we take the aiocontext lock under the job mutex
>
> If these paths exist, it's possible that one thread goes through [1]
> and another through [2]. If thread [1] holds the job-mutex and wants to
> take the aiocontext-lock, and at the same time thread [2] holds the
> aiocontext-lock and wants to take the job-mutex, that's a deadlock.
>
> If you say that we must avoid [1], do you have in mind that we have [2]
> somewhere? If so, this should be mentioned here.
>
> If not, could we just make a normal mutex, not a noop?

Of course we have [2] somewhere, otherwise I wouldn't even think about
creating a noop function. This idea came up in v1-v2.

Regarding the specific case, I don't remember. But there are tons of
functions that acquire the AioContext lock and then call the job_*
API, such as job_cancel_sync in blockdev.c.

I might use job_cancel_sync as an example and write it in the commit
message though.

Thank you,
Emanuele

>> This is not
>> straightforward to do, and that's why we start with nop functions.
>> Once everything is protected by job_lock/unlock, we can change the nop
>> into
>> an actual mutex and remove the aiocontext lock.
>>
>> Since job_mutex is already being used, add static
>> real_job_{lock/unlock} for the existing usage.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>
>
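A rough sketch of the [1]-style pattern Emanuele refers to (simplified, not a verbatim quote of blockdev.c; the helper name is made up): the caller already holds the AioContext lock when it enters the job_* API, which is why job_lock() cannot simply become a real mutex before the lock ordering is sorted out:

    /*
     * Simplified sketch of a [1]-style call site; not verbatim
     * blockdev.c code, and the helper name is made up.
     */
    static void cancel_job_under_aio_context(BlockJob *bjob)
    {
        AioContext *aio_context = bjob->job.aio_context;

        aio_context_acquire(aio_context);
        /* job_* API entered while the AioContext lock is held */
        job_cancel_sync(&bjob->job, true);
        aio_context_release(aio_context);
    }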
On 6/28/22 16:08, Emanuele Giuseppe Esposito wrote:
>
>
> Am 24/06/2022 um 20:22 schrieb Vladimir Sementsov-Ogievskiy:
>> I've already acked this (honestly, because Stefan did), but still, I want
>> to clarify:
>>
>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>> job mutex will be used to protect the job struct elements and list,
>>> replacing AioContext locks.
>>>
>>> Right now use a shared lock for all jobs, in order to keep things
>>> simple. Once the AioContext lock is gone, we can introduce per-job
>>> locks.
>>>
>>> To simplify the switch from aiocontext to job lock, introduce
>>> *nop* lock/unlock functions and macros.
>>> We want to always call job_lock/unlock outside the AioContext locks,
>>> and not vice-versa, otherwise we might get a deadlock.
>>
>> Could you describe here why we get a deadlock?
>>
>> As I understand it, we'll deadlock if two code paths exist simultaneously:
>>
>> 1. we take the job mutex under the aiocontext lock
>> 2. we take the aiocontext lock under the job mutex
>>
>> If these paths exist, it's possible that one thread goes through [1]
>> and another through [2]. If thread [1] holds the job-mutex and wants to
>> take the aiocontext-lock, and at the same time thread [2] holds the
>> aiocontext-lock and wants to take the job-mutex, that's a deadlock.
>>
>> If you say that we must avoid [1], do you have in mind that we have [2]
>> somewhere? If so, this should be mentioned here.
>>
>> If not, could we just make a normal mutex, not a noop?
>
> Of course we have [2] somewhere, otherwise I wouldn't even think about
> creating a noop function. This idea came up in v1-v2.
>
> Regarding the specific case, I don't remember. But there are tons of
> functions that acquire the AioContext lock and then call the job_*
> API, such as job_cancel_sync in blockdev.c.
>
> I might use job_cancel_sync as an example and write it in the commit
> message though.
>

Yes, it's obvious that we have tons of [1]. That's why an example of [2]
would be a lot more valuable.
diff --git a/include/qemu/job.h b/include/qemu/job.h
index c105b31076..d1192ffd61 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -303,6 +303,30 @@ typedef enum JobCreateFlags {
     JOB_MANUAL_DISMISS = 0x04,
 } JobCreateFlags;
 
+extern QemuMutex job_mutex;
+
+#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+
+#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+
+/**
+ * job_lock:
+ *
+ * Take the mutex protecting the list of jobs and their status.
+ * Most functions called by the monitor need to call job_lock
+ * and job_unlock manually. On the other hand, function called
+ * by the block jobs themselves and by the block layer will take the
+ * lock for you.
+ */
+void job_lock(void);
+
+/**
+ * job_unlock:
+ *
+ * Release the mutex protecting the list of jobs and their status.
+ */
+void job_unlock(void);
+
 /**
  * Allocate and return a new job transaction. Jobs can be added to the
  * transaction using job_txn_add_job().
diff --git a/job.c b/job.c
index 075c6f3a20..2b4ffca9d4 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,12 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * job_mutex protects the jobs list, but also makes the
+ * struct job fields thread-safe.
+ */
+QemuMutex job_mutex;
+
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -74,17 +80,22 @@ struct JobTxn {
     int refcnt;
 };
 
-/* Right now, this mutex is only needed to synchronize accesses to job->busy
- * and job->sleep_timer, such as concurrent calls to job_do_yield and
- * job_enter. */
-static QemuMutex job_mutex;
+void job_lock(void)
+{
+    /* nop */
+}
+
+void job_unlock(void)
+{
+    /* nop */
+}
 
-static void job_lock(void)
+static void real_job_lock(void)
 {
     qemu_mutex_lock(&job_mutex);
 }
 
-static void job_unlock(void)
+static void real_job_unlock(void)
 {
     qemu_mutex_unlock(&job_mutex);
 }
@@ -450,21 +461,21 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
         return;
     }
 
-    job_lock();
+    real_job_lock();
     if (job->busy) {
-        job_unlock();
+        real_job_unlock();
         return;
     }
 
     if (fn && !fn(job)) {
-        job_unlock();
+        real_job_unlock();
         return;
     }
 
     assert(!job->deferred_to_main_loop);
     timer_del(&job->sleep_timer);
     job->busy = true;
-    job_unlock();
+    real_job_unlock();
     aio_co_enter(job->aio_context, job->co);
 }
 
@@ -481,13 +492,13 @@ void job_enter(Job *job)
  * called explicitly. */
 static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
 {
-    job_lock();
+    real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
     job_event_idle(job);
-    job_unlock();
+    real_job_unlock();
     qemu_coroutine_yield();
 
     /* Set by job_enter_cond() before re-entering the coroutine. */
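A short sketch of the calling convention the new header comment describes, assuming the rest of the series lands: monitor-side code takes the lock explicitly around accesses to job fields. The job_is_paused_example() helper is hypothetical, and at this point in the series job_lock()/job_unlock() are still nops, so behaviour does not change yet:

    /*
     * Hypothetical example of the intended calling convention; at this
     * point in the series job_lock()/job_unlock() are still nops.
     */
    static bool job_is_paused_example(Job *job)
    {
        bool paused;

        job_lock();
        paused = job->paused;   /* Job fields will be protected by job_mutex */
        job_unlock();

        return paused;
    }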