
[v7,01/18] job.c: make job_mutex and job_lock/unlock() public

Message ID 20220616131835.2004262-2-eesposit@redhat.com
State New
Series job: replace AioContext lock with job_mutex

Commit Message

Emanuele Giuseppe Esposito June 16, 2022, 1:18 p.m. UTC
The job mutex will be used to protect the job struct elements and list,
replacing the AioContext locks.

Right now, use a shared lock for all jobs, in order to keep things
simple. Once the AioContext lock is gone, we can introduce per-job
locks.

To simplify the switch from the AioContext lock to the job lock, introduce
*nop* lock/unlock functions and macros.
We want to always call job_lock/unlock outside the AioContext locks,
and never vice versa; otherwise we might get a deadlock. This is not
straightforward to do, which is why we start with nop functions.
Once everything is protected by job_lock/unlock, we can turn the nops into
an actual mutex and remove the AioContext lock.

Since job_mutex is already in use, add static
real_job_{lock/unlock} wrappers for the existing usage.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/job.h | 24 ++++++++++++++++++++++++
 job.c              | 35 +++++++++++++++++++++++------------
 2 files changed, 47 insertions(+), 12 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy June 21, 2022, 1:47 p.m. UTC | #1
On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> job mutex will be used to protect the job struct elements and list,
> replacing AioContext locks.
> 
> Right now use a shared lock for all jobs, in order to keep things
> simple. Once the AioContext lock is gone, we can introduce per-job
> locks.
> 
> To simplify the switch from aiocontext to job lock, introduce
> *nop*  lock/unlock functions and macros.
> We want to always call job_lock/unlock outside the AioContext locks,
> and not vice-versa, otherwise we might get a deadlock. This is not
> straightforward to do, and that's why we start with nop functions.
> Once everything is protected by job_lock/unlock, we can change the nop into
> an actual mutex and remove the aiocontext lock.
> 
> Since job_mutex is already being used, add static
> real_job_{lock/unlock} for the existing usage.
> 
> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi<stefanha@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Vladimir Sementsov-Ogievskiy June 24, 2022, 6:22 p.m. UTC | #2
I've already acked this (honestly, because Stefan did), but I still want to clarify:

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> job mutex will be used to protect the job struct elements and list,
> replacing AioContext locks.
> 
> Right now use a shared lock for all jobs, in order to keep things
> simple. Once the AioContext lock is gone, we can introduce per-job
> locks.
> 
> To simplify the switch from aiocontext to job lock, introduce
> *nop*  lock/unlock functions and macros.
> We want to always call job_lock/unlock outside the AioContext locks,
> and not vice-versa, otherwise we might get a deadlock.

Could you describe here why we get a deadlock?

As I understand it, we'll deadlock if two code paths exist simultaneously:

1. we take the job mutex under the aiocontext lock
2. we take the aiocontext lock under the job mutex

If these paths exist, it's possible that one thread goes through [1] and another through [2]. If thread [1] holds the aiocontext lock and wants to take the job mutex, and at the same time thread [2] holds the job mutex and wants to take the aiocontext lock, that's a deadlock.

If you say that we must avoid [1], do you have in mind that we have [2] somewhere? If so, this should be mentioned here.

If not, could we just make a normal mutex, not a noop?

> This is not
> straightforward to do, and that's why we start with nop functions.
> Once everything is protected by job_lock/unlock, we can change the nop into
> an actual mutex and remove the aiocontext lock.
> 
> Since job_mutex is already being used, add static
> real_job_{lock/unlock} for the existing usage.
> 
> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi<stefanha@redhat.com>
Emanuele Giuseppe Esposito June 28, 2022, 1:08 p.m. UTC | #3
Am 24/06/2022 um 20:22 schrieb Vladimir Sementsov-Ogievskiy:
> I've already acked this (honestly, because Stefan do), but still, want
> to clarify:
> 
> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>> job mutex will be used to protect the job struct elements and list,
>> replacing AioContext locks.
>>
>> Right now use a shared lock for all jobs, in order to keep things
>> simple. Once the AioContext lock is gone, we can introduce per-job
>> locks.
>>
>> To simplify the switch from aiocontext to job lock, introduce
>> *nop*  lock/unlock functions and macros.
>> We want to always call job_lock/unlock outside the AioContext locks,
>> and not vice-versa, otherwise we might get a deadlock.
> 
> Could you describe here why we get a deadlock?
> 
> As I understand it, we'll deadlock if two code paths exist simultaneously:
> 
> 1. we take the job mutex under the aiocontext lock
> 2. we take the aiocontext lock under the job mutex
> 
> If these paths exist, it's possible that one thread goes through [1]
> and another through [2]. If thread [1] holds the aiocontext lock and
> wants to take the job mutex, and at the same time thread [2] holds the
> job mutex and wants to take the aiocontext lock, that's a deadlock.
> 
> If you say that we must avoid [1], do you have in mind that we have [2]
> somewhere? If so, this should be mentioned here.
> 
> If not, could we just make a normal mutex, not a noop?

Of course we have [2] somewhere, otherwise I wouldn't even think about
creating a noop function. This idea came up in v1-v2.

Regarding the specific case, I don't remember. But there are tons of
functions that acquire the AioContext lock and then call the job_*
API, such as job_cancel_sync in blockdev.c.

I might use job_cancel_sync as an example and write it in the commit
message though.

Thank you,
Emanuele
>> This is not
>> straightforward to do, and that's why we start with nop functions.
>> Once everything is protected by job_lock/unlock, we can change the nop
>> into
>> an actual mutex and remove the aiocontext lock.
>>
>> Since job_mutex is already being used, add static
>> real_job_{lock/unlock} for the existing usage.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito<eesposit@redhat.com>
>> Reviewed-by: Stefan Hajnoczi<stefanha@redhat.com>
> 
>
Vladimir Sementsov-Ogievskiy June 28, 2022, 3:20 p.m. UTC | #4
On 6/28/22 16:08, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 24/06/2022 um 20:22 schrieb Vladimir Sementsov-Ogievskiy:
>> I've already acked this (honestly, because Stefan do), but still, want
>> to clarify:
>>
>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>> job mutex will be used to protect the job struct elements and list,
>>> replacing AioContext locks.
>>>
>>> Right now use a shared lock for all jobs, in order to keep things
>>> simple. Once the AioContext lock is gone, we can introduce per-job
>>> locks.
>>>
>>> To simplify the switch from aiocontext to job lock, introduce
>>> *nop*  lock/unlock functions and macros.
>>> We want to always call job_lock/unlock outside the AioContext locks,
>>> and not vice-versa, otherwise we might get a deadlock.
>>
>> Could you describe here, why we get a deadlock?
>>
>> As I understand, we'll deadlock if two code paths exist simultaneously:
>>
>> 1. we take job mutex under aiocontext lock
>> 2. we take aiocontex lock under job mutex
>>
>> If these paths exists, it's possible that one thread goes through [1]
>> and another through [2]. If thread [1] holds job-mutex and want to take
>> aiocontext-lock, and in the same time thread [2] holds aiocontext-lock
>> and want to take job-mutext, that's a dead-lock.
>>
>> If you say, that we must avoid [1], do you have in mind that we have [2]
>> somewhere? If so, this should be mentioned here
>>
>> If not, could we just make a normal mutex, not a noop?
> 
> Of course we have [2] somewhere, otherwise I wouldn't even think about
> creating a noop function. This idea came up in v1-v2.
> 
> Regarding the specific case, I don't remember. But there are tons of
> functions that are acquiring the AioContext lock and then calling job_*
> API, such as job_cancel_sync in blockdev.c.
> 
> I might use job_cancel_sync as example and write it in the commit
> message though.
> 

Yes, it's obvious that we have tons of [1]. That's why an example of [2] would be a lot more valuable.

Patch

diff --git a/include/qemu/job.h b/include/qemu/job.h
index c105b31076..d1192ffd61 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -303,6 +303,30 @@  typedef enum JobCreateFlags {
     JOB_MANUAL_DISMISS = 0x04,
 } JobCreateFlags;
 
+extern QemuMutex job_mutex;
+
+#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+
+#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+
+/**
+ * job_lock:
+ *
+ * Take the mutex protecting the list of jobs and their status.
+ * Most functions called by the monitor need to call job_lock
+ * and job_unlock manually.  On the other hand, function called
+ * by the block jobs themselves and by the block layer will take the
+ * lock for you.
+ */
+void job_lock(void);
+
+/**
+ * job_unlock:
+ *
+ * Release the mutex protecting the list of jobs and their status.
+ */
+void job_unlock(void);
+
 /**
  * Allocate and return a new job transaction. Jobs can be added to the
  * transaction using job_txn_add_job().
diff --git a/job.c b/job.c
index 075c6f3a20..2b4ffca9d4 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,12 @@ 
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * job_mutex protects the jobs list, but also makes the
+ * struct job fields thread-safe.
+ */
+QemuMutex job_mutex;
+
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -74,17 +80,22 @@  struct JobTxn {
     int refcnt;
 };
 
-/* Right now, this mutex is only needed to synchronize accesses to job->busy
- * and job->sleep_timer, such as concurrent calls to job_do_yield and
- * job_enter. */
-static QemuMutex job_mutex;
+void job_lock(void)
+{
+    /* nop */
+}
+
+void job_unlock(void)
+{
+    /* nop */
+}
 
-static void job_lock(void)
+static void real_job_lock(void)
 {
     qemu_mutex_lock(&job_mutex);
 }
 
-static void job_unlock(void)
+static void real_job_unlock(void)
 {
     qemu_mutex_unlock(&job_mutex);
 }
@@ -450,21 +461,21 @@  void job_enter_cond(Job *job, bool(*fn)(Job *job))
         return;
     }
 
-    job_lock();
+    real_job_lock();
     if (job->busy) {
-        job_unlock();
+        real_job_unlock();
         return;
     }
 
     if (fn && !fn(job)) {
-        job_unlock();
+        real_job_unlock();
         return;
     }
 
     assert(!job->deferred_to_main_loop);
     timer_del(&job->sleep_timer);
     job->busy = true;
-    job_unlock();
+    real_job_unlock();
     aio_co_enter(job->aio_context, job->co);
 }
 
@@ -481,13 +492,13 @@  void job_enter(Job *job)
  * called explicitly. */
 static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
 {
-    job_lock();
+    real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
     job_event_idle(job);
-    job_unlock();
+    real_job_unlock();
     qemu_coroutine_yield();
 
     /* Set by job_enter_cond() before re-entering the coroutine.  */