Message ID: 20220616131835.2004262-2-eesposit@redhat.com
State: New
Series: job: replace AioContext lock with job_mutex
On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> job mutex will be used to protect the job struct elements and list,
> replacing AioContext locks.
>
> Right now use a shared lock for all jobs, in order to keep things
> simple. Once the AioContext lock is gone, we can introduce per-job
> locks.
>
> To simplify the switch from aiocontext to job lock, introduce
> *nop* lock/unlock functions and macros.
> We want to always call job_lock/unlock outside the AioContext locks,
> and not vice-versa, otherwise we might get a deadlock. This is not
> straightforward to do, and that's why we start with nop functions.
> Once everything is protected by job_lock/unlock, we can change the nop into
> an actual mutex and remove the aiocontext lock.
>
> Since job_mutex is already being used, add static
> real_job_{lock/unlock} for the existing usage.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
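The two-phase plan described in the commit message (annotate call sites with nop lock/unlock first, flip to a real mutex later) can be pictured with a minimal sketch. This is not the actual patch; the JOB_LOCK_IS_NOP switch below is hypothetical and only illustrates the idea:

    /*
     * Hypothetical sketch of the two-phase conversion, not the actual
     * patch: in phase 1 the functions are nops, so call sites can be
     * annotated while the AioContext lock still provides the real
     * protection; in phase 2 the same functions become a real mutex.
     */
    #include "qemu/osdep.h"
    #include "qemu/thread.h"

    QemuMutex job_mutex;

    void job_lock(void)
    {
    #ifdef JOB_LOCK_IS_NOP
        /* phase 1: nop, protection still comes from the AioContext lock */
    #else
        qemu_mutex_lock(&job_mutex);    /* phase 2: the lock becomes real */
    #endif
    }

    void job_unlock(void)
    {
    #ifdef JOB_LOCK_IS_NOP
        /* nop */
    #else
        qemu_mutex_unlock(&job_mutex);
    #endif
    }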
I've already acked this (honestly, because Stefan did), but still, I want
to clarify:

On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
> job mutex will be used to protect the job struct elements and list,
> replacing AioContext locks.
>
> Right now use a shared lock for all jobs, in order to keep things
> simple. Once the AioContext lock is gone, we can introduce per-job
> locks.
>
> To simplify the switch from aiocontext to job lock, introduce
> *nop* lock/unlock functions and macros.
> We want to always call job_lock/unlock outside the AioContext locks,
> and not vice-versa, otherwise we might get a deadlock.

Could you describe here why we get a deadlock?

As I understand it, we'll deadlock if two code paths exist simultaneously:

1. we take the job mutex under the aiocontext lock
2. we take the aiocontext lock under the job mutex

If these paths exist, it's possible that one thread goes through [1]
and another through [2]. If thread [1] holds the job-mutex and wants to
take the aiocontext-lock, and at the same time thread [2] holds the
aiocontext-lock and wants to take the job-mutex, that's a deadlock.

If you say that we must avoid [1], do you have in mind that we have [2]
somewhere? If so, this should be mentioned here.

If not, could we just make a normal mutex, not a noop?

> This is not
> straightforward to do, and that's why we start with nop functions.
> Once everything is protected by job_lock/unlock, we can change the nop into
> an actual mutex and remove the aiocontext lock.
>
> Since job_mutex is already being used, add static
> real_job_{lock/unlock} for the existing usage.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
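For illustration, here is a minimal, self-contained sketch of the lock-ordering inversion described in [1] and [2] above, written with plain pthreads rather than QEMU primitives; the mutex and function names are made up:

    /* Standalone illustration of the [1]/[2] lock-ordering inversion;
     * plain pthreads, not QEMU code.  Compile with: gcc -pthread dl.c */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t aio_context_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t job_mutex        = PTHREAD_MUTEX_INITIALIZER;

    /* Path [1]: job mutex taken under the aiocontext lock. */
    static void *path1(void *arg)
    {
        pthread_mutex_lock(&aio_context_lock);
        pthread_mutex_lock(&job_mutex);        /* may block forever */
        pthread_mutex_unlock(&job_mutex);
        pthread_mutex_unlock(&aio_context_lock);
        return NULL;
    }

    /* Path [2]: aiocontext lock taken under the job mutex. */
    static void *path2(void *arg)
    {
        pthread_mutex_lock(&job_mutex);
        pthread_mutex_lock(&aio_context_lock); /* may block forever */
        pthread_mutex_unlock(&aio_context_lock);
        pthread_mutex_unlock(&job_mutex);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, path1, NULL);
        pthread_create(&t2, NULL, path2, NULL);
        pthread_join(t1, NULL);   /* with unlucky timing, never returns */
        pthread_join(t2, NULL);
        printf("no deadlock this time\n");
        return 0;
    }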
Am 24/06/2022 um 20:22 schrieb Vladimir Sementsov-Ogievskiy:
> I've already acked this (honestly, because Stefan did), but still, I want
> to clarify:
>
> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>> job mutex will be used to protect the job struct elements and list,
>> replacing AioContext locks.
>>
>> Right now use a shared lock for all jobs, in order to keep things
>> simple. Once the AioContext lock is gone, we can introduce per-job
>> locks.
>>
>> To simplify the switch from aiocontext to job lock, introduce
>> *nop* lock/unlock functions and macros.
>> We want to always call job_lock/unlock outside the AioContext locks,
>> and not vice-versa, otherwise we might get a deadlock.
>
> Could you describe here why we get a deadlock?
>
> As I understand it, we'll deadlock if two code paths exist simultaneously:
>
> 1. we take the job mutex under the aiocontext lock
> 2. we take the aiocontext lock under the job mutex
>
> If these paths exist, it's possible that one thread goes through [1]
> and another through [2]. If thread [1] holds the job-mutex and wants to
> take the aiocontext-lock, and at the same time thread [2] holds the
> aiocontext-lock and wants to take the job-mutex, that's a deadlock.
>
> If you say that we must avoid [1], do you have in mind that we have [2]
> somewhere? If so, this should be mentioned here.
>
> If not, could we just make a normal mutex, not a noop?

Of course we have [2] somewhere, otherwise I wouldn't even think about
creating a noop function. This idea came up in v1-v2.

Regarding the specific case, I don't remember. But there are tons of
functions that acquire the AioContext lock and then call the job_*
API, such as job_cancel_sync in blockdev.c.

I might use job_cancel_sync as an example and write it in the commit
message though.

Thank you,
Emanuele

>> This is not
>> straightforward to do, and that's why we start with nop functions.
>> Once everything is protected by job_lock/unlock, we can change the nop
>> into
>> an actual mutex and remove the aiocontext lock.
>>
>> Since job_mutex is already being used, add static
>> real_job_{lock/unlock} for the existing usage.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>
>
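A rough sketch of the [1]-style pattern Emanuele refers to (simplified, not a verbatim quote of blockdev.c; the helper name is made up): the caller already holds the AioContext lock when it enters the job_* API, which is why job_lock() cannot simply become a real mutex before the lock ordering is sorted out:

    /*
     * Simplified sketch of a [1]-style call site; not verbatim
     * blockdev.c code, and the helper name is made up.
     */
    static void cancel_job_under_aio_context(BlockJob *bjob)
    {
        AioContext *aio_context = bjob->job.aio_context;

        aio_context_acquire(aio_context);
        /* job_* API entered while the AioContext lock is held */
        job_cancel_sync(&bjob->job, true);
        aio_context_release(aio_context);
    }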
On 6/28/22 16:08, Emanuele Giuseppe Esposito wrote:
>
>
> Am 24/06/2022 um 20:22 schrieb Vladimir Sementsov-Ogievskiy:
>> I've already acked this (honestly, because Stefan did), but still, I want
>> to clarify:
>>
>> On 6/16/22 16:18, Emanuele Giuseppe Esposito wrote:
>>> job mutex will be used to protect the job struct elements and list,
>>> replacing AioContext locks.
>>>
>>> Right now use a shared lock for all jobs, in order to keep things
>>> simple. Once the AioContext lock is gone, we can introduce per-job
>>> locks.
>>>
>>> To simplify the switch from aiocontext to job lock, introduce
>>> *nop* lock/unlock functions and macros.
>>> We want to always call job_lock/unlock outside the AioContext locks,
>>> and not vice-versa, otherwise we might get a deadlock.
>>
>> Could you describe here why we get a deadlock?
>>
>> As I understand it, we'll deadlock if two code paths exist simultaneously:
>>
>> 1. we take the job mutex under the aiocontext lock
>> 2. we take the aiocontext lock under the job mutex
>>
>> If these paths exist, it's possible that one thread goes through [1]
>> and another through [2]. If thread [1] holds the job-mutex and wants to
>> take the aiocontext-lock, and at the same time thread [2] holds the
>> aiocontext-lock and wants to take the job-mutex, that's a deadlock.
>>
>> If you say that we must avoid [1], do you have in mind that we have [2]
>> somewhere? If so, this should be mentioned here.
>>
>> If not, could we just make a normal mutex, not a noop?
>
> Of course we have [2] somewhere, otherwise I wouldn't even think about
> creating a noop function. This idea came up in v1-v2.
>
> Regarding the specific case, I don't remember. But there are tons of
> functions that acquire the AioContext lock and then call the job_*
> API, such as job_cancel_sync in blockdev.c.
>
> I might use job_cancel_sync as an example and write it in the commit
> message though.
>

Yes, it's obvious that we have tons of [1]. That's why an example of [2]
would be a lot more valuable.
diff --git a/include/qemu/job.h b/include/qemu/job.h
index c105b31076..d1192ffd61 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -303,6 +303,30 @@ typedef enum JobCreateFlags {
     JOB_MANUAL_DISMISS = 0x04,
 } JobCreateFlags;
 
+extern QemuMutex job_mutex;
+
+#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+
+#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+
+/**
+ * job_lock:
+ *
+ * Take the mutex protecting the list of jobs and their status.
+ * Most functions called by the monitor need to call job_lock
+ * and job_unlock manually. On the other hand, function called
+ * by the block jobs themselves and by the block layer will take the
+ * lock for you.
+ */
+void job_lock(void);
+
+/**
+ * job_unlock:
+ *
+ * Release the mutex protecting the list of jobs and their status.
+ */
+void job_unlock(void);
+
 /**
  * Allocate and return a new job transaction. Jobs can be added to the
  * transaction using job_txn_add_job().
diff --git a/job.c b/job.c
index 075c6f3a20..2b4ffca9d4 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,12 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * job_mutex protects the jobs list, but also makes the
+ * struct job fields thread-safe.
+ */
+QemuMutex job_mutex;
+
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -74,17 +80,22 @@ struct JobTxn {
     int refcnt;
 };
 
-/* Right now, this mutex is only needed to synchronize accesses to job->busy
- * and job->sleep_timer, such as concurrent calls to job_do_yield and
- * job_enter. */
-static QemuMutex job_mutex;
+void job_lock(void)
+{
+    /* nop */
+}
+
+void job_unlock(void)
+{
+    /* nop */
+}
 
-static void job_lock(void)
+static void real_job_lock(void)
 {
     qemu_mutex_lock(&job_mutex);
 }
 
-static void job_unlock(void)
+static void real_job_unlock(void)
 {
     qemu_mutex_unlock(&job_mutex);
 }
@@ -450,21 +461,21 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
         return;
     }
 
-    job_lock();
+    real_job_lock();
     if (job->busy) {
-        job_unlock();
+        real_job_unlock();
         return;
     }
 
     if (fn && !fn(job)) {
-        job_unlock();
+        real_job_unlock();
         return;
     }
 
     assert(!job->deferred_to_main_loop);
     timer_del(&job->sleep_timer);
     job->busy = true;
-    job_unlock();
+    real_job_unlock();
     aio_co_enter(job->aio_context, job->co);
 }
 
@@ -481,13 +492,13 @@ void job_enter(Job *job)
  * called explicitly. */
 static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
 {
-    job_lock();
+    real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
     }
     job->busy = false;
     job_event_idle(job);
-    job_unlock();
+    real_job_unlock();
     qemu_coroutine_yield();
 
     /* Set by job_enter_cond() before re-entering the coroutine. */
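A short sketch of the calling convention the new header comment describes, assuming the rest of the series lands: monitor-side code takes the lock explicitly around accesses to job fields. The job_is_paused_example() helper is hypothetical, and at this point in the series job_lock()/job_unlock() are still nops, so behaviour does not change yet:

    /*
     * Hypothetical example of the intended calling convention; at this
     * point in the series job_lock()/job_unlock() are still nops.
     */
    static bool job_is_paused_example(Job *job)
    {
        bool paused;

        job_lock();
        paused = job->paused;   /* Job fields will be protected by job_mutex */
        job_unlock();

        return paused;
    }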