[v5,0/6] blkdebug: fix racing condition when iterating on

Message ID	20210614082931.24925-1-eesposit@redhat.com
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> From: Emanuele Giuseppe Esposito <eesposit@redhat.com> To: qemu-block@nongnu.org Subject: [PATCH v5 0/6] blkdebug: fix racing condition when iterating on Date: Mon, 14 Jun 2021 10:29:25 +0200 Message-Id: <20210614082931.24925-1-eesposit@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="US-ASCII" Received-SPF: pass client-ip=216.205.24.124; envelope-from=eesposit@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -29 X-Spam_score: -3.0 X-Spam_bar: --- X-Spam_report: (-3.0 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.199, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Cc: Kevin Wolf <kwolf@redhat.com>, Emanuele Giuseppe Esposito <eesposit@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Eric Blake <eblake@redhat.com> Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>
Series	blkdebug: fix racing condition when iterating on \| expand [v5,0/6] blkdebug: fix racing condition when iterating on [v5,1/6] blkdebug: refactor removal of a suspended request [v5,2/6] blkdebug: move post-resume handling to resume_req_by_tag [v5,3/6] blkdebug: track all actions [v5,4/6] blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE [v5,5/6] block/blkdebug: remove new_state field and instead use a local variable [v5,6/6] blkdebug: protect rules and suspended_reqs with a lock

Message ID

20210614082931.24925-1-eesposit@redhat.com

Headers

From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: qemu-block@nongnu.org
Subject: [PATCH v5 0/6] blkdebug: fix racing condition when iterating on
Date: Mon, 14 Jun 2021 10:29:25 +0200
Message-Id: <20210614082931.24925-1-eesposit@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="US-ASCII"
Received-SPF: pass client-ip=216.205.24.124;
 envelope-from=eesposit@redhat.com;
 helo=us-smtp-delivery-124.mimecast.com
X-Spam_score_int: -29
X-Spam_score: -3.0
X-Spam_bar: ---
X-Spam_report: (-3.0 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.199,
 DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01,
 SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: Kevin Wolf <kwolf@redhat.com>,
 Emanuele Giuseppe Esposito <eesposit@redhat.com>,
 Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
 qemu-devel@nongnu.org,
 Max Reitz <mreitz@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>,
 Eric Blake <eblake@redhat.com>
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"
 <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>

Series

blkdebug: fix racing condition when iterating on | expand

Message

Emanuele Giuseppe Esposito June 14, 2021, 8:29 a.m. UTC

When qemu_coroutine_enter is executed in a loop
(even QEMU_FOREACH_SAFE), the new routine can modify the list,
for example removing an element, causing problem when control
is given back to the caller that continues iterating on the same list. 

Patch 1 solves the issue in blkdebug_debug_resume by restarting
the list walk after every coroutine_enter if list has to be fully iterated.
Patches 2,3,4 aim to fix blkdebug_debug_event by gathering
all actions that the rules make in a counter and invoking 
the respective coroutine_yeld only after processing all requests.

Patch 5-6 are somewhat independent of the others, patch 5 removes the need
of new_state field, and patch 6 adds a lock to
protect rules and suspended_reqs; right now everything works because
it's protected by the AioContext lock.
This is a preparation for the current proposal of removing the AioContext
lock and instead using smaller granularity locks to allow multiple
iothread execution in the same block device.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
v5:
* Add comment in patch 1 to explain why we don't need _SAFE in for loop
* Move the state update (s->state = new_state) in patch 5, to maintain
  the same existing effect in all patches

Emanuele Giuseppe Esposito (6):
  blkdebug: refactor removal of a suspended request
  blkdebug: move post-resume handling to resume_req_by_tag
  blkdebug: track all actions
  blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE
  block/blkdebug: remove new_state field and instead use a local
    variable
  blkdebug: protect rules and suspended_reqs with a lock

 block/blkdebug.c | 136 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 92 insertions(+), 44 deletions(-)

Comments

Max Reitz July 15, 2021, 10:06 a.m. UTC | #1

On 14.06.21 10:29, Emanuele Giuseppe Esposito wrote:
> When qemu_coroutine_enter is executed in a loop
> (even QEMU_FOREACH_SAFE), the new routine can modify the list,
> for example removing an element, causing problem when control
> is given back to the caller that continues iterating on the same list.
>
> Patch 1 solves the issue in blkdebug_debug_resume by restarting
> the list walk after every coroutine_enter if list has to be fully iterated.
> Patches 2,3,4 aim to fix blkdebug_debug_event by gathering
> all actions that the rules make in a counter and invoking
> the respective coroutine_yeld only after processing all requests.
>
> Patch 5-6 are somewhat independent of the others, patch 5 removes the need
> of new_state field, and patch 6 adds a lock to
> protect rules and suspended_reqs; right now everything works because
> it's protected by the AioContext lock.
> This is a preparation for the current proposal of removing the AioContext
> lock and instead using smaller granularity locks to allow multiple
> iothread execution in the same block device.
>
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
> v5:
> * Add comment in patch 1 to explain why we don't need _SAFE in for loop
> * Move the state update (s->state = new_state) in patch 5, to maintain
>    the same existing effect in all patches

I’m not sure whether this actually fixes a user-visible bug…?  The first 
paragraph makes it sound like it, but there is no test, so I’m not sure.

I’m mostly asking because of freeze; but you make it sound like there’s 
a bug, and as this only concerns blkdebug (i.e., a block driver used 
only for testing), I feel like applying this series after soft freeze 
should be fine, so:

Thanks, I’ve applied this series to my block branch:

https://github.com/XanClic/qemu/commits/block

Max