From patchwork Fri Oct 8 15:50:01 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Yehuda Sadeh Weinraub X-Patchwork-Id: 67241 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [199.232.76.165]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 04A4FB70D4 for ; Sat, 9 Oct 2010 04:12:20 +1100 (EST) Received: from localhost ([127.0.0.1]:39033 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1P4GDh-0008Jd-AW for incoming@patchwork.ozlabs.org; Fri, 08 Oct 2010 12:54:57 -0400 Received: from [140.186.70.92] (port=49212 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1P4Fps-0002wJ-MU for qemu-devel@nongnu.org; Fri, 08 Oct 2010 12:30:25 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1P4FCs-00083n-Tg for qemu-devel@nongnu.org; Fri, 08 Oct 2010 11:50:04 -0400 Received: from mail-wy0-f173.google.com ([74.125.82.173]:45886) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1P4FCs-00083i-MB for qemu-devel@nongnu.org; Fri, 08 Oct 2010 11:50:02 -0400 Received: by wyb33 with SMTP id 33so91974wyb.4 for ; Fri, 08 Oct 2010 08:50:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=mmjAgcjSA6gglv1Qz28dNBXl3hEpSpFJ+GwHbajpR0g=; b=jeN1d4A7AsKN8U/9T7DdC4ASbOE3KcU0soNofrUJ/FAxvBJlfITNsJidVtwBCEsmH6 JoOCZVNB3YquBKg3vOlYR/eNkBe985U20QoDxoYzJEXbg9INRrKTmwNsKqhLwnSNZHRg zNSj7wY4t/OCCJ5lOr5iufNaPvL8l5vC1PGNg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=om18lBHKy/+l+J0G4xLL41JjGF2IYorFEERdaEOZ3wBpgWeQZBdFvH9r52DNO1ZPvH XayNQ9YjOI9D1KyzAdV3DS5p3IsoEem+mCQTQ+1sojiMiRT1rnX8KWa/IymQ2slISJpO CLq9bx3emKzBdkFepoIDnnjdD98epw++apQLE= MIME-Version: 1.0 Received: by 10.216.233.163 with SMTP id p35mr783578weq.98.1286553001671; Fri, 08 Oct 2010 08:50:01 -0700 (PDT) Received: by 10.216.160.77 with HTTP; Fri, 8 Oct 2010 08:50:01 -0700 (PDT) In-Reply-To: <4CAF255A.1030103@codemonkey.ws> References: <20100802194631.GA4923@chb-desktop> <20100803201407.GD1475@chb-desktop> <4CADD567.9010606@codemonkey.ws> <4CAE13BA.70707@codemonkey.ws> <4CAE24C5.8030007@codemonkey.ws> <4CAE35C5.2010809@codemonkey.ws> <4CAE41BD.2070508@codemonkey.ws> <4CAF255A.1030103@codemonkey.ws> Date: Fri, 8 Oct 2010 08:50:01 -0700 Message-ID: Subject: Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4) From: Yehuda Sadeh Weinraub To: Anthony Liguori X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) Cc: Kevin Wolf , kvm@vger.kernel.org, qemu-devel@nongnu.org, Sage Weil , ceph-devel@vger.kernel.org, Christian Brunner X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org On Fri, Oct 8, 2010 at 7:06 AM, Anthony Liguori wrote: > On 10/07/2010 05:45 PM, Sage Weil wrote: >> >> I'm sorry, I'm having a hard time understanding what it is you're >> objecting to, or what you would prefer, as there are two different things >> we're talking about here (callbacks and fd glue/pipes).  (Please bear with >> me as I am not a qemu expert!) >> >> The first is the aio completion.  You said a few messages back: >> >> >>> >>> It looks like you just use the eventfd to signal aio completion >>> callbacks.  A better way to do this would be to schedule a bottom half. >>> >> >> This is what we're doing.  The librados makes a callback to rbd.c's >> rbd_finish_aiocb(), which updates some internal rbd accounting and then >> calls qemu_bh_schedule().  Is that part right? >> > > No.  You're calling qemu_bh_schedule() in a separate thread in parallel to > other operations. Oh, that makes it more clean. Considering that we did it for kvm, and looking at the kvm qemu_bh_schedule() implementation, it does look thread safe (there might be an issue though with canceling the bh though, haven't looked at it, not really relevant). In any case, we already have the completion code with the pipes (currently eventfd) that runs in the qemu context, so just moving the bh scheduling there would work. Something like this (still needs to add some mutex around qemu_aio_count that was missing before): acb->aiocnt--; @@ -570,13 +572,10 @@ static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb) acb->ret += r; } } - if (write(acb->s->efd, &buf, sizeof(buf)) < 0) + if (write(acb->s->efd, (void *)&acb, sizeof(&acb)) < 0) error_report("failed writing to acb->s->efd\n"); qemu_free(rcb); i = 0; - if (!acb->aiocnt && acb->bh) { - qemu_bh_schedule(acb->bh); - } } /* Callback when all queued rados_aio requests are complete */ @@ -584,7 +583,6 @@ static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb) static void rbd_aio_bh_cb(void *opaque) { RBDAIOCB *acb = opaque; - uint64_t buf = 1; if (!acb->write) { qemu_iovec_from_buffer(acb->qiov, acb->bounce, acb->qiov->size); @@ -594,8 +592,6 @@ static void rbd_aio_bh_cb(void *opaque) qemu_bh_delete(acb->bh); acb->bh = NULL; - if (write(acb->s->efd, &buf, sizeof(buf)) < 0) - error_report("failed writing to acb->s->efd\n"); qemu_aio_release(acb); } @@ -644,7 +640,7 @@ static BlockDriverAIOCB *rbd_aio_rw_vector(BlockDriverState *bs, last_segnr = ((off + size - 1) / s->objsize); acb->aiocnt = (last_segnr - segnr) + 1; - s->qemu_aio_count+=acb->aiocnt + 1; /* All the RADOSCB and the related RBDAIOCB */ + s->qemu_aio_count+=acb->aiocnt; /* All the RADOSCB and the related RBDAIOCB */ if (write && s->read_only) { acb->ret = -EROFS; diff --git a/block/rbd.c b/block/rbd.c index 13db079..164e547 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -315,13 +315,16 @@ done: static void rbd_aio_completion_cb(void *opaque) { BDRVRBDState *s = opaque; + RBDAIOCB *acb; - uint64_t val; ssize_t ret; do { - if ((ret = read(s->efd, &val, sizeof(val))) > 0) { - s->qemu_aio_count -= val; + if ((ret = read(s->efd, &acb, sizeof(acb))) > 0) { + s->qemu_aio_count --; + if (!acb->aiocnt && acb->bh) { + qemu_bh_schedule(acb->bh); + } } } while (ret < 0 && errno == EINTR); @@ -539,7 +542,6 @@ static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb) { RBDAIOCB *acb = rcb->acb; int64_t r; - uint64_t buf = 1; int i;