Message ID | 57435BB8.3000600@ilande.co.uk |
---|---|
State | New |
Headers | show |
On 23/05/16 20:36, Mark Cave-Ayland wrote: > On 23/05/16 13:54, Paolo Bonzini wrote: > >> scsi-block uses the block layer for reads and writes in order to avoid >> allocating bounce buffers as big as the transferred data. We know how >> to split a large transfer to multiple reads and writes, and thus we can >> use scsi-disk.c's existing code to do I/O in multiple chunks (for non-s/g >> SCSI hosts) or through the DMA helpers (for s/g SCSI hosts). >> >> Unfortunately, this has the side effect of eating the SCSI status except >> in the very few cases where we can convert an errno code back to a SCSI >> status. It puts a big wrench in persistent reservations support in the >> guest, for example. >> >> Luckily, splitting a large transfer into multiple SBC commands is just as >> easy, and this is what the last patch does. It takes the original CDB, >> patches in a modified starting sector and sector count, and executes the >> SCSI command through blk_aio_ioctl. It is also easy to pass a QEMUIOVector >> to SG_IO, so that s/g SCSI hosts keep the performance. >> >> This rebases the patches on top of Eric's changes for byte-based >> BlockBackend access and fixes a few bugs I knew about in the RFC. >> >> Patches 1, 5 and 6 are new. >> >> Paolo >> >> Paolo Bonzini (7): >> dma-helpers: change interface to byte-based >> dma-helpers: change BlockBackend to opaque value in DMAIOFunc >> scsi-disk: introduce a common base class >> scsi-disk: introduce dma_readv and dma_writev >> scsi-disk: add need_fua_emulation to SCSIDiskClass >> scsi-disk: introduce scsi_disk_req_check_error >> scsi-block: always use SG_IO >> >> dma-helpers.c | 54 +++++-- >> hw/block/nvme.c | 6 +- >> hw/ide/ahci.c | 6 +- >> hw/ide/core.c | 20 ++- >> hw/ide/internal.h | 6 +- >> hw/ide/macio.c | 2 +- >> hw/scsi/scsi-disk.c | 409 +++++++++++++++++++++++++++++++++++++-------------- >> include/sysemu/dma.h | 20 +-- >> trace-events | 2 +- >> 9 files changed, 371 insertions(+), 154 deletions(-) > > Hi Paolo, > > I thought I'd give this patchset a spin with a view to seeing whether I > could switch the macio device back to the now byte-based dma-helpers, > but came up with a couple of compile errors. Attached is the minor diff > I applied in order to get a successful compile (apologies for not being > inline, however I couldn't get my mail client to stop wrapping incorrectly). After some head-scratching, I've finally managed to install and boot Darwin PPC using the new byte-based DMA helpers facilitated by this patch in the macio IDE driver. Just switching over to the new helpers provided by this patch seemingly allowed the installation to proceed, but the resulting image was corrupt and refused to boot. I eventually traced the corruption down to this section of code in dma_blk_cb() which was incorrectly truncating the unaligned iovecs: if (dbs->iov.size & ~BDRV_SECTOR_MASK) { qemu_iovec_discard_back(&dbs->iov, dbs->iov.size & ~BDRV_SECTOR_MASK); } This was introduced in http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg02236.html to handle non-sector aligned SG lists, although given that this is a restriction of one particular implementation (PRDT) I'm not sure whether a plain revert is the correct thing to do or whether an alternative solution needs to be found. ATB, Mark.
On 25/05/2016 00:59, Mark Cave-Ayland wrote: > I eventually traced the corruption down to this section of code in > dma_blk_cb() which was incorrectly truncating the unaligned iovecs: > > if (dbs->iov.size & ~BDRV_SECTOR_MASK) { > qemu_iovec_discard_back(&dbs->iov, dbs->iov.size & > ~BDRV_SECTOR_MASK); > } > > This was introduced in > http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg02236.html to > handle non-sector aligned SG lists, although given that this is a > restriction of one particular implementation (PRDT) I'm not sure whether > a plain revert is the correct thing to do or whether an alternative > solution needs to be found. Yeah, I have a plan for this bit. It's related to this code I'm adding in patch 7: + /* This is not supported yet. It can only happen if the guest does + * reads and writes that are not aligned to one logical sectors + * _and_ cover multiple MemoryRegions. + */ + assert(offset % s->qdev.blocksize == 0); + assert(iov->size % s->qdev.blocksize == 0); The idea behind the "if" is that the I/O code cannot deal with a number of bytes that is not a multiple of the logical sector size. These assertions could be removed by generalizing the "if" to an arbitrary mask, in this case s->qdev.blocksize - 1. There are two things that are wrong however in the logic. First, the "if" must be moved before the "if (dbs->iov.size & ~BDRV_SECTOR_MASK)", because the call to qemu_iovec_discard_back can result in a zero-byte QEMUIOVector. Second, the sg_cur_* variables must be rewound too. Thanks, Paolo
On 25/05/16 09:45, Paolo Bonzini wrote: > On 25/05/2016 00:59, Mark Cave-Ayland wrote: >> I eventually traced the corruption down to this section of code in >> dma_blk_cb() which was incorrectly truncating the unaligned iovecs: >> >> if (dbs->iov.size & ~BDRV_SECTOR_MASK) { >> qemu_iovec_discard_back(&dbs->iov, dbs->iov.size & >> ~BDRV_SECTOR_MASK); >> } >> >> This was introduced in >> http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg02236.html to >> handle non-sector aligned SG lists, although given that this is a >> restriction of one particular implementation (PRDT) I'm not sure whether >> a plain revert is the correct thing to do or whether an alternative >> solution needs to be found. > > Yeah, I have a plan for this bit. It's related to this code I'm adding > in patch 7: > > + /* This is not supported yet. It can only happen if the guest does > + * reads and writes that are not aligned to one logical sectors > + * _and_ cover multiple MemoryRegions. > + */ > + assert(offset % s->qdev.blocksize == 0); > + assert(iov->size % s->qdev.blocksize == 0); > > The idea behind the "if" is that the I/O code cannot deal with a number > of bytes that is not a multiple of the logical sector size. These > assertions could be removed by generalizing the "if" to an arbitrary > mask, in this case s->qdev.blocksize - 1. > > There are two things that are wrong however in the logic. First, the > "if" must be moved before the "if (dbs->iov.size & ~BDRV_SECTOR_MASK)", > because the call to qemu_iovec_discard_back can result in a zero-byte > QEMUIOVector. Second, the sg_cur_* variables must be rewound too. Great - as long as you're aware of this I'll wait for the next version of the patchset to come out, and then at least I can re-run the macio installation test to ensure that there are no further corruption issues with unaligned DMA accesses. I'd like to add my thanks to everyone who has worked hard on allowing byte-based accesses in the I/O layer since on my local test case it makes a significant improvement to the macio emulation. With my basic macio conversion applied on top of this patchset, the installation time for a complete Darwin PPC install drops from 25 mins to about 9 mins on my laptop which is a clear 2.5x speed increase over the existing code. ATB, Mark.
On 25/05/2016 11:13, Mark Cave-Ayland wrote: > Great - as long as you're aware of this I'll wait for the next version > of the patchset to come out, and then at least I can re-run the macio > installation test to ensure that there are no further corruption issues > with unaligned DMA accesses. If I get an ack for patches 1 and 2, I will just put these patches in a pull request (obviously with your fixes---I had forgotten to redo the git format-patch step). Thanks, Paolo
diff --git a/dma-helpers.c b/dma-helpers.c index 92c5d55..b521d84 100644 --- a/dma-helpers.c +++ b/dma-helpers.c @@ -192,7 +192,7 @@ static const AIOCBInfo dma_aiocb_info = { }; BlockAIOCB *dma_blk_io(AioContext *ctx, - QEMUSGList *sg, uint64_t opaque, + QEMUSGList *sg, uint64_t offset, DMAIOFunc *io_func, void *io_func_opaque, BlockCompletionFunc *cb, void *opaque, DMADirection dir) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 02faa3a..f7a23fc 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -540,7 +540,6 @@ static void scsi_write_data(SCSIRequest *req) if (r->req.sg) { dma_acct_start(s->qdev.conf.blk, &r->acct, r->req.sg, BLOCK_ACCT_WRITE); r->req.resid -= r->req.sg->size; - scsi_dma_complete, r); r->req.aiocb = dma_blk_io(blk_get_aio_context(s->qdev.conf.blk), r->req.sg, r->sector << BDRV_SECTOR_BITS, sdc->dma_writev, r, scsi_dma_complete, r,