Message ID | 575FB9F9.4000003@kaod.org |
---|---|
State | New |
Headers | show |
Am 14.06.2016 um 10:02 hat Cédric Le Goater geschrieben: > >> #4 0x00007fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, req=<optimized out>, offset=30878208, > >> bytes=512, qiov=0x7fa7f47fee60, flags=0) > >> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243 > >> #5 0x00007fa81c669ecb in bdrv_co_pwritev (bs=0x7fa81d4dd050, offset=8, bytes=512, qiov=0x7fa80d5191c0, > >> flags=(BDRV_REQ_COPY_ON_READ | BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_SERIALISING | BDRV_REQ_FUA | unknown: 4278124256), flags@entry=(unknown: 0)) > >> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1492 > > > > That 'flags' value looks bogus... > > > >> #6 0x00007fa81c65e367 in blk_co_pwritev (blk=0x7fa81d4c5b60, offset=30878208, bytes=256, qiov=0x7fa80d5191c0, > >> flags=(unknown: 0)) at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:788 > >> #7 0x00007fa81c65e49b in blk_aio_write_entry (opaque=0x7fa7e849aca0) > >> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:977 > >> #8 0x00007fa81c6c823a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) > >> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/util/coroutine-ucontext.c:78 > >> #9 0x00007fa818ea8f00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > > > > and we don't get anything further in the backtrace beyond coroutines, to > > see who's sending the bad parameters. I recently debugged a bogus flags > > in bdrv_aio_preadv, by hoisting an assert to occur before coroutines are > > used in blk_aio_prwv(): > > > > https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02948.html > > > > I've just posted v2 of that patch (now a 2/2 series), but in v2 no > > longer kept the assert at that point. But maybe the correct fix, and/or > > the hack for catching the bug prior to coroutines, will help you debug > > where the bad arguments are coming from. > > That does not fix the assert. > > >> #10 0x00007fa80d5189d0 in ?? () > >> #11 0x0000000000000000 in ?? () > >> (gdb) up 4 > >> #4 0x00007fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, req=<optimized out>, offset=30878208, > >> bytes=512, qiov=0x7fa7f47fee60, flags=0) > >> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243 > >> 1243 assert(!qiov || bytes == qiov->size); > >> (gdb) p *qiov > >> $1 = {iov = 0x7fa81da671d0, niov = 1, nalloc = 1, size = 256} > > So, it seems that the issue is coming from the fact that bdrv_co_pwritev() > does not handle alignments less than BDRV_SECTOR_SIZE : > > /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */ > uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment); > > It calls bdrv_aligned_pwritev() which does the assert : > > assert(!qiov || bytes == qiov->size); Yes, but between these two places, there is code that should actually enforce the right alignment: if ((offset + bytes) & (align - 1)) { ... } You can see in your backtrace that bdrv_aligned_pwritev() gets a different qiov than bdrv_co_pwritev() (which is local_qiov in the latter function). It's just unclear to me why this code extended bytes, but didn't add the tail_buf iovec to local_qiov. > This is because flash_sync_page(), in m25p80.c, now writes with a len of > 0x100, which the page size in flash modules. commit 243e6f69c129 > ("m25p80: Switch to byte-based block access") removed the alignment on > BDRV_SECTOR_SIZE. > > So I think the safest is to restore the alignment on writes. see below. > If this is ok, I will send a little serie of fixes for m25p80 with this > one included. No, bdrv_co_pwritev() is supposed to handle sub-sector writes correctly, so this is what we need to fix. Kevin
On 06/14/2016 10:38 AM, Kevin Wolf wrote: > Am 14.06.2016 um 10:02 hat Cédric Le Goater geschrieben: >>>> #4 0x00007fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, req=<optimized out>, offset=30878208, >>>> bytes=512, qiov=0x7fa7f47fee60, flags=0) >>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243 >>>> #5 0x00007fa81c669ecb in bdrv_co_pwritev (bs=0x7fa81d4dd050, offset=8, bytes=512, qiov=0x7fa80d5191c0, >>>> flags=(BDRV_REQ_COPY_ON_READ | BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_SERIALISING | BDRV_REQ_FUA | unknown: 4278124256), flags@entry=(unknown: 0)) >>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1492 >>> >>> That 'flags' value looks bogus... >>> >>>> #6 0x00007fa81c65e367 in blk_co_pwritev (blk=0x7fa81d4c5b60, offset=30878208, bytes=256, qiov=0x7fa80d5191c0, >>>> flags=(unknown: 0)) at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:788 >>>> #7 0x00007fa81c65e49b in blk_aio_write_entry (opaque=0x7fa7e849aca0) >>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:977 >>>> #8 0x00007fa81c6c823a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) >>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/util/coroutine-ucontext.c:78 >>>> #9 0x00007fa818ea8f00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 >>> >>> and we don't get anything further in the backtrace beyond coroutines, to >>> see who's sending the bad parameters. I recently debugged a bogus flags >>> in bdrv_aio_preadv, by hoisting an assert to occur before coroutines are >>> used in blk_aio_prwv(): >>> >>> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02948.html >>> >>> I've just posted v2 of that patch (now a 2/2 series), but in v2 no >>> longer kept the assert at that point. But maybe the correct fix, and/or >>> the hack for catching the bug prior to coroutines, will help you debug >>> where the bad arguments are coming from. >> >> That does not fix the assert. >> >>>> #10 0x00007fa80d5189d0 in ?? () >>>> #11 0x0000000000000000 in ?? () >>>> (gdb) up 4 >>>> #4 0x00007fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, req=<optimized out>, offset=30878208, >>>> bytes=512, qiov=0x7fa7f47fee60, flags=0) >>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243 >>>> 1243 assert(!qiov || bytes == qiov->size); >>>> (gdb) p *qiov >>>> $1 = {iov = 0x7fa81da671d0, niov = 1, nalloc = 1, size = 256} >> >> So, it seems that the issue is coming from the fact that bdrv_co_pwritev() >> does not handle alignments less than BDRV_SECTOR_SIZE : >> >> /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */ >> uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment); >> >> It calls bdrv_aligned_pwritev() which does the assert : >> >> assert(!qiov || bytes == qiov->size); > > Yes, but between these two places, there is code that should actually > enforce the right alignment: > > if ((offset + bytes) & (align - 1)) { > ... > } > > You can see in your backtrace that bdrv_aligned_pwritev() gets a > different qiov than bdrv_co_pwritev() (which is local_qiov in the latter > function). > > It's just unclear to me why this code extended bytes, but didn't add the > tail_buf iovec to local_qiov. The gdb backtrace is bogus. It does not make sense. May be a gdb issue with multithread on jessie. In the path tracking the tail bytes, we have : if ((offset + bytes) & (align - 1)) { ... tail_bytes = (offset + bytes) & (align - 1); qemu_iovec_add(&local_qiov, tail_buf + tail_bytes, align - tail_bytes); bytes = ROUND_UP(bytes, align); } This is where the issue is I think. The qiov holds 256 and bytes 512. I have no idea how to fix that though. Thanks, C.
Am 14.06.2016 um 18:02 hat Cédric Le Goater geschrieben: > On 06/14/2016 10:38 AM, Kevin Wolf wrote: > > Am 14.06.2016 um 10:02 hat Cédric Le Goater geschrieben: > >>>> #4 0x00007fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, req=<optimized out>, offset=30878208, > >>>> bytes=512, qiov=0x7fa7f47fee60, flags=0) > >>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243 > >>>> #5 0x00007fa81c669ecb in bdrv_co_pwritev (bs=0x7fa81d4dd050, offset=8, bytes=512, qiov=0x7fa80d5191c0, > >>>> flags=(BDRV_REQ_COPY_ON_READ | BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_SERIALISING | BDRV_REQ_FUA | unknown: 4278124256), flags@entry=(unknown: 0)) > >>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1492 > >>> > >>> That 'flags' value looks bogus... > >>> > >>>> #6 0x00007fa81c65e367 in blk_co_pwritev (blk=0x7fa81d4c5b60, offset=30878208, bytes=256, qiov=0x7fa80d5191c0, > >>>> flags=(unknown: 0)) at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:788 > >>>> #7 0x00007fa81c65e49b in blk_aio_write_entry (opaque=0x7fa7e849aca0) > >>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:977 > >>>> #8 0x00007fa81c6c823a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) > >>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/util/coroutine-ucontext.c:78 > >>>> #9 0x00007fa818ea8f00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > >>> > >>> and we don't get anything further in the backtrace beyond coroutines, to > >>> see who's sending the bad parameters. I recently debugged a bogus flags > >>> in bdrv_aio_preadv, by hoisting an assert to occur before coroutines are > >>> used in blk_aio_prwv(): > >>> > >>> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02948.html > >>> > >>> I've just posted v2 of that patch (now a 2/2 series), but in v2 no > >>> longer kept the assert at that point. But maybe the correct fix, and/or > >>> the hack for catching the bug prior to coroutines, will help you debug > >>> where the bad arguments are coming from. > >> > >> That does not fix the assert. > >> > >>>> #10 0x00007fa80d5189d0 in ?? () > >>>> #11 0x0000000000000000 in ?? () > >>>> (gdb) up 4 > >>>> #4 0x00007fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, req=<optimized out>, offset=30878208, > >>>> bytes=512, qiov=0x7fa7f47fee60, flags=0) > >>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243 > >>>> 1243 assert(!qiov || bytes == qiov->size); > >>>> (gdb) p *qiov > >>>> $1 = {iov = 0x7fa81da671d0, niov = 1, nalloc = 1, size = 256} > >> > >> So, it seems that the issue is coming from the fact that bdrv_co_pwritev() > >> does not handle alignments less than BDRV_SECTOR_SIZE : > >> > >> /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */ > >> uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment); > >> > >> It calls bdrv_aligned_pwritev() which does the assert : > >> > >> assert(!qiov || bytes == qiov->size); > > > > Yes, but between these two places, there is code that should actually > > enforce the right alignment: > > > > if ((offset + bytes) & (align - 1)) { > > ... > > } > > > > You can see in your backtrace that bdrv_aligned_pwritev() gets a > > different qiov than bdrv_co_pwritev() (which is local_qiov in the latter > > function). > > > > It's just unclear to me why this code extended bytes, but didn't add the > > tail_buf iovec to local_qiov. > > The gdb backtrace is bogus. It does not make sense. May be a gdb issue > with multithread on jessie. > > In the path tracking the tail bytes, we have : > > if ((offset + bytes) & (align - 1)) { > ... if (!use_local_qiov) { qemu_iovec_init(&local_qiov, qiov->niov + 1); qemu_iovec_concat(&local_qiov, qiov, 0, qiov->size); use_local_qiov = true; } > tail_bytes = (offset + bytes) & (align - 1); > qemu_iovec_add(&local_qiov, tail_buf + tail_bytes, align - tail_bytes); > > bytes = ROUND_UP(bytes, align); > } > > This is where the issue is I think. The qiov holds 256 and bytes 512. > > I have no idea how to fix that though. Added some more context above. qiov->size as passed from the device is already 256 bytes, which are added to local_qiov with qemu_iovec_concat(). And then we add another 256 from tail_buf in the lines that you quoted, so in theory we should end up with a properly aligned 256 + 256 = 512 byte qiov. Kevin
On 06/15/2016 09:57 AM, Kevin Wolf wrote: > Am 14.06.2016 um 18:02 hat Cédric Le Goater geschrieben: >> On 06/14/2016 10:38 AM, Kevin Wolf wrote: >>> Am 14.06.2016 um 10:02 hat Cédric Le Goater geschrieben: >>>>>> #4 0x00007fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, req=<optimized out>, offset=30878208, >>>>>> bytes=512, qiov=0x7fa7f47fee60, flags=0) >>>>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243 >>>>>> #5 0x00007fa81c669ecb in bdrv_co_pwritev (bs=0x7fa81d4dd050, offset=8, bytes=512, qiov=0x7fa80d5191c0, >>>>>> flags=(BDRV_REQ_COPY_ON_READ | BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_SERIALISING | BDRV_REQ_FUA | unknown: 4278124256), flags@entry=(unknown: 0)) >>>>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1492 >>>>> >>>>> That 'flags' value looks bogus... >>>>> >>>>>> #6 0x00007fa81c65e367 in blk_co_pwritev (blk=0x7fa81d4c5b60, offset=30878208, bytes=256, qiov=0x7fa80d5191c0, >>>>>> flags=(unknown: 0)) at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:788 >>>>>> #7 0x00007fa81c65e49b in blk_aio_write_entry (opaque=0x7fa7e849aca0) >>>>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/block-backend.c:977 >>>>>> #8 0x00007fa81c6c823a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) >>>>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/util/coroutine-ucontext.c:78 >>>>>> #9 0x00007fa818ea8f00 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 >>>>> >>>>> and we don't get anything further in the backtrace beyond coroutines, to >>>>> see who's sending the bad parameters. I recently debugged a bogus flags >>>>> in bdrv_aio_preadv, by hoisting an assert to occur before coroutines are >>>>> used in blk_aio_prwv(): >>>>> >>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg02948.html >>>>> >>>>> I've just posted v2 of that patch (now a 2/2 series), but in v2 no >>>>> longer kept the assert at that point. But maybe the correct fix, and/or >>>>> the hack for catching the bug prior to coroutines, will help you debug >>>>> where the bad arguments are coming from. >>>> >>>> That does not fix the assert. >>>> >>>>>> #10 0x00007fa80d5189d0 in ?? () >>>>>> #11 0x0000000000000000 in ?? () >>>>>> (gdb) up 4 >>>>>> #4 0x00007fa81c6694ac in bdrv_aligned_pwritev (bs=0x7fa81d4dd050, req=<optimized out>, offset=30878208, >>>>>> bytes=512, qiov=0x7fa7f47fee60, flags=0) >>>>>> at /home/legoater/work/qemu/qemu-ast2400-mainline.git/block/io.c:1243 >>>>>> 1243 assert(!qiov || bytes == qiov->size); >>>>>> (gdb) p *qiov >>>>>> $1 = {iov = 0x7fa81da671d0, niov = 1, nalloc = 1, size = 256} >>>> >>>> So, it seems that the issue is coming from the fact that bdrv_co_pwritev() >>>> does not handle alignments less than BDRV_SECTOR_SIZE : >>>> >>>> /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */ >>>> uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment); >>>> >>>> It calls bdrv_aligned_pwritev() which does the assert : >>>> >>>> assert(!qiov || bytes == qiov->size); >>> >>> Yes, but between these two places, there is code that should actually >>> enforce the right alignment: >>> >>> if ((offset + bytes) & (align - 1)) { >>> ... >>> } >>> >>> You can see in your backtrace that bdrv_aligned_pwritev() gets a >>> different qiov than bdrv_co_pwritev() (which is local_qiov in the latter >>> function). >>> >>> It's just unclear to me why this code extended bytes, but didn't add the >>> tail_buf iovec to local_qiov. >> >> The gdb backtrace is bogus. It does not make sense. May be a gdb issue >> with multithread on jessie. >> >> In the path tracking the tail bytes, we have : >> >> if ((offset + bytes) & (align - 1)) { >> ... > if (!use_local_qiov) { > qemu_iovec_init(&local_qiov, qiov->niov + 1); > qemu_iovec_concat(&local_qiov, qiov, 0, qiov->size); > use_local_qiov = true; > } >> tail_bytes = (offset + bytes) & (align - 1); >> qemu_iovec_add(&local_qiov, tail_buf + tail_bytes, align - tail_bytes); >> >> bytes = ROUND_UP(bytes, align); >> } >> >> This is where the issue is I think. The qiov holds 256 and bytes 512. >> >> I have no idea how to fix that though. > > Added some more context above. qiov->size as passed from the device is > already 256 bytes, which are added to local_qiov with > qemu_iovec_concat(). And then we add another 256 from tail_buf in the > lines that you quoted, so in theory we should end up with a properly > aligned 256 + 256 = 512 byte qiov. yes. It seems that qiov is bogus after the bdrv_aligned_preadv() call. It gets zeroed most of the time, sometime ->size is 1, and then qemu_iovec_concat() constructs an awful local_qiov, which brings the assert in bdrv_aligned_pwritev() How's that possible ? Could it be a serialization issue ? C.
Index: qemu-ast2400-mainline.git/hw/block/m25p80.c =================================================================== --- qemu-ast2400-mainline.git.orig/hw/block/m25p80.c +++ qemu-ast2400-mainline.git/hw/block/m25p80.c @@ -359,21 +359,25 @@ static void blk_sync_complete(void *opaq static void flash_sync_page(Flash *s, int page) { + int blk_sector, nb_sectors; QEMUIOVector iov; if (!s->blk || blk_is_read_only(s->blk)) { return; } + blk_sector = (page * s->pi->page_size) / BDRV_SECTOR_SIZE; + nb_sectors = DIV_ROUND_UP(s->pi->page_size, BDRV_SECTOR_SIZE); qemu_iovec_init(&iov, 1); - qemu_iovec_add(&iov, s->storage + page * s->pi->page_size, - s->pi->page_size); - blk_aio_pwritev(s->blk, page * s->pi->page_size, &iov, 0, + qemu_iovec_add(&iov, s->storage + blk_sector * BDRV_SECTOR_SIZE, + nb_sectors * BDRV_SECTOR_SIZE); + blk_aio_pwritev(s->blk, blk_sector * BDRV_SECTOR_SIZE, &iov, 0, blk_sync_complete, NULL); } static inline void flash_sync_area(Flash *s, int64_t off, int64_t len) { + int64_t start, end, nb_sectors; QEMUIOVector iov; if (!s->blk || blk_is_read_only(s->blk)) { @@ -381,9 +385,13 @@ static inline void flash_sync_area(Flash } assert(!(len % BDRV_SECTOR_SIZE)); + start = off / BDRV_SECTOR_SIZE; + end = (off + len) / BDRV_SECTOR_SIZE; + nb_sectors = end - start; qemu_iovec_init(&iov, 1); - qemu_iovec_add(&iov, s->storage + off, len); - blk_aio_pwritev(s->blk, off, &iov, 0, blk_sync_complete, NULL); + qemu_iovec_add(&iov, s->storage + (start * BDRV_SECTOR_SIZE), + nb_sectors * BDRV_SECTOR_SIZE); + blk_aio_pwritev(s->blk, start, &iov, 0, blk_sync_complete, NULL); } static void flash_erase(Flash *s, int offset, FlashCMD cmd)