Message ID | 20210621093240.4170701-1-philmd@redhat.com |
---|---|
State | New |
Headers | show |
Series | [v2] block/nvme: Fix VFIO_MAP_DMA failed: No space left on device | expand |
> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: > > When the NVMe block driver was introduced (see commit bdd6a90a9e5, > January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning > -ENOMEM in case of error. The driver was correctly handling the > error path to recycle its volatile IOVA mappings. > > To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit > DMA mappings per container", April 2019) added the -ENOSPC error to > signal the user exhausted the DMA mappings available for a container. > > The block driver started to mis-behave: > > qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device > (qemu) > (qemu) info status > VM status: paused (io-error) > (qemu) c > VFIO_MAP_DMA failed: No space left on device > qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed. Hi Phil, The diff looks good to me, but I’m not sure what exactly caused the assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC before, so it should be treated as a general case. What am I missing? Reviewed-by: Fam Zheng <fam@euphon.net> > > Fix by handling the new -ENOSPC error (when DMA mappings are > exhausted) without any distinction to the current -ENOMEM error, > so we don't change the behavior on old kernels where the CVE-2019-3882 > fix is not present. > > An easy way to reproduce this bug is to restrict the DMA mapping > limit (65535 by default) when loading the VFIO IOMMU module: > > # modprobe vfio_iommu_type1 dma_entry_limit=666 > > Cc: qemu-stable@nongnu.org > Reported-by: Michal Prívozník <mprivozn@redhat.com> > Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver") > Buglink: https://bugs.launchpad.net/qemu/+bug/1863333 > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65 > Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> > --- > v2: KISS checking both errors undistinguishedly (Maxim) > --- > block/nvme.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/nvme.c b/block/nvme.c > index 2b5421e7aa6..c3d2a49866c 100644 > --- a/block/nvme.c > +++ b/block/nvme.c > @@ -1030,7 +1030,7 @@ try_map: > r = qemu_vfio_dma_map(s->vfio, > qiov->iov[i].iov_base, > len, true, &iova); > - if (r == -ENOMEM && retry) { > + if ((r == -ENOMEM || r == -ENOSPC) && retry) { > retry = false; > trace_nvme_dma_flush_queue_wait(s); > if (s->dma_map_count) { > -- > 2.31.1 > >
On 6/21/21 3:18 PM, Fam Zheng wrote: > > >> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >> >> When the NVMe block driver was introduced (see commit bdd6a90a9e5, >> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning >> -ENOMEM in case of error. The driver was correctly handling the >> error path to recycle its volatile IOVA mappings. >> >> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit >> DMA mappings per container", April 2019) added the -ENOSPC error to >> signal the user exhausted the DMA mappings available for a container. >> >> The block driver started to mis-behave: >> >> qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device >> (qemu) >> (qemu) info status >> VM status: paused (io-error) >> (qemu) c >> VFIO_MAP_DMA failed: No space left on device >> qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed. > > Hi Phil, > > > The diff looks good to me, but I’m not sure what exactly caused the assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC before, so it should be treated as a general case. What am I missing? Good catch, ENOSPC ends setting BLOCK_DEVICE_IO_STATUS_NOSPACE -> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mapping exhausted. I don't understand the full "VM resume" path, but this is not what we want (IO_NOSPACE is to warn the operator to add more storage and resume, which is pointless in our case, resuming won't help until we flush the mappings). IIUC what we want is return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED.
> On 21 Jun 2021, at 16:13, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: > > On 6/21/21 3:18 PM, Fam Zheng wrote: >> >> >>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >>> >>> When the NVMe block driver was introduced (see commit bdd6a90a9e5, >>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning >>> -ENOMEM in case of error. The driver was correctly handling the >>> error path to recycle its volatile IOVA mappings. >>> >>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit >>> DMA mappings per container", April 2019) added the -ENOSPC error to >>> signal the user exhausted the DMA mappings available for a container. >>> >>> The block driver started to mis-behave: >>> >>> qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device >>> (qemu) >>> (qemu) info status >>> VM status: paused (io-error) >>> (qemu) c >>> VFIO_MAP_DMA failed: No space left on device >>> qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed. >> >> Hi Phil, >> >> >> The diff looks good to me, but I’m not sure what exactly caused the assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC before, so it should be treated as a general case. What am I missing? > > Good catch, ENOSPC ends setting BLOCK_DEVICE_IO_STATUS_NOSPACE > -> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mapping > exhausted. I don't understand the full "VM resume" path, but this > is not what we want (IO_NOSPACE is to warn the operator to add > more storage and resume, which is pointless in our case, resuming > won't help until we flush the mappings). > > IIUC what we want is return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED. I agree with that. It just makes me feel there’s another bug in the resuming code path. Can you get a backtrace? Fam
On 6/21/21 5:36 PM, Fam Zheng wrote: >> On 21 Jun 2021, at 16:13, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >> On 6/21/21 3:18 PM, Fam Zheng wrote: >>>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >>>> >>>> When the NVMe block driver was introduced (see commit bdd6a90a9e5, >>>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning >>>> -ENOMEM in case of error. The driver was correctly handling the >>>> error path to recycle its volatile IOVA mappings. >>>> >>>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit >>>> DMA mappings per container", April 2019) added the -ENOSPC error to >>>> signal the user exhausted the DMA mappings available for a container. >>>> >>>> The block driver started to mis-behave: >>>> >>>> qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device >>>> (qemu) >>>> (qemu) info status >>>> VM status: paused (io-error) >>>> (qemu) c >>>> VFIO_MAP_DMA failed: No space left on device >>>> qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed. >>> >>> Hi Phil, >>> >>> >>> The diff looks good to me, but I’m not sure what exactly caused the assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC before, so it should be treated as a general case. What am I missing? >> >> Good catch, ENOSPC ends setting BLOCK_DEVICE_IO_STATUS_NOSPACE >> -> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mapping >> exhausted. I don't understand the full "VM resume" path, but this >> is not what we want (IO_NOSPACE is to warn the operator to add >> more storage and resume, which is pointless in our case, resuming >> won't help until we flush the mappings). >> >> IIUC what we want is return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED. > > I agree with that. It just makes me feel there’s another bug in the resuming code path. Can you get a backtrace? It seems the resuming code path bug has been fixed elsewhere: (qemu) info status info status VM status: paused (io-error) (qemu) c c 2021-06-22T07:27:00.745466Z qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device (qemu) info status info status VM status: paused (io-error) (qemu) c c 2021-06-22T07:27:12.458137Z qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device (qemu) c c 2021-06-22T07:27:13.439167Z qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device (qemu) c c 2021-06-22T07:27:14.272071Z qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device (qemu)
On 6/22/21 9:29 AM, Philippe Mathieu-Daudé wrote: > On 6/21/21 5:36 PM, Fam Zheng wrote: >>> On 21 Jun 2021, at 16:13, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >>> On 6/21/21 3:18 PM, Fam Zheng wrote: >>>>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >>>>> >>>>> When the NVMe block driver was introduced (see commit bdd6a90a9e5, >>>>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning >>>>> -ENOMEM in case of error. The driver was correctly handling the >>>>> error path to recycle its volatile IOVA mappings. >>>>> >>>>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit >>>>> DMA mappings per container", April 2019) added the -ENOSPC error to >>>>> signal the user exhausted the DMA mappings available for a container. >>>>> >>>>> The block driver started to mis-behave: >>>>> >>>>> qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device >>>>> (qemu) >>>>> (qemu) info status >>>>> VM status: paused (io-error) >>>>> (qemu) c >>>>> VFIO_MAP_DMA failed: No space left on device >>>>> qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed. >>>> >>>> Hi Phil, >>>> >>>> >>>> The diff looks good to me, but I’m not sure what exactly caused the assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC before, so it should be treated as a general case. What am I missing? >>> >>> Good catch, ENOSPC ends setting BLOCK_DEVICE_IO_STATUS_NOSPACE >>> -> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mapping >>> exhausted. I don't understand the full "VM resume" path, but this >>> is not what we want (IO_NOSPACE is to warn the operator to add >>> more storage and resume, which is pointless in our case, resuming >>> won't help until we flush the mappings). >>> >>> IIUC what we want is return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED. >> >> I agree with that. It just makes me feel there’s another bug in the resuming code path. Can you get a backtrace? > > It seems the resuming code path bug has been fixed elsewhere: > > (qemu) info status > info status > VM status: paused (io-error) > (qemu) c > c > 2021-06-22T07:27:00.745466Z qemu-system-x86_64: VFIO_MAP_DMA failed: No > space left on device > (qemu) info status > info status > VM status: paused (io-error) > (qemu) c > c > 2021-06-22T07:27:12.458137Z qemu-system-x86_64: VFIO_MAP_DMA failed: No > space left on device > (qemu) c > c > 2021-06-22T07:27:13.439167Z qemu-system-x86_64: VFIO_MAP_DMA failed: No > space left on device > (qemu) c > c > 2021-06-22T07:27:14.272071Z qemu-system-x86_64: VFIO_MAP_DMA failed: No > space left on device > (qemu) > I tested all releases up to v4.1.0 and could not trigger the blk_get_aio_context() assertion. Building using --enable-debug. IIRC Gentoo is more aggressive, so I'll restart using -O2.
On 6/22/21 10:06 AM, Philippe Mathieu-Daudé wrote: > On 6/22/21 9:29 AM, Philippe Mathieu-Daudé wrote: >> On 6/21/21 5:36 PM, Fam Zheng wrote: >>>> On 21 Jun 2021, at 16:13, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >>>> On 6/21/21 3:18 PM, Fam Zheng wrote: >>>>>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >>>>>> >>>>>> When the NVMe block driver was introduced (see commit bdd6a90a9e5, >>>>>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning >>>>>> -ENOMEM in case of error. The driver was correctly handling the >>>>>> error path to recycle its volatile IOVA mappings. >>>>>> >>>>>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit >>>>>> DMA mappings per container", April 2019) added the -ENOSPC error to >>>>>> signal the user exhausted the DMA mappings available for a container. >>>>>> >>>>>> The block driver started to mis-behave: >>>>>> >>>>>> qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device >>>>>> (qemu) >>>>>> (qemu) info status >>>>>> VM status: paused (io-error) >>>>>> (qemu) c >>>>>> VFIO_MAP_DMA failed: No space left on device >>>>>> qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed. >>>>> >>>>> Hi Phil, >>>>> >>>>> >>>>> The diff looks good to me, but I’m not sure what exactly caused the assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC before, so it should be treated as a general case. What am I missing? >>>> >>>> Good catch, ENOSPC ends setting BLOCK_DEVICE_IO_STATUS_NOSPACE >>>> -> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mapping >>>> exhausted. I don't understand the full "VM resume" path, but this >>>> is not what we want (IO_NOSPACE is to warn the operator to add >>>> more storage and resume, which is pointless in our case, resuming >>>> won't help until we flush the mappings). >>>> >>>> IIUC what we want is return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED. >>> >>> I agree with that. It just makes me feel there’s another bug in the resuming code path. Can you get a backtrace? >> >> It seems the resuming code path bug has been fixed elsewhere: >> >> (qemu) info status >> info status >> VM status: paused (io-error) >> (qemu) c >> c >> 2021-06-22T07:27:00.745466Z qemu-system-x86_64: VFIO_MAP_DMA failed: No >> space left on device >> (qemu) info status >> info status >> VM status: paused (io-error) >> (qemu) c >> c >> 2021-06-22T07:27:12.458137Z qemu-system-x86_64: VFIO_MAP_DMA failed: No >> space left on device >> (qemu) c >> c >> 2021-06-22T07:27:13.439167Z qemu-system-x86_64: VFIO_MAP_DMA failed: No >> space left on device >> (qemu) c >> c >> 2021-06-22T07:27:14.272071Z qemu-system-x86_64: VFIO_MAP_DMA failed: No >> space left on device >> (qemu) >> > > I tested all releases up to v4.1.0 and could not trigger the > blk_get_aio_context() assertion. Building using --enable-debug. > IIRC Gentoo is more aggressive, so I'll restart using -O2. Took 4h30 to test all releases with -O3, couldn't reproduce :( I wish I hadn't postponed writing an Ansible test script... On v1 Michal said he doesn't have access to the machine anymore, so I'll assume the other issue got fixed elsewhere.
> On 22 Jun 2021, at 13:42, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: > > On 6/22/21 10:06 AM, Philippe Mathieu-Daudé wrote: >> On 6/22/21 9:29 AM, Philippe Mathieu-Daudé wrote: >>> On 6/21/21 5:36 PM, Fam Zheng wrote: >>>>> On 21 Jun 2021, at 16:13, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >>>>> On 6/21/21 3:18 PM, Fam Zheng wrote: >>>>>>> On 21 Jun 2021, at 10:32, Philippe Mathieu-Daudé <philmd@redhat.com> wrote: >>>>>>> >>>>>>> When the NVMe block driver was introduced (see commit bdd6a90a9e5, >>>>>>> January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning >>>>>>> -ENOMEM in case of error. The driver was correctly handling the >>>>>>> error path to recycle its volatile IOVA mappings. >>>>>>> >>>>>>> To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit >>>>>>> DMA mappings per container", April 2019) added the -ENOSPC error to >>>>>>> signal the user exhausted the DMA mappings available for a container. >>>>>>> >>>>>>> The block driver started to mis-behave: >>>>>>> >>>>>>> qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device >>>>>>> (qemu) >>>>>>> (qemu) info status >>>>>>> VM status: paused (io-error) >>>>>>> (qemu) c >>>>>>> VFIO_MAP_DMA failed: No space left on device >>>>>>> qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed. >>>>>> >>>>>> Hi Phil, >>>>>> >>>>>> >>>>>> The diff looks good to me, but I’m not sure what exactly caused the assertion failure. There is `if (r) { goto fail; }` that handles -ENOSPC before, so it should be treated as a general case. What am I missing? >>>>> >>>>> Good catch, ENOSPC ends setting BLOCK_DEVICE_IO_STATUS_NOSPACE >>>>> -> BLOCK_ERROR_ACTION_STOP, so the VM is paused with DMA mapping >>>>> exhausted. I don't understand the full "VM resume" path, but this >>>>> is not what we want (IO_NOSPACE is to warn the operator to add >>>>> more storage and resume, which is pointless in our case, resuming >>>>> won't help until we flush the mappings). >>>>> >>>>> IIUC what we want is return ENOMEM to set BLOCK_DEVICE_IO_STATUS_FAILED. >>>> >>>> I agree with that. It just makes me feel there’s another bug in the resuming code path. Can you get a backtrace? >>> >>> It seems the resuming code path bug has been fixed elsewhere: >>> >>> (qemu) info status >>> info status >>> VM status: paused (io-error) >>> (qemu) c >>> c >>> 2021-06-22T07:27:00.745466Z qemu-system-x86_64: VFIO_MAP_DMA failed: No >>> space left on device >>> (qemu) info status >>> info status >>> VM status: paused (io-error) >>> (qemu) c >>> c >>> 2021-06-22T07:27:12.458137Z qemu-system-x86_64: VFIO_MAP_DMA failed: No >>> space left on device >>> (qemu) c >>> c >>> 2021-06-22T07:27:13.439167Z qemu-system-x86_64: VFIO_MAP_DMA failed: No >>> space left on device >>> (qemu) c >>> c >>> 2021-06-22T07:27:14.272071Z qemu-system-x86_64: VFIO_MAP_DMA failed: No >>> space left on device >>> (qemu) >>> >> >> I tested all releases up to v4.1.0 and could not trigger the >> blk_get_aio_context() assertion. Building using --enable-debug. >> IIRC Gentoo is more aggressive, so I'll restart using -O2. > > Took 4h30 to test all releases with -O3, couldn't reproduce :( > > I wish I hadn't postponed writing an Ansible test script... > > On v1 Michal said he doesn't have access to the machine anymore, > so I'll assume the other issue got fixed elsewhere. > > Cool, then I think it’s just the commit message that should be updated. Fam
diff --git a/block/nvme.c b/block/nvme.c index 2b5421e7aa6..c3d2a49866c 100644 --- a/block/nvme.c +++ b/block/nvme.c @@ -1030,7 +1030,7 @@ try_map: r = qemu_vfio_dma_map(s->vfio, qiov->iov[i].iov_base, len, true, &iova); - if (r == -ENOMEM && retry) { + if ((r == -ENOMEM || r == -ENOSPC) && retry) { retry = false; trace_nvme_dma_flush_queue_wait(s); if (s->dma_map_count) {
When the NVMe block driver was introduced (see commit bdd6a90a9e5, January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning -ENOMEM in case of error. The driver was correctly handling the error path to recycle its volatile IOVA mappings. To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit DMA mappings per container", April 2019) added the -ENOSPC error to signal the user exhausted the DMA mappings available for a container. The block driver started to mis-behave: qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device (qemu) (qemu) info status VM status: paused (io-error) (qemu) c VFIO_MAP_DMA failed: No space left on device qemu-system-x86_64: block/block-backend.c:1968: blk_get_aio_context: Assertion `ctx == blk->ctx' failed. Fix by handling the new -ENOSPC error (when DMA mappings are exhausted) without any distinction to the current -ENOMEM error, so we don't change the behavior on old kernels where the CVE-2019-3882 fix is not present. An easy way to reproduce this bug is to restrict the DMA mapping limit (65535 by default) when loading the VFIO IOMMU module: # modprobe vfio_iommu_type1 dma_entry_limit=666 Cc: qemu-stable@nongnu.org Reported-by: Michal Prívozník <mprivozn@redhat.com> Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver") Buglink: https://bugs.launchpad.net/qemu/+bug/1863333 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> --- v2: KISS checking both errors undistinguishedly (Maxim) --- block/nvme.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)