Message ID: 20210304133446.1521-1-jiangkunkun@huawei.com
State: New
Series: vfio: Support host translation granule size
On Thu, 4 Mar 2021 21:34:46 +0800 Kunkun Jiang <jiangkunkun@huawei.com> wrote: > The cpu_physical_memory_set_dirty_lebitmap() can quickly deal with > the dirty pages of memory by bitmap-traveling, regardless of whether > the bitmap is aligned correctly or not. > > cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of > host page size. So it'd better to set bitmap_pgsize to host page size > to support more translation granule sizes. > > Fixes: 87ea529c502 (vfio: Get migration capability flags for container) > Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> > --- > hw/vfio/common.c | 44 ++++++++++++++++++++++---------------------- > 1 file changed, 22 insertions(+), 22 deletions(-) > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > index 6ff1daa763..69fb5083a4 100644 > --- a/hw/vfio/common.c > +++ b/hw/vfio/common.c > @@ -378,7 +378,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, > { > struct vfio_iommu_type1_dma_unmap *unmap; > struct vfio_bitmap *bitmap; > - uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS; > + uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size; > int ret; > > unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap)); > @@ -390,12 +390,12 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, > bitmap = (struct vfio_bitmap *)&unmap->data; > > /* > - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of > - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to > - * TARGET_PAGE_SIZE. > + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of > + * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize > + * to qemu_real_host_page_size. I don't see that this change is well supported by the code, cpu_physical_memory_set_dirty_lebitmap() seems to operate on TARGET_PAGE_SIZE, and the next three patch chunks take a detour through memory listener code that seem unrelated to the change described in the commit log. This claims to fix something, what is actually broken? 
Thanks, Alex > */ > > - bitmap->pgsize = TARGET_PAGE_SIZE; > + bitmap->pgsize = qemu_real_host_page_size; > bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / > BITS_PER_BYTE; > > @@ -674,16 +674,16 @@ static void vfio_listener_region_add(MemoryListener *listener, > return; > } > > - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != > - (section->offset_within_region & ~TARGET_PAGE_MASK))) { > + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != > + (section->offset_within_region & ~qemu_real_host_page_mask))) { > error_report("%s received unaligned region", __func__); > return; > } > > - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); > llend = int128_make64(section->offset_within_address_space); > llend = int128_add(llend, section->size); > - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); > > if (int128_ge(int128_make64(iova), llend)) { > return; > @@ -892,8 +892,8 @@ static void vfio_listener_region_del(MemoryListener *listener, > return; > } > > - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != > - (section->offset_within_region & ~TARGET_PAGE_MASK))) { > + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != > + (section->offset_within_region & ~qemu_real_host_page_mask))) { > error_report("%s received unaligned region", __func__); > return; > } > @@ -921,10 +921,10 @@ static void vfio_listener_region_del(MemoryListener *listener, > */ > } > > - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); > llend = int128_make64(section->offset_within_address_space); > llend = int128_add(llend, section->size); > - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); > > if (int128_ge(int128_make64(iova), llend)) { > return; > @@ -1004,13 +1004,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova, > range->size = size; > > /* > - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of > - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap's pgsize to > - * TARGET_PAGE_SIZE. > + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of > + * qemu_real_host_page_size to mark those dirty. Hence set bitmap's pgsize > + * to qemu_real_host_page_size. > */ > - range->bitmap.pgsize = TARGET_PAGE_SIZE; > + range->bitmap.pgsize = qemu_real_host_page_size; > > - pages = TARGET_PAGE_ALIGN(range->size) >> TARGET_PAGE_BITS; > + pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size; > range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / > BITS_PER_BYTE; > range->bitmap.data = g_try_malloc0(range->bitmap.size); > @@ -1114,7 +1114,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container, > section->offset_within_region; > > return vfio_get_dirty_bitmap(container, > - TARGET_PAGE_ALIGN(section->offset_within_address_space), > + REAL_HOST_PAGE_ALIGN(section->offset_within_address_space), > int128_get64(section->size), ram_addr); > } > > @@ -1655,10 +1655,10 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container, > header); > > /* > - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of > - * TARGET_PAGE_SIZE to mark those dirty. 
> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of > + * qemu_real_host_page_size to mark those dirty. > */ > - if (cap_mig->pgsize_bitmap & TARGET_PAGE_SIZE) { > + if (cap_mig->pgsize_bitmap & qemu_real_host_page_size) { > container->dirty_pages_supported = true; > container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size; > container->dirty_pgsizes = cap_mig->pgsize_bitmap;
Hi Alex, On 2021/3/10 7:17, Alex Williamson wrote: > On Thu, 4 Mar 2021 21:34:46 +0800 > Kunkun Jiang <jiangkunkun@huawei.com> wrote: > >> The cpu_physical_memory_set_dirty_lebitmap() can quickly deal with >> the dirty pages of memory by bitmap-traveling, regardless of whether >> the bitmap is aligned correctly or not. >> >> cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >> host page size. So it'd better to set bitmap_pgsize to host page size >> to support more translation granule sizes. >> >> Fixes: 87ea529c502 (vfio: Get migration capability flags for container) >> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> >> --- >> hw/vfio/common.c | 44 ++++++++++++++++++++++---------------------- >> 1 file changed, 22 insertions(+), 22 deletions(-) >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >> index 6ff1daa763..69fb5083a4 100644 >> --- a/hw/vfio/common.c >> +++ b/hw/vfio/common.c >> @@ -378,7 +378,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, >> { >> struct vfio_iommu_type1_dma_unmap *unmap; >> struct vfio_bitmap *bitmap; >> - uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS; >> + uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size; >> int ret; >> >> unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap)); >> @@ -390,12 +390,12 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, >> bitmap = (struct vfio_bitmap *)&unmap->data; >> >> /* >> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of >> - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to >> - * TARGET_PAGE_SIZE. >> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >> + * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize >> + * to qemu_real_host_page_size. > > > I don't see that this change is well supported by the code, > cpu_physical_memory_set_dirty_lebitmap() seems to operate on > TARGET_PAGE_SIZE, and the next three patch chunks take a detour through > memory listener code that seem unrelated to the change described in the > commit log. This claims to fix something, what is actually broken? Actually I have reported this bug long ago. cpu_physical_memory_set_dirty_lebitmap() expects the granule of @bitmap to be qemu_real_host_page_size, as it uses @hpratio to convert the bitmap. 
Thanks, Keqian > Thanks, > > Alex > >> */ >> >> - bitmap->pgsize = TARGET_PAGE_SIZE; >> + bitmap->pgsize = qemu_real_host_page_size; >> bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / >> BITS_PER_BYTE; >> >> @@ -674,16 +674,16 @@ static void vfio_listener_region_add(MemoryListener *listener, >> return; >> } >> >> - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != >> - (section->offset_within_region & ~TARGET_PAGE_MASK))) { >> + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != >> + (section->offset_within_region & ~qemu_real_host_page_mask))) { >> error_report("%s received unaligned region", __func__); >> return; >> } >> >> - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); >> + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); >> llend = int128_make64(section->offset_within_address_space); >> llend = int128_add(llend, section->size); >> - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); >> + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); >> >> if (int128_ge(int128_make64(iova), llend)) { >> return; >> @@ -892,8 +892,8 @@ static void vfio_listener_region_del(MemoryListener *listener, >> return; >> } >> >> - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != >> - (section->offset_within_region & ~TARGET_PAGE_MASK))) { >> + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != >> + (section->offset_within_region & ~qemu_real_host_page_mask))) { >> error_report("%s received unaligned region", __func__); >> return; >> } >> @@ -921,10 +921,10 @@ static void vfio_listener_region_del(MemoryListener *listener, >> */ >> } >> >> - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); >> + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); >> llend = int128_make64(section->offset_within_address_space); >> llend = int128_add(llend, section->size); >> - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); >> + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); >> >> if (int128_ge(int128_make64(iova), llend)) { >> return; >> @@ -1004,13 +1004,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova, >> range->size = size; >> >> /* >> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of >> - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap's pgsize to >> - * TARGET_PAGE_SIZE. >> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >> + * qemu_real_host_page_size to mark those dirty. Hence set bitmap's pgsize >> + * to qemu_real_host_page_size. 
>> */ >> - range->bitmap.pgsize = TARGET_PAGE_SIZE; >> + range->bitmap.pgsize = qemu_real_host_page_size; >> >> - pages = TARGET_PAGE_ALIGN(range->size) >> TARGET_PAGE_BITS; >> + pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size; >> range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / >> BITS_PER_BYTE; >> range->bitmap.data = g_try_malloc0(range->bitmap.size); >> @@ -1114,7 +1114,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container, >> section->offset_within_region; >> >> return vfio_get_dirty_bitmap(container, >> - TARGET_PAGE_ALIGN(section->offset_within_address_space), >> + REAL_HOST_PAGE_ALIGN(section->offset_within_address_space), >> int128_get64(section->size), ram_addr); >> } >> >> @@ -1655,10 +1655,10 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container, >> header); >> >> /* >> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of >> - * TARGET_PAGE_SIZE to mark those dirty. >> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >> + * qemu_real_host_page_size to mark those dirty. >> */ >> - if (cap_mig->pgsize_bitmap & TARGET_PAGE_SIZE) { >> + if (cap_mig->pgsize_bitmap & qemu_real_host_page_size) { >> container->dirty_pages_supported = true; >> container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size; >> container->dirty_pgsizes = cap_mig->pgsize_bitmap; > > . >
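As background for the hpratio point Keqian makes above, here is a minimal sketch (illustrative page sizes only, not the QEMU implementation) of how one dirty bit in a host-page-granule bitmap fans out to hpratio consecutive target-page dirty bits:

#include <inttypes.h>
#include <stdio.h>

/* Illustrative sketch of the hpratio fan-out described above.
 * Assumed sizes: 64K host pages, 4K target pages, so one host-page
 * bit covers hpratio = 16 target pages. Not the QEMU implementation. */
int main(void)
{
    const uint64_t host_page_size   = 64 * 1024;  /* stands in for qemu_real_host_page_size */
    const uint64_t target_page_size = 4 * 1024;   /* stands in for TARGET_PAGE_SIZE */
    const uint64_t hpratio = host_page_size / target_page_size;

    const unsigned nr_host_pages = 4;
    uint64_t host_bitmap = 0x5;                   /* host pages 0 and 2 are dirty */
    unsigned char target_dirty[4 * 16] = { 0 };   /* one flag per target page (4 host pages x 16) */

    for (unsigned bit = 0; bit < nr_host_pages; bit++) {
        if (host_bitmap & (1ULL << bit)) {
            /* one host-page bit marks hpratio consecutive target pages dirty */
            for (uint64_t i = 0; i < hpratio; i++) {
                target_dirty[bit * hpratio + i] = 1;
            }
        }
    }

    printf("hpratio = %" PRIu64 ", target pages dirtied:", hpratio);
    for (unsigned i = 0; i < nr_host_pages * hpratio; i++) {
        if (target_dirty[i]) {
            printf(" %u", i);
        }
    }
    printf("\n");
    return 0;
}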
Hi Alex, On 2021/3/10 7:17, Alex Williamson wrote: > On Thu, 4 Mar 2021 21:34:46 +0800 > Kunkun Jiang <jiangkunkun@huawei.com> wrote: > >> The cpu_physical_memory_set_dirty_lebitmap() can quickly deal with >> the dirty pages of memory by bitmap-traveling, regardless of whether >> the bitmap is aligned correctly or not. >> >> cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >> host page size. So it'd better to set bitmap_pgsize to host page size >> to support more translation granule sizes. >> >> Fixes: 87ea529c502 (vfio: Get migration capability flags for container) >> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> >> --- >> hw/vfio/common.c | 44 ++++++++++++++++++++++---------------------- >> 1 file changed, 22 insertions(+), 22 deletions(-) >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >> index 6ff1daa763..69fb5083a4 100644 >> --- a/hw/vfio/common.c >> +++ b/hw/vfio/common.c >> @@ -378,7 +378,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, >> { >> struct vfio_iommu_type1_dma_unmap *unmap; >> struct vfio_bitmap *bitmap; >> - uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS; >> + uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size; >> int ret; >> >> unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap)); >> @@ -390,12 +390,12 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, >> bitmap = (struct vfio_bitmap *)&unmap->data; >> >> /* >> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of >> - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to >> - * TARGET_PAGE_SIZE. >> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >> + * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize >> + * to qemu_real_host_page_size. > > I don't see that this change is well supported by the code, > cpu_physical_memory_set_dirty_lebitmap() seems to operate on Yes, cpu_physical_memory_set_dirty_lebitmap() ultimately operates on TARGET_PAGE_SIZE. But it accepts a bitmap whose pages are in units of qemu_real_host_page_size and marks those dirty. It uses "hpratio = qemu_real_host_page_size / TARGET_PAGE_SIZE" to adapt to different translation granule sizes (e.g. 4K, 2M, 1G). > TARGET_PAGE_SIZE, and the next three patch chunks take a detour through > memory listener code that seem unrelated to the change described in the > commit log. This claims to fix something, what is actually broken? > Thanks, > > Alex The bug was introduced by commit 87ea529c502 (vfio: Get migration capability flags for container). The code at [1] (marked below) requires the host page size to be TARGET_PAGE_SIZE (e.g. 4K) in order to set container->dirty_pages_supported = true. It is inappropriate to limit the page size to TARGET_PAGE_SIZE.
Best Regards Kunkun Jiang >> */ >> >> - bitmap->pgsize = TARGET_PAGE_SIZE; >> + bitmap->pgsize = qemu_real_host_page_size; >> bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / >> BITS_PER_BYTE; >> >> @@ -674,16 +674,16 @@ static void vfio_listener_region_add(MemoryListener *listener, >> return; >> } >> >> - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != >> - (section->offset_within_region & ~TARGET_PAGE_MASK))) { >> + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != >> + (section->offset_within_region & ~qemu_real_host_page_mask))) { >> error_report("%s received unaligned region", __func__); >> return; >> } >> >> - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); >> + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); >> llend = int128_make64(section->offset_within_address_space); >> llend = int128_add(llend, section->size); >> - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); >> + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); >> >> if (int128_ge(int128_make64(iova), llend)) { >> return; >> @@ -892,8 +892,8 @@ static void vfio_listener_region_del(MemoryListener *listener, >> return; >> } >> >> - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != >> - (section->offset_within_region & ~TARGET_PAGE_MASK))) { >> + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != >> + (section->offset_within_region & ~qemu_real_host_page_mask))) { >> error_report("%s received unaligned region", __func__); >> return; >> } >> @@ -921,10 +921,10 @@ static void vfio_listener_region_del(MemoryListener *listener, >> */ >> } >> >> - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); >> + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); >> llend = int128_make64(section->offset_within_address_space); >> llend = int128_add(llend, section->size); >> - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); >> + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); >> >> if (int128_ge(int128_make64(iova), llend)) { >> return; >> @@ -1004,13 +1004,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova, >> range->size = size; >> >> /* >> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of >> - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap's pgsize to >> - * TARGET_PAGE_SIZE. >> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >> + * qemu_real_host_page_size to mark those dirty. Hence set bitmap's pgsize >> + * to qemu_real_host_page_size. 
>> */ >> - range->bitmap.pgsize = TARGET_PAGE_SIZE; >> + range->bitmap.pgsize = qemu_real_host_page_size; >> >> - pages = TARGET_PAGE_ALIGN(range->size) >> TARGET_PAGE_BITS; >> + pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size; >> range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / >> BITS_PER_BYTE; >> range->bitmap.data = g_try_malloc0(range->bitmap.size); >> @@ -1114,7 +1114,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container, >> section->offset_within_region; >> >> return vfio_get_dirty_bitmap(container, >> - TARGET_PAGE_ALIGN(section->offset_within_address_space), >> + REAL_HOST_PAGE_ALIGN(section->offset_within_address_space), >> int128_get64(section->size), ram_addr); >> } >> >> @@ -1655,10 +1655,10 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container, >> header); >> >> /* >> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of >> - * TARGET_PAGE_SIZE to mark those dirty. >> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >> + * qemu_real_host_page_size to mark those dirty. >> */ >> - if (cap_mig->pgsize_bitmap & TARGET_PAGE_SIZE) { >> + if (cap_mig->pgsize_bitmap & qemu_real_host_page_size) { [1] >> container->dirty_pages_supported = true; >> container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size; >> container->dirty_pgsizes = cap_mig->pgsize_bitmap; > .
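To make the gating logic at [1] concrete, a small sketch of the check Kunkun refers to (the pgsize_bitmap value below is a hypothetical example of what a 64K-page host might report, not taken from a real VFIO_IOMMU_GET_INFO call): testing TARGET_PAGE_SIZE rejects such a host, while testing qemu_real_host_page_size accepts it.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch of the capability check discussed above. The pgsize_bitmap value
 * is a hypothetical example of what vfio/iommu_type1 might report on a
 * 64K-page host; the real value comes from VFIO_IOMMU_GET_INFO. */
int main(void)
{
    const uint64_t target_page_size    = 4 * 1024;   /* TARGET_PAGE_SIZE (e.g. arm64) */
    const uint64_t real_host_page_size = 64 * 1024;  /* qemu_real_host_page_size */
    const uint64_t pgsize_bitmap       = 64 * 1024;  /* kernel supports a 64K dirty granule */

    bool old_check = pgsize_bitmap & target_page_size;     /* pre-patch behaviour: false */
    bool new_check = pgsize_bitmap & real_host_page_size;  /* patched behaviour: true */

    printf("dirty_pages_supported: old check = %d, new check = %d\n",
           old_check, new_check);
    return 0;
}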
On Wed, 10 Mar 2021 15:19:33 +0800 Kunkun Jiang <jiangkunkun@huawei.com> wrote: > Hi Alex, > > On 2021/3/10 7:17, Alex Williamson wrote: > > On Thu, 4 Mar 2021 21:34:46 +0800 > > Kunkun Jiang <jiangkunkun@huawei.com> wrote: > > > >> The cpu_physical_memory_set_dirty_lebitmap() can quickly deal with > >> the dirty pages of memory by bitmap-traveling, regardless of whether > >> the bitmap is aligned correctly or not. > >> > >> cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of > >> host page size. So it'd better to set bitmap_pgsize to host page size > >> to support more translation granule sizes. > >> > >> Fixes: 87ea529c502 (vfio: Get migration capability flags for container) > >> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> > >> --- > >> hw/vfio/common.c | 44 ++++++++++++++++++++++---------------------- > >> 1 file changed, 22 insertions(+), 22 deletions(-) > >> > >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >> index 6ff1daa763..69fb5083a4 100644 > >> --- a/hw/vfio/common.c > >> +++ b/hw/vfio/common.c > >> @@ -378,7 +378,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, > >> { > >> struct vfio_iommu_type1_dma_unmap *unmap; > >> struct vfio_bitmap *bitmap; > >> - uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS; > >> + uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size; > >> int ret; > >> > >> unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap)); > >> @@ -390,12 +390,12 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, > >> bitmap = (struct vfio_bitmap *)&unmap->data; > >> > >> /* > >> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of > >> - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to > >> - * TARGET_PAGE_SIZE. > >> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of > >> + * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize > >> + * to qemu_real_host_page_size. > > > > I don't see that this change is well supported by the code, > > cpu_physical_memory_set_dirty_lebitmap() seems to operate on > Yes, cpu_physical_memory_set_dirty_lebitmap() is finally to operate on > TARGET_PAGE_SIZE. But actually it supports pages in bitmap of > qemu_real_host_page_size to mark those dirty. It uses > "hpratio = qemu_real_host_page_size / TARGET_PAGE_SIZE" to adapt to > different translation granule size(e.g. 4K 2M 1G). Thanks for the explanation, it becomes more clear but I'm still a little confused. It appears that cpu_physical_memory_set_dirty_lebitmap() requires a bitmap in terms of qemu_real_host_page_size which is translated to target pages using hpratio. It's not clear to me by the explanation here and in the commit log that we're technically using the wrong page size reference for this function. > > TARGET_PAGE_SIZE, and the next three patch chunks take a detour through > > memory listener code that seem unrelated to the change described in the > > commit log. This claims to fix something, what is actually broken? > > Thanks, > > > > Alex > This patch 87ea529c502 (vfio: Get migration capability flags for container) > is the start of the bug. The code of [1](marked below) restricts the host > page size must be TARGET_PAGE_SIZE(e.g. 4K) to set > container->dirty_pages_supported = true. It is inappropriate to limit the > page size to TARGET_PAGE_SIZE. Right, the noted code requires that vfio supports the target page size, which for all practical purposes implies an hpratio = 1 currently. 
That unnecessarily restricts migration support to cases where target and host use the same page size, but this change allows migration in any case where vfio dirty bitmap supports the host page size, which is effectively any case where the device supports migration. However, the hpratio calculation requires that qemu_real_host_page_size is >= TARGET_PAGE_SIZE, otherwise the value becomes zero and it appears we'd only ever dirty the target zero page. Is this configuration restricted elsewhere, ex. 64K guest on 4K host? Exchanging an unnecessary restriction for allowing a buggy configuration isn't a good trade-off. Thanks, Alex > >> */ > >> > >> - bitmap->pgsize = TARGET_PAGE_SIZE; > >> + bitmap->pgsize = qemu_real_host_page_size; > >> bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / > >> BITS_PER_BYTE; > >> > >> @@ -674,16 +674,16 @@ static void vfio_listener_region_add(MemoryListener *listener, > >> return; > >> } > >> > >> - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != > >> - (section->offset_within_region & ~TARGET_PAGE_MASK))) { > >> + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != > >> + (section->offset_within_region & ~qemu_real_host_page_mask))) { > >> error_report("%s received unaligned region", __func__); > >> return; > >> } > >> > >> - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > >> + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); > >> llend = int128_make64(section->offset_within_address_space); > >> llend = int128_add(llend, section->size); > >> - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > >> + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); > >> > >> if (int128_ge(int128_make64(iova), llend)) { > >> return; > >> @@ -892,8 +892,8 @@ static void vfio_listener_region_del(MemoryListener *listener, > >> return; > >> } > >> > >> - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != > >> - (section->offset_within_region & ~TARGET_PAGE_MASK))) { > >> + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != > >> + (section->offset_within_region & ~qemu_real_host_page_mask))) { > >> error_report("%s received unaligned region", __func__); > >> return; > >> } > >> @@ -921,10 +921,10 @@ static void vfio_listener_region_del(MemoryListener *listener, > >> */ > >> } > >> > >> - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > >> + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); > >> llend = int128_make64(section->offset_within_address_space); > >> llend = int128_add(llend, section->size); > >> - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); > >> + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); > >> > >> if (int128_ge(int128_make64(iova), llend)) { > >> return; > >> @@ -1004,13 +1004,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova, > >> range->size = size; > >> > >> /* > >> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of > >> - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap's pgsize to > >> - * TARGET_PAGE_SIZE. > >> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of > >> + * qemu_real_host_page_size to mark those dirty. Hence set bitmap's pgsize > >> + * to qemu_real_host_page_size. 
> >> */ > >> - range->bitmap.pgsize = TARGET_PAGE_SIZE; > >> + range->bitmap.pgsize = qemu_real_host_page_size; > >> > >> - pages = TARGET_PAGE_ALIGN(range->size) >> TARGET_PAGE_BITS; > >> + pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size; > >> range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / > >> BITS_PER_BYTE; > >> range->bitmap.data = g_try_malloc0(range->bitmap.size); > >> @@ -1114,7 +1114,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container, > >> section->offset_within_region; > >> > >> return vfio_get_dirty_bitmap(container, > >> - TARGET_PAGE_ALIGN(section->offset_within_address_space), > >> + REAL_HOST_PAGE_ALIGN(section->offset_within_address_space), > >> int128_get64(section->size), ram_addr); > >> } > >> > >> @@ -1655,10 +1655,10 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container, > >> header); > >> > >> /* > >> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of > >> - * TARGET_PAGE_SIZE to mark those dirty. > >> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of > >> + * qemu_real_host_page_size to mark those dirty. > >> */ > >> - if (cap_mig->pgsize_bitmap & TARGET_PAGE_SIZE) { > >> + if (cap_mig->pgsize_bitmap & qemu_real_host_page_size) { > [1] > >> container->dirty_pages_supported = true; > >> container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size; > >> container->dirty_pgsizes = cap_mig->pgsize_bitmap; > > . > >
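Alex's concern is the integer division in the hpratio computation. A worked example (example page sizes only, not values queried from a real host or target): with a 4K host page and a 64K target page, qemu_real_host_page_size / TARGET_PAGE_SIZE truncates to zero, so the fan-out sketched earlier would never dirty anything beyond target page 0.

#include <inttypes.h>
#include <stdio.h>

/* Illustrates the truncation Alex points out; the page sizes below are
 * example inputs, not values taken from any real configuration. */
int main(void)
{
    const uint64_t combos[][2] = {
        /* host page,  target page */
        { 4  * 1024,   4  * 1024 },   /* hpratio = 1  */
        { 64 * 1024,   4  * 1024 },   /* hpratio = 16 */
        { 4  * 1024,   64 * 1024 },   /* hpratio = 0: the problematic case */
    };

    for (size_t i = 0; i < sizeof(combos) / sizeof(combos[0]); i++) {
        uint64_t hpratio = combos[i][0] / combos[i][1];
        printf("host %" PRIu64 "K / target %" PRIu64 "K -> hpratio = %" PRIu64 "\n",
               combos[i][0] / 1024, combos[i][1] / 1024, hpratio);
    }
    return 0;
}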
Hi Alex, On 2021/3/11 4:24, Alex Williamson wrote: > On Wed, 10 Mar 2021 15:19:33 +0800 > Kunkun Jiang <jiangkunkun@huawei.com> wrote: > >> Hi Alex, >> >> On 2021/3/10 7:17, Alex Williamson wrote: >>> On Thu, 4 Mar 2021 21:34:46 +0800 >>> Kunkun Jiang <jiangkunkun@huawei.com> wrote: >>> >>>> The cpu_physical_memory_set_dirty_lebitmap() can quickly deal with >>>> the dirty pages of memory by bitmap-traveling, regardless of whether >>>> the bitmap is aligned correctly or not. >>>> >>>> cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >>>> host page size. So it'd better to set bitmap_pgsize to host page size >>>> to support more translation granule sizes. >>>> >>>> Fixes: 87ea529c502 (vfio: Get migration capability flags for container) >>>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> >>>> --- >>>> hw/vfio/common.c | 44 ++++++++++++++++++++++---------------------- >>>> 1 file changed, 22 insertions(+), 22 deletions(-) >>>> >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>> index 6ff1daa763..69fb5083a4 100644 >>>> --- a/hw/vfio/common.c >>>> +++ b/hw/vfio/common.c >>>> @@ -378,7 +378,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, >>>> { >>>> struct vfio_iommu_type1_dma_unmap *unmap; >>>> struct vfio_bitmap *bitmap; >>>> - uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS; >>>> + uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size; >>>> int ret; >>>> >>>> unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap)); >>>> @@ -390,12 +390,12 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, >>>> bitmap = (struct vfio_bitmap *)&unmap->data; >>>> >>>> /* >>>> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of >>>> - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to >>>> - * TARGET_PAGE_SIZE. >>>> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >>>> + * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize >>>> + * to qemu_real_host_page_size. >>> >>> I don't see that this change is well supported by the code, >>> cpu_physical_memory_set_dirty_lebitmap() seems to operate on >> Yes, cpu_physical_memory_set_dirty_lebitmap() is finally to operate on >> TARGET_PAGE_SIZE. But actually it supports pages in bitmap of >> qemu_real_host_page_size to mark those dirty. It uses >> "hpratio = qemu_real_host_page_size / TARGET_PAGE_SIZE" to adapt to >> different translation granule size(e.g. 4K 2M 1G). > > Thanks for the explanation, it becomes more clear but I'm still a > little confused. It appears that > cpu_physical_memory_set_dirty_lebitmap() requires a bitmap in terms of > qemu_real_host_page_size which is translated to target pages using > hpratio. It's not clear to me by the explanation here and in the > commit log that we're technically using the wrong page size reference > for this function. > >>> TARGET_PAGE_SIZE, and the next three patch chunks take a detour through >>> memory listener code that seem unrelated to the change described in the >>> commit log. This claims to fix something, what is actually broken? >>> Thanks, >>> >>> Alex >> This patch 87ea529c502 (vfio: Get migration capability flags for container) >> is the start of the bug. The code of [1](marked below) restricts the host >> page size must be TARGET_PAGE_SIZE(e.g. 4K) to set >> container->dirty_pages_supported = true. It is inappropriate to limit the >> page size to TARGET_PAGE_SIZE. 
> > Right, the noted code requires that vfio supports the target page size, > which for all practical purposes implies an hpratio = 1 currently. > That unnecessarily restricts migration support to cases where target and > host use the same page size, but this change allows migration in any > case where vfio dirty bitmap supports the host page size, which is > effectively any case where the device supports migration. However, the > hpratio calculation requires that qemu_real_host_page_size is >= > TARGET_PAGE_SIZE, otherwise the value becomes zero and it appears we'd > only ever dirty the target zero page. Is this configuration restricted > elsewhere, ex. 64K guest on 4K host? Exchanging an unnecessary > restriction for allowing a buggy configuration isn't a good trade-off. > Thanks, FYI, the restriction that qemu_real_host_page_size >= TARGET_PAGE_SIZE already exists: kvm_init() contains the assert assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size); As I understand it, TARGET_PAGE_SIZE is an architecture-specific value and is not affected by the page size of the guest OS. For arm64 it is fixed at 12 bits (4K), while qemu_real_host_page_size depends on the host kernel configuration and can be 4K, 16K or 64K. IIUC, the kernel vfio/iommu_type1 driver reports the supported dirty-log page_size as the host page size, so if the host uses 16K or 64K, QEMU will refuse to support vfio migration. Thanks, Keqian > > Alex > >>>> */ >>>> >>>> - bitmap->pgsize = TARGET_PAGE_SIZE; >>>> + bitmap->pgsize = qemu_real_host_page_size; >>>> bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / >>>> BITS_PER_BYTE; >>>> >>>> @@ -674,16 +674,16 @@ static void vfio_listener_region_add(MemoryListener *listener, >>>> return; >>>> } >>>> >>>> - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != >>>> - (section->offset_within_region & ~TARGET_PAGE_MASK))) { >>>> + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != >>>> + (section->offset_within_region & ~qemu_real_host_page_mask))) { >>>> error_report("%s received unaligned region", __func__); >>>> return; >>>> } >>>> >>>> - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); >>>> + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); >>>> llend = int128_make64(section->offset_within_address_space); >>>> llend = int128_add(llend, section->size); >>>> - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); >>>> + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); >>>> >>>> if (int128_ge(int128_make64(iova), llend)) { >>>> return; >>>> @@ -892,8 +892,8 @@ static void vfio_listener_region_del(MemoryListener *listener, >>>> return; >>>> } >>>> >>>> - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != >>>> - (section->offset_within_region & ~TARGET_PAGE_MASK))) { >>>> + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != >>>> + (section->offset_within_region & ~qemu_real_host_page_mask))) { >>>> error_report("%s received unaligned region", __func__); >>>> return; >>>> } >>>> @@ -921,10 +921,10 @@ static void vfio_listener_region_del(MemoryListener *listener, >>>> */ >>>> } >>>> >>>> - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); >>>> + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); >>>> llend = int128_make64(section->offset_within_address_space); >>>> llend = int128_add(llend, section->size); >>>> - llend = int128_and(llend,
int128_exts64(TARGET_PAGE_MASK)); >>>> + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); >>>> >>>> if (int128_ge(int128_make64(iova), llend)) { >>>> return; >>>> @@ -1004,13 +1004,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova, >>>> range->size = size; >>>> >>>> /* >>>> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of >>>> - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap's pgsize to >>>> - * TARGET_PAGE_SIZE. >>>> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >>>> + * qemu_real_host_page_size to mark those dirty. Hence set bitmap's pgsize >>>> + * to qemu_real_host_page_size. >>>> */ >>>> - range->bitmap.pgsize = TARGET_PAGE_SIZE; >>>> + range->bitmap.pgsize = qemu_real_host_page_size; >>>> >>>> - pages = TARGET_PAGE_ALIGN(range->size) >> TARGET_PAGE_BITS; >>>> + pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size; >>>> range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / >>>> BITS_PER_BYTE; >>>> range->bitmap.data = g_try_malloc0(range->bitmap.size); >>>> @@ -1114,7 +1114,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container, >>>> section->offset_within_region; >>>> >>>> return vfio_get_dirty_bitmap(container, >>>> - TARGET_PAGE_ALIGN(section->offset_within_address_space), >>>> + REAL_HOST_PAGE_ALIGN(section->offset_within_address_space), >>>> int128_get64(section->size), ram_addr); >>>> } >>>> >>>> @@ -1655,10 +1655,10 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container, >>>> header); >>>> >>>> /* >>>> - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of >>>> - * TARGET_PAGE_SIZE to mark those dirty. >>>> + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of >>>> + * qemu_real_host_page_size to mark those dirty. >>>> */ >>>> - if (cap_mig->pgsize_bitmap & TARGET_PAGE_SIZE) { >>>> + if (cap_mig->pgsize_bitmap & qemu_real_host_page_size) { >> [1] >>>> container->dirty_pages_supported = true; >>>> container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size; >>>> container->dirty_pgsizes = cap_mig->pgsize_bitmap; >>> . >> >> > > . >
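For reference, the existing guard Keqian cites is the assert he quotes from kvm_init(). A standalone sketch of that invariant (the constants and surrounding function here are stand-ins, not the real QEMU definitions):

#include <assert.h>

/* Sketch only: the assert is the one Keqian quotes from kvm_init();
 * the constants below are stand-ins for the real QEMU definitions. */
#define TARGET_PAGE_SIZE 4096
static const unsigned long qemu_real_host_page_size = 65536;

static void check_page_size_invariant(void)
{
    /* KVM-based configurations already reject a target page larger than
     * the host page, so hpratio >= 1 holds wherever this assert runs. */
    assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size);
}

int main(void)
{
    check_page_size_invariant();
    return 0;
}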
diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 6ff1daa763..69fb5083a4 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -378,7 +378,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, { struct vfio_iommu_type1_dma_unmap *unmap; struct vfio_bitmap *bitmap; - uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS; + uint64_t pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size; int ret; unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap)); @@ -390,12 +390,12 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container, bitmap = (struct vfio_bitmap *)&unmap->data; /* - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to - * TARGET_PAGE_SIZE. + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of + * qemu_real_host_page_size to mark those dirty. Hence set bitmap_pgsize + * to qemu_real_host_page_size. */ - bitmap->pgsize = TARGET_PAGE_SIZE; + bitmap->pgsize = qemu_real_host_page_size; bitmap->size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / BITS_PER_BYTE; @@ -674,16 +674,16 @@ static void vfio_listener_region_add(MemoryListener *listener, return; } - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != - (section->offset_within_region & ~TARGET_PAGE_MASK))) { + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != + (section->offset_within_region & ~qemu_real_host_page_mask))) { error_report("%s received unaligned region", __func__); return; } - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); llend = int128_make64(section->offset_within_address_space); llend = int128_add(llend, section->size); - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); if (int128_ge(int128_make64(iova), llend)) { return; @@ -892,8 +892,8 @@ static void vfio_listener_region_del(MemoryListener *listener, return; } - if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) != - (section->offset_within_region & ~TARGET_PAGE_MASK))) { + if (unlikely((section->offset_within_address_space & ~qemu_real_host_page_mask) != + (section->offset_within_region & ~qemu_real_host_page_mask))) { error_report("%s received unaligned region", __func__); return; } @@ -921,10 +921,10 @@ static void vfio_listener_region_del(MemoryListener *listener, */ } - iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); + iova = REAL_HOST_PAGE_ALIGN(section->offset_within_address_space); llend = int128_make64(section->offset_within_address_space); llend = int128_add(llend, section->size); - llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK)); + llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask)); if (int128_ge(int128_make64(iova), llend)) { return; @@ -1004,13 +1004,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova, range->size = size; /* - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of - * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap's pgsize to - * TARGET_PAGE_SIZE. + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of + * qemu_real_host_page_size to mark those dirty. Hence set bitmap's pgsize + * to qemu_real_host_page_size. 
*/ - range->bitmap.pgsize = TARGET_PAGE_SIZE; + range->bitmap.pgsize = qemu_real_host_page_size; - pages = TARGET_PAGE_ALIGN(range->size) >> TARGET_PAGE_BITS; + pages = REAL_HOST_PAGE_ALIGN(range->size) / qemu_real_host_page_size; range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / BITS_PER_BYTE; range->bitmap.data = g_try_malloc0(range->bitmap.size); @@ -1114,7 +1114,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container, section->offset_within_region; return vfio_get_dirty_bitmap(container, - TARGET_PAGE_ALIGN(section->offset_within_address_space), + REAL_HOST_PAGE_ALIGN(section->offset_within_address_space), int128_get64(section->size), ram_addr); } @@ -1655,10 +1655,10 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container, header); /* - * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of - * TARGET_PAGE_SIZE to mark those dirty. + * cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of + * qemu_real_host_page_size to mark those dirty. */ - if (cap_mig->pgsize_bitmap & TARGET_PAGE_SIZE) { + if (cap_mig->pgsize_bitmap & qemu_real_host_page_size) { container->dirty_pages_supported = true; container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size; container->dirty_pgsizes = cap_mig->pgsize_bitmap;
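To illustrate the bitmap sizing arithmetic used in the patch above (the region size is an arbitrary example, and the ROUND_UP/BITS_PER_BYTE definitions below are local stand-ins for QEMU's macros): a 1 GiB range at 64K host-page granularity needs 16384 bits, rounded up to a multiple of 64 bits, i.e. 2048 bytes.

#include <inttypes.h>
#include <stdio.h>

/* Local stand-ins for QEMU's macros, used only for this example. */
#define BITS_PER_BYTE 8
#define ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

int main(void)
{
    const uint64_t host_page_size = 64 * 1024;   /* qemu_real_host_page_size */
    const uint64_t size = 1ULL << 30;            /* example: a 1 GiB DMA range */

    /* Mirrors: pages = REAL_HOST_PAGE_ALIGN(size) / qemu_real_host_page_size */
    uint64_t pages = ROUND_UP(size, host_page_size) / host_page_size;

    /* Mirrors: bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) / BITS_PER_BYTE */
    uint64_t bitmap_bytes = ROUND_UP(pages, sizeof(uint64_t) * BITS_PER_BYTE) / BITS_PER_BYTE;

    printf("pages = %" PRIu64 ", bitmap bytes = %" PRIu64 "\n", pages, bitmap_bytes);
    return 0;
}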
The cpu_physical_memory_set_dirty_lebitmap() can quickly deal with the dirty pages of memory by bitmap-traveling, regardless of whether the bitmap is aligned correctly or not. cpu_physical_memory_set_dirty_lebitmap() supports pages in bitmap of host page size. So it'd better to set bitmap_pgsize to host page size to support more translation granule sizes. Fixes: 87ea529c502 (vfio: Get migration capability flags for container) Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> --- hw/vfio/common.c | 44 ++++++++++++++++++++++---------------------- 1 file changed, 22 insertions(+), 22 deletions(-)