Message ID | 1440147950-1178-5-git-send-email-jasowang@redhat.com |
---|---|
State | New |
On Fri, 21 Aug 2015 17:05:48 +0800
Jason Wang <jasowang@redhat.com> wrote:

> We use data match eventfd for 1.0 notification currently. This could
> be slow since software decoding is needed for mmio exit. To speed this
> up, we can switch to use wild card mmio eventfd for 1.0 notification
> since we can examine the queue index directly from the writing
> address. KVM kernel module can utilize this by registering it to fast
> mmio bus which could be as fast as pio on ept capable machine.
>
> [...]
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index d785623..fbd1f1f 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -226,8 +226,8 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
>          }
>          virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler);
>          if (modern) {
> -            memory_region_add_eventfd(modern_mr, modern_addr, 2,
> -                                      true, n, notifier);
> +            memory_region_add_eventfd(modern_mr, modern_addr, 0,
> +                                      false, n, notifier);

This calls for the following change in memory.c:

 static void adjust_endianness(MemoryRegion *mr, uint64_t *data, unsigned size)
 {
-    if (memory_region_wrong_endianness(mr)) {
+    if (size && memory_region_wrong_endianness(mr)) {

Otherwise we abort on PPC64.

>          }
>          if (legacy) {
>              memory_region_add_eventfd(legacy_mr, legacy_addr, 2,
> @@ -235,8 +235,8 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
>          }
>      } else {
>          if (modern) {
> -            memory_region_del_eventfd(modern_mr, modern_addr, 2,
> -                                      true, n, notifier);
> +            memory_region_del_eventfd(modern_mr, modern_addr, 0,
> +                                      false, n, notifier);
>          }
>          if (legacy) {
>              memory_region_del_eventfd(legacy_mr, legacy_addr, 2,
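The abort Greg describes can be reproduced with a small, self-contained model. The sketch makes three assumptions:

- `adjust_endianness()` switches on the access size.
- It byte-swaps 2-, 4- and 8-byte values when `memory_region_wrong_endianness()` is true, and calls `abort()` for any other size.
- The eventfd's size is passed to it when the eventfd is registered.

Under those assumptions, a size of 0 on a region whose endianness differs from the host's (as with a little-endian region on a big-endian PPC64 host) reaches the default branch. `adjust_endianness_model()` and the `swap16`/`swap32`/`swap64` helpers are hypothetical names, not QEMU code. The model returns `false` where the real function would abort, and it includes the proposed `size &&` guard.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hand-written byte swaps so the sketch does not depend on QEMU's
 * bswap16/32/64 or on compiler builtins. */
static uint64_t swap16(uint64_t v)
{
    return ((v & 0x00ffULL) << 8) | ((v & 0xff00ULL) >> 8);
}

static uint64_t swap32(uint64_t v)
{
    return ((v & 0x000000ffULL) << 24) | ((v & 0x0000ff00ULL) << 8) |
           ((v & 0x00ff0000ULL) >> 8)  | ((v & 0xff000000ULL) >> 24);
}

static uint64_t swap64(uint64_t v)
{
    /* Swap each 32-bit half, then exchange the halves. */
    return (swap32(v & 0xffffffffULL) << 32) | swap32(v >> 32);
}

/* Hypothetical stand-in for adjust_endianness(). 'wrong_endian' models
 * the result of memory_region_wrong_endianness(mr). Returns false where
 * the real function is assumed to call abort(). */
static bool adjust_endianness_model(bool wrong_endian, uint64_t *data,
                                    unsigned size)
{
    /* Proposed guard: size 0 (wildcard eventfd) carries no data to
     * compare, so there is nothing to swap. */
    if (size && wrong_endian) {
        switch (size) {
        case 1:
            break;
        case 2:
            *data = swap16(*data);
            break;
        case 4:
            *data = swap32(*data);
            break;
        case 8:
            *data = swap64(*data);
            break;
        default:
            return false; /* real code: abort() */
        }
    }
    return true;
}
```

With the guard, a size-0 eventfd leaves its (unused) data untouched. Sized data-match eventfds, such as the legacy ones that keep size 2, are still swapped as before.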
On 08/25/2015 12:30 AM, Greg Kurz wrote:
> On Fri, 21 Aug 2015 17:05:48 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> [...]
>> @@ -226,8 +226,8 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
>>          }
>>          virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler);
>>          if (modern) {
>> -            memory_region_add_eventfd(modern_mr, modern_addr, 2,
>> -                                      true, n, notifier);
>> +            memory_region_add_eventfd(modern_mr, modern_addr, 0,
>> +                                      false, n, notifier);
> This calls for the following change in memory.c:
>
>  static void adjust_endianness(MemoryRegion *mr, uint64_t *data, unsigned size)
>  {
> -    if (memory_region_wrong_endianness(mr)) {
> +    if (size && memory_region_wrong_endianness(mr)) {
>
> otherwise we abort on PPC64.
>

Right, will fix this in V2.

Thanks
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index d785623..fbd1f1f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -226,8 +226,8 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
         }
         virtio_queue_set_host_notifier_fd_handler(vq, true, set_handler);
         if (modern) {
-            memory_region_add_eventfd(modern_mr, modern_addr, 2,
-                                      true, n, notifier);
+            memory_region_add_eventfd(modern_mr, modern_addr, 0,
+                                      false, n, notifier);
         }
         if (legacy) {
             memory_region_add_eventfd(legacy_mr, legacy_addr, 2,
@@ -235,8 +235,8 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
         }
     } else {
         if (modern) {
-            memory_region_del_eventfd(modern_mr, modern_addr, 2,
-                                      true, n, notifier);
+            memory_region_del_eventfd(modern_mr, modern_addr, 0,
+                                      false, n, notifier);
         }
         if (legacy) {
             memory_region_del_eventfd(legacy_mr, legacy_addr, 2,
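The effect of the diff can be sketched with a simplified model of how an ioeventfd matches a guest write. In `memory_region_add_eventfd(mr, addr, size, match_data, data, e)`, the old arguments `2, true, n` request a 2-byte write whose value equals the queue index `n`. That value is what a virtio 1.0 driver writes to a queue's notify address. The new arguments `0, false, n` match any write to that address.

The matching order in `ioeventfd_hit()` is intended to mirror KVM's ioeventfd check: exact address first, then length 0 meaning "any access", then length, then optional data. The struct, the function names and the 0x1000 per-queue address stride used in the example are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one registered ioeventfd (not KVM's struct). */
struct ioeventfd_model {
    uint64_t addr;      /* guest-physical address that must be written */
    uint32_t len;       /* 0 = match any access width and any data */
    bool     datamatch; /* if true, the written value must equal 'data' */
    uint64_t data;      /* expected value when datamatch is set */
    int      queue;     /* virtqueue whose notifier this eventfd signals */
};

/* The address must match exactly. A zero-length entry then matches any
 * write; otherwise the width must match and, for data-match entries,
 * so must the value. */
static bool ioeventfd_hit(const struct ioeventfd_model *e,
                          uint64_t addr, uint32_t len, uint64_t val)
{
    if (addr != e->addr) {
        return false;
    }
    if (e->len == 0) {
        return true;
    }
    if (len != e->len) {
        return false;
    }
    return !e->datamatch || val == e->data;
}

/* Return the queue signalled by a guest write, or -1 if no eventfd
 * matches (the write would then be handled as an ordinary MMIO exit). */
static int dispatch_write(const struct ioeventfd_model *tbl, size_t n,
                          uint64_t addr, uint32_t len, uint64_t val)
{
    for (size_t i = 0; i < n; i++) {
        if (ioeventfd_hit(&tbl[i], addr, len, val)) {
            return tbl[i].queue;
        }
    }
    return -1;
}
```

With data-match entries, the queue index must be checked against the written value, and the width of the access must also be known. With length-0 entries, the queue is identified purely by which address was hit. This matches the commit message's point that the queue index can be taken from the address alone, with no decoding of the access itself.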
We currently use a data-match eventfd for virtio 1.0 notification. This can be slow, since the MMIO exit has to be decoded in software. To speed this up, we can switch to a wildcard MMIO eventfd for 1.0 notification, because the queue index can be determined directly from the address being written. The KVM kernel module can take advantage of this by registering the eventfd on the fast MMIO bus, which can be as fast as PIO on an EPT-capable machine.

Substantial improvements were seen on an EPT-capable machine:

Guest RX (TCP):

| size | session | +throughput% | +cpu% | -+per cpu% |
|---|---|---|---|---|
| 64 | 1 | +1.6807% | [-16.2421%] | [+21.3984%] |
| 64 | 2 | +0.6091% | [-11.0187%] | [+13.0678%] |
| 64 | 4 | +0.0553% | [-5.9768%] | [+6.4155%] |
| 64 | 8 | +0.1206% | [-4.0057%] | [+4.2984%] |
| 256 | 1 | -0.0031% | [-10.1166%] | [+11.2517%] |
| 256 | 2 | -0.5058% | [-6.1656%] | +6.0317%] |
| ... | | | | |

Guest TX (TCP):

| size | session | +throughput% | +cpu% | -+per cpu% |
|---|---|---|---|---|
| 64 | 1 | [+18.9183%] | -0.2823% | [+19.2550%] |
| 64 | 2 | [+13.5714%] | [+2.2675%] | [+11.0533%] |
| 64 | 4 | [+13.1070%] | [+2.1817%] | [+10.6920%] |
| 64 | 8 | [+13.0426%] | [+2.0887%] | [+10.7299%] |
| 256 | 1 | [+36.2761%] | +6.3434% | [+28.1471%] |
| ... | | | | |
| 1024 | 1 | [+44.8873%] | +2.0811% | [+41.9335%] |
| ... | | | | |
| 1024 | 4 | +0.0228% | [-2.2044%] | [+2.2774%] |
| ... | | | | |
| 16384 | 2 | +0.0127% | [-5.0346%] | [+5.3148%] |
| ... | | | | |
| 65535 | 1 | [+0.0062%] | [-4.1183%] | [+4.3017%] |
| 65535 | 2 | +0.0004% | [-4.2311%] | [+4.4185%] |
| 65535 | 4 | +0.0107% | [-4.6106%] | [+4.8446%] |
| 65535 | 8 | -0.0090% | [-5.5178%] | [+5.8306%] |

Latency (TCP_RR):

| size | session | +transaction rate% | +cpu% | -+per cpu% |
|---|---|---|---|---|
| 64 | 1 | [+6.5248%] | [-9.2882%] | [+17.4322%] |
| 64 | 25 | [+11.0854%] | [+0.8000%] | [+10.2038%] |
| 64 | 50 | [+12.1076%] | [+2.4627%] | [+9.4131%] |
| 256 | 1 | [+5.3677%] | [+10.5669%] | -4.7024% |
| 256 | 25 | [+5.6402%] | -0.8962% | [+6.5955%] |
| 256 | 50 | [+5.9685%] | [+1.7766%] | [+4.1188%] |
| 4096 | 1 | +0.2508% | [-10.4941%] | [+12.0047%] |
| 4096 | 25 | [+1.8533%] | -0.0273% | +1.8812% |
| 4096 | 50 | [+1.2156%] | -1.4134% | +2.6667% |

Note: values in '[]' are those whose significance is greater than 95%.

Thanks to Wenli Quan <wquan@redhat.com> for the benchmarking.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

---
 hw/virtio/virtio-pci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
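The last column of each table is not defined in the message. However, the figures appear consistent, to within rounding, with the change in throughput (or transaction rate) per unit of CPU:

per-CPU change = (1 + Δthroughput) / (1 + ΔCPU) - 1

For example, for the Guest RX 64/1 row, 1.016807 / 0.837579 ≈ 1.21398, i.e. +21.398%. The helper below is a hypothetical sketch of that inferred relationship, not code from the benchmark tool.

```c
/* Inferred (not documented) relationship between the table columns:
 * per-CPU change = (1 + throughput change) / (1 + CPU change) - 1.
 * All arguments and the result are percentages. */
static double per_cpu_change_pct(double throughput_pct, double cpu_pct)
{
    double tp  = 1.0 + throughput_pct / 100.0; /* relative throughput */
    double cpu = 1.0 + cpu_pct / 100.0;        /* relative CPU usage  */
    return (tp / cpu - 1.0) * 100.0;
}
```

For instance, `per_cpu_change_pct(1.6807, -16.2421)` comes out close to the reported +21.3984%. Read this way, a negative CPU change paired with flat throughput, as in most of the large-message rows, still shows up as a per-CPU gain.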